50x isn't reasonable; it's a cache disaster. Any perf win from avoiding a JIT gets eaten alive.

Only if all of it is actually used at runtime, and presumably the vast majority of possible decoding starting points won't be.
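
A rough sketch of that point, with made-up numbers: cache pressure tracks the cache lines that are actually fetched, not the size of the whole image. The 64-byte line size, the 1 MiB hot region, and the helper name `lines_touched` are all assumptions for illustration, not measurements.

```python
CACHE_LINE = 64  # bytes; a common line size, assumed here


def lines_touched(executed_addrs, line_size=CACHE_LINE):
    """Count the distinct cache lines covered by the addresses actually fetched."""
    return len({addr // line_size for addr in executed_addrs})


# Hypothetical: 1 MiB of original code translated into a 50 MiB image.
translated_size = 50 * 1024 * 1024
total_lines = translated_size // CACHE_LINE

# Hypothetical: only a contiguous 1 MiB of that image ever runs,
# fetched as one 4-byte instruction every 4 bytes.
hot_lines = lines_touched(range(0, 1024 * 1024, 4))

print(f"lines in image: {total_lines}, lines touched: {hot_lines}")
```

This is the best case, since the hot code is contiguous. If hot entry points are interleaved with cold ones, each fetched line carries some dead bytes and the footprint grows.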

I wouldn't jump to conclusions. Instructions aren't that big anyway, and the CPU already optimizes them just in time.