I’m having vague flashbacks here so I might be guessing wrong, but: the only time I’ve ever seen an “inlining failed” error from GCC when using intrinsics (and it was an actual error), what it really meant was “you can’t use this intrinsic in this configuration”. The second half of the message, “target specific option mismatch”, is more helpful, if still cryptic. The fix was to change the argument of the -march= option or (for dynamic dispatch) decorate the caller with the correct __attribute__((target)). E.g. if you pass -march=x86-64 (the baseline level) but try to use AVX intrinsics, you’ll get such an inlining error. (This is unlike MSVC, which will always allow you to use any intrinsic supported by the compiler.)
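
For illustration, here is a minimal sketch of the dynamic-dispatch variant of that fix, assuming GCC or Clang on x86-64. The names sum8, sum8_avx2 and sum8_scalar are made up for this example. Without the target attribute on sum8_avx2, a build with a baseline -march should fail with the “inlining failed … target specific option mismatch” error described above.

```c
#include <immintrin.h>

/* Hypothetical example function: the target attribute enables AVX2 for this
   one function only, so the intrinsics can be inlined into it even when the
   rest of the file is built with -march=x86-64. */
__attribute__((target("avx2")))
static int sum8_avx2(const int *p)
{
    __m256i v  = _mm256_loadu_si256((const __m256i *)p);
    __m128i lo = _mm256_castsi256_si128(v);        /* elements 0..3 */
    __m128i hi = _mm256_extracti128_si256(v, 1);   /* elements 4..7 */
    __m128i s  = _mm_add_epi32(lo, hi);
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E)); /* swap 64-bit halves */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1)); /* swap adjacent lanes */
    return _mm_cvtsi128_si32(s);
}

/* Baseline fallback, valid on any x86-64 CPU. */
static int sum8_scalar(const int *p)
{
    int s = 0;
    for (int i = 0; i < 8; i++)
        s += p[i];
    return s;
}

/* Baseline caller: picks an implementation at run time and calls it as an
   ordinary (non-inlined) function, which is what the compiler allows. */
int sum8(const int *p)
{
    return __builtin_cpu_supports("avx2") ? sum8_avx2(p) : sum8_scalar(p);
}
```

The key point is that the AVX2 code lives in its own function and is only reached through a real call, never inlined into the baseline caller.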

I think you are right in the interpretation. In my case though dynamic dispatch was already integrated with bytecode lowering so the error wasn't helpful :(

Ah, so you want the function containing the interpreter loop to be compiled for the baseline architecture but some of the bytecode implementations inside it to use more advanced intrinsics? Yeah, I don’t think GCC has a good answer to that one. It also sounds gnarly from a general calling-convention perspective—how is the VZEROUPPER on exit supposed to be emitted if you can’t count on AVX?..

We recently (finally) got __attribute__((musttail)) in GCC[1]. I’ve just tried it between functions with mismatched __attribute__((target))s and it does work, so theoretically you could code your interpreter that way. But it seems like you’d still be bound to keep loading and storing vector state from and to memory and VZEROUPPERing your registers after each bytecode, and that doesn’t sound like a particularly good time.
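
A rough sketch of what that could look like, assuming a compiler with musttail support (the macro falls back to a plain call otherwise). The opcodes, struct vm and the handler names are all hypothetical. Note how the AVX handler has to load its vector state from memory and store it back before tail-calling the baseline dispatcher.

```c
#include <immintrin.h>

/* Use musttail where the compiler advertises it; otherwise fall back to an
   ordinary call, which still works but may grow the stack. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

enum { OP_HALT, OP_ADD8 };       /* hypothetical opcodes */
struct vm { float acc[8]; };     /* hypothetical VM state, kept in memory */

/* All functions share one signature so they can tail-call each other. */
static int dispatch(struct vm *vm, const unsigned char *pc, const float *arg);

/* Baseline handler: scalar code only. */
static int op_add8_scalar(struct vm *vm, const unsigned char *pc, const float *arg)
{
    for (int i = 0; i < 8; i++)
        vm->acc[i] += arg[i];
    MUSTTAIL return dispatch(vm, pc + 1, arg);
}

/* AVX handler: the vector state is loaded from and stored back to memory,
   because it cannot stay in YMM registers across the baseline dispatcher. */
__attribute__((target("avx")))
static int op_add8_avx(struct vm *vm, const unsigned char *pc, const float *arg)
{
    __m256 a = _mm256_loadu_ps(vm->acc);
    a = _mm256_add_ps(a, _mm256_loadu_ps(arg));
    _mm256_storeu_ps(vm->acc, a);
    MUSTTAIL return dispatch(vm, pc + 1, arg);
}

/* Baseline dispatcher: checks the CPU before jumping into AVX code. */
static int dispatch(struct vm *vm, const unsigned char *pc, const float *arg)
{
    if (*pc == OP_ADD8) {
        if (__builtin_cpu_supports("avx")) {
            MUSTTAIL return op_add8_avx(vm, pc, arg);
        }
        MUSTTAIL return op_add8_scalar(vm, pc, arg);
    }
    return 0; /* OP_HALT or unknown opcode */
}
```

Running a program such as {OP_ADD8, OP_ADD8, OP_HALT} through dispatch adds arg to the accumulator twice. The per-opcode load/store round trip is exactly the cost mentioned above.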

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119616