How much of an improvement does SIMD offer for something like this? It looks like it's only being used for strings and comments, but I would kind of assume that for most programming languages, the proportion of code that is long strings / comments is not large. Also curious if there's any performance penalty for trying to do SIMD if most of the comments and strings are short.
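To make the string/comment case concrete, here is a minimal sketch of the "scan many bytes per step" idea in portable C. It uses SWAR (SIMD within a register: eight bytes packed into a 64-bit integer) instead of real vector intrinsics such as SSE2 or NEON, so it runs on any platform. The goal is to find the first `"` or `\` in a string body. The names `scan_string_body`, `load_le64`, `broadcast`, and `zero_lanes` are hypothetical and are not taken from the lexer being discussed.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy byte c into all 8 lanes of a 64-bit word. */
static uint64_t broadcast(uint8_t c) { return 0x0101010101010101ULL * c; }

/* Load 8 bytes so p[0] lands in the lowest lane, whatever the host byte order. */
static uint64_t load_le64(const unsigned char *p) {
    uint64_t w = 0;
    for (int k = 7; k >= 0; k--) w = (w << 8) | p[k];
    return w;
}

/* Sets the high bit of every lane of x that is zero. The lowest flagged lane
   is always exact. Lanes above a true zero may be false positives. */
static uint64_t zero_lanes(uint64_t x) {
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Index of the first '"' or '\\' in s[0..len), or len if neither occurs. */
size_t scan_string_body(const char *s, size_t len) {
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;
    /* Wide loop: test 8 bytes per iteration. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w = load_le64(p + i);
        uint64_t hits = zero_lanes(w ^ broadcast('"')) |
                        zero_lanes(w ^ broadcast('\\'));
        if (hits) {
            /* The lowest flagged lane is the first real match. */
            for (int b = 0; b < 8; b++)
                if (hits & (0x80ULL << (8 * b))) return i + (size_t)b;
        }
    }
    /* Scalar tail: fewer than 8 bytes remain. For a short string, this is all of it. */
    for (; i < len; i++)
        if (p[i] == '"' || p[i] == '\\') return i;
    return len;
}
```

On the short-string question: in this sketch, a string shorter than 8 bytes never enters the wide loop. It goes straight to the byte-by-byte tail, so the added cost is about one length comparison. Real SIMD versions often make a similar trade-off with wider blocks. How much it helps or hurts in practice depends on the implementation and the input, so it would need to be measured.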
Lexing usually isn't a significant part of the performance equation compared to the rest of the compiler, but SIMD can be used to speed up number parsing.
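As an illustration of the number-parsing point, here is a sketch of a well-known SWAR trick of the kind used in fast parsers such as simdjson. It validates 8 ASCII digits and converts them to an integer with three multiply-and-shift steps instead of eight scalar steps. It is written in portable C rather than with vector intrinsics. The helper names and the `lex_decimal` wrapper are hypothetical, and for brevity the sketch does not check for 64-bit overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load 8 chars so s[0] (the most significant digit) lands in the lowest byte. */
static uint64_t load_le64(const char *s) {
    uint64_t w = 0;
    for (int k = 7; k >= 0; k--) w = (w << 8) | (unsigned char)s[k];
    return w;
}

/* True if every byte of v is in '0'..'9' (0x30..0x39). */
static bool is_eight_digits(uint64_t v) {
    return ((v & 0xF0F0F0F0F0F0F0F0ULL) |
            (((v + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4))
           == 0x3333333333333333ULL;
}

/* Combine 8 digits pairwise: 1-digit -> 2-digit -> 4-digit -> 8-digit lanes. */
static uint32_t parse_eight_digits(uint64_t v) {
    v = (v & 0x0F0F0F0F0F0F0F0FULL) * 2561 >> 8;         /* 2561 = 10*2^8 + 1 */
    v = (v & 0x00FF00FF00FF00FFULL) * 6553601 >> 16;     /* 6553601 = 100*2^16 + 1 */
    return (uint32_t)((v & 0x0000FFFF0000FFFFULL) * 42949672960001ULL >> 32); /* 10000*2^32 + 1 */
}

/* Parse the leading decimal digits of s[0..len). Stores the digit count in *consumed. */
uint64_t lex_decimal(const char *s, size_t len, size_t *consumed) {
    uint64_t acc = 0;
    size_t i = 0;
    /* Fast path: 8 digits at a time while a full block of digits is available. */
    while (i + 8 <= len) {
        uint64_t w = load_le64(s + i);
        if (!is_eight_digits(w)) break;
        acc = acc * 100000000ULL + parse_eight_digits(w);
        i += 8;
    }
    /* Slow path: remaining digits one at a time. */
    while (i < len && s[i] >= '0' && s[i] <= '9') {
        acc = acc * 10 + (uint64_t)(s[i] - '0');
        i++;
    }
    *consumed = i;
    return acc;
}
```

As with string scanning, short literals such as `0` or `42` take only the scalar path. The wide path only pays off for long digit runs.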