Unfortunately, operating a byte at a time means there's a hard limit on performance.

A truly performant lexer needs to jump ahead as far as possible. That likely means SIMD (or SWAR), since the C library unfortunately fails to provide most of the important interfaces.
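For the SWAR flavor, here's a minimal sketch (the function name and structure are mine, not from anyone's lexer): the classic "has-zero-byte" trick tests eight bytes per iteration for a single delimiter.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* SWAR sketch: advance past bytes != c, eight at a time.
   Uses the classic has-zero-byte trick (as in musl's strchr):
   (x - 0x01..01) & ~x & 0x80..80 is nonzero iff x has a zero byte. */
static size_t swar_skip_until(const char *s, size_t n, unsigned char c) {
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pat   = ones * c;      /* broadcast c into all 8 lanes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);             /* alignment-safe load */
        uint64_t x = w ^ pat;             /* lanes equal to c become zero */
        if ((x - ones) & ~x & highs)
            break;                        /* a match is in this word */
    }
    while (i < n && (unsigned char)s[i] != c)
        i++;                              /* pin down the exact byte */
    return i;
}
```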

As an example the C library can handle tolerably: while lexing a string, repeatedly call `strcspn(input, "\"\\\n")` to skip over chunks of ordinary characters, then special-case only the quote, backslash, newline, and (implicit!) NUL after each jump. Be sure to correctly distinguish an embedded NUL from the one you probably append to represent EOF (or, if streaming [which requires quite a bit more logic], the end of the current chunk).
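A minimal sketch of that loop (assuming a NUL-terminated buffer with `end` marking the sentinel position, and a simplified two-byte escape rule; `lex_string` is an illustrative name):

```c
#include <string.h>

/* Sketch: scan a string literal body, starting just past the opening
   quote.  Returns a pointer just past the closing quote, or NULL on
   error.  The buffer is NUL-terminated (the appended-EOF convention);
   `end` points at that sentinel so an embedded NUL can be told apart
   from it. */
static const char *lex_string(const char *p, const char *end) {
    for (;;) {
        p += strcspn(p, "\"\\\n");        /* skip ordinary characters */
        switch (*p) {
        case '"':
            return p + 1;                 /* closing quote */
        case '\\':
            if (p + 1 == end)
                return NULL;              /* escape runs into EOF */
            p += 2;                       /* simplified: every escape is 2 bytes */
            break;
        case '\n':
            return NULL;                  /* unterminated string */
        case '\0':
            if (p == end)
                return NULL;              /* the EOF sentinel */
            p++;                          /* embedded NUL: treat as ordinary */
            break;
        }
    }
}
```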

Unfortunately, there's a decent chance your implementation of `strcspn` doesn't optimize for small sets and instead constructs a full 256-bit bitset up front. And even if it does, this strategy won't work for larger sets such as "all characters that can appear in an identifier" (you'd actually use `strspn` here, since the match is positive), where you'll want to take advantage of the characters forming contiguous ranges.
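A sketch of that range-based idea with SSE2 (illustrative only; `skip_ident` and the `[A-Za-z0-9_]` class are my assumptions): OR-ing each byte with 0x20 folds `A-Z` exactly onto `a-z`, so the whole alphabetic class costs one compare pair rather than a table.

```c
#include <emmintrin.h>   /* SSE2 */
#include <stddef.h>

/* Sketch: count the leading run of identifier bytes [A-Za-z0-9_],
   16 at a time, using range compares instead of a 256-entry table.
   Signed compares are fine: every class member is < 0x80, and bytes
   >= 0x80 are negative, so they fail every range test. */
static size_t skip_ident(const char *s, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        /* b | 0x20 maps exactly A-Z onto a-z and nothing else into
           that range, so one range test covers both cases. */
        __m128i f = _mm_or_si128(v, _mm_set1_epi8(0x20));
        __m128i alpha = _mm_and_si128(
            _mm_cmpgt_epi8(f, _mm_set1_epi8('a' - 1)),
            _mm_cmplt_epi8(f, _mm_set1_epi8('z' + 1)));
        __m128i digit = _mm_and_si128(
            _mm_cmpgt_epi8(v, _mm_set1_epi8('0' - 1)),
            _mm_cmplt_epi8(v, _mm_set1_epi8('9' + 1)));
        __m128i under = _mm_cmpeq_epi8(v, _mm_set1_epi8('_'));
        int ok = _mm_movemask_epi8(
            _mm_or_si128(_mm_or_si128(alpha, digit), under));
        if (ok != 0xFFFF)                    /* some byte is not ident */
            return i + __builtin_ctz(~ok);   /* its exact position (GCC/Clang) */
    }
    for (; i < n; i++) {                     /* scalar tail */
        unsigned char c = (unsigned char)s[i], f = c | 0x20;
        if (!((f >= 'a' && f <= 'z') || (c >= '0' && c <= '9') || c == '_'))
            break;
    }
    return i;
}
```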

Edit: yikes, is this using a hash without checking for collisions?!?

You can go pretty far processing one byte at a time in hardware. You just keep making the pipeline deeper and pushing the frequency higher. And then, to combat the dependencies in parsing, you add speculative execution to avoid pipeline bubbles.

Eventually you land on recreating the modern CPU.

You can get pretty far with a branch per byte, as long as the bulk of the work is done with SIMD (like character classification). But yeah, a LUT lookup per byte is not recommended.
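One common shape for that split (a sketch in the spirit of simdjson-style classification, not anyone's actual lexer; the "special" set here is illustrative): classify a whole chunk with SIMD into a bitmask, then branch only on the bytes the mask flags.

```c
#include <emmintrin.h>   /* SSE2 */

/* Sketch: one SIMD pass classifies 16 bytes into a bitmask of
   "special" bytes; the scalar loop then takes one branch per special
   byte (often zero or one per chunk) instead of a LUT hit per byte. */
static void lex_chunk16(const char *s) {
    __m128i v = _mm_loadu_si128((const __m128i *)s);
    unsigned mask =
        _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_set1_epi8(' ')))  |
        _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_set1_epi8('\n'))) |
        _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_set1_epi8('"')));
    while (mask) {
        int i = __builtin_ctz(mask);      /* next special byte (GCC/Clang) */
        mask &= mask - 1;                 /* clear that bit */
        switch (s[i]) {
        case ' ': case '\n': /* token boundary */ break;
        case '"':            /* hand off to the string lexer */ break;
        }
    }
}
```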

You're somewhat right: I used tagging masks to differentiate between the different types of atoms [1] (the general shape is sketched below). But yes, in the future interning will be backed by a correct hashmap implementation with collision handling.

[1]: https://github.com/xNaCly/purple-garden/blob/master/cc.c#L76...
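For readers following along, tag masks generally look something like this (a hypothetical sketch; the tag values and names are illustrative, not taken from cc.c):

```c
#include <stdint.h>

/* Hypothetical sketch: reserve the top bits of a 64-bit hash for a
   type tag, so atoms with the same spelling but different types can
   never compare equal to each other. */
enum { TAG_SHIFT = 61 };
#define TAG_IDENT   ((uint64_t)1 << TAG_SHIFT)
#define TAG_STRING  ((uint64_t)2 << TAG_SHIFT)
#define TAG_NUMBER  ((uint64_t)3 << TAG_SHIFT)
#define HASH_BITS   (((uint64_t)1 << TAG_SHIFT) - 1)

static inline uint64_t tag_hash(uint64_t raw, uint64_t tag) {
    return (raw & HASH_BITS) | tag;   /* keep hash bits, stamp the tag */
}
```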