I have applied a subset of these techniques in a C++ tokenizer for a language syntactically similar to Swift: no inline assembly, no intrinsics, and no SWAR, but reduced branching, cache optimization, and SIMD parsing with explicit vectorization.
I get:
- ~4 MLOC/sec/core on a laptop
- ~8-9 MLOC/sec/core on a modern server-grade AMD CPU with AVX-512
So yes, it is definitely possible.
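
To give a rough idea of what "reduced branching" can look like without intrinsics, here is a minimal sketch. It is not the code of the tokenizer described above. The names `CharClass`, `kClassTable`, `scan_while`, and `count_newlines` are hypothetical and exist only for this illustration, and it assumes C++17 and ASCII-only classification. It swaps a chain of per-character comparisons for a single 256-entry table lookup, and counts newlines with a branch-free loop of the kind compilers can often auto-vectorize.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical character classes, chosen for illustration only.
enum CharClass : std::uint8_t {
    kIdentStart = 1 << 0,  // letters and '_'
    kIdentCont  = 1 << 1,  // letters, digits and '_'
    kSpace      = 1 << 2,  // ' ', '\t', '\r', '\n'
};

// Build the 256-entry classification table at compile time.
constexpr std::array<std::uint8_t, 256> make_class_table() {
    std::array<std::uint8_t, 256> t{};
    constexpr auto ident = static_cast<std::uint8_t>(kIdentStart | kIdentCont);
    for (int c = 'a'; c <= 'z'; ++c) t[static_cast<std::size_t>(c)] = ident;
    for (int c = 'A'; c <= 'Z'; ++c) t[static_cast<std::size_t>(c)] = ident;
    for (int c = '0'; c <= '9'; ++c) t[static_cast<std::size_t>(c)] = kIdentCont;
    t['_'] = ident;
    t[' '] = kSpace;
    t['\t'] = kSpace;
    t['\r'] = kSpace;
    t['\n'] = kSpace;
    return t;
}

inline constexpr auto kClassTable = make_class_table();

// Advance `pos` while the current byte belongs to any class in `mask`.
// One table lookup replaces a chain of range comparisons per character.
constexpr std::size_t scan_while(std::string_view src, std::size_t pos,
                                 std::uint8_t mask) {
    while (pos < src.size() &&
           (kClassTable[static_cast<unsigned char>(src[pos])] & mask) != 0) {
        ++pos;
    }
    return pos;
}

// Branch-free newline count: the comparison result (0 or 1) is added
// directly, so the loop body contains no data-dependent branch.
constexpr std::size_t count_newlines(std::string_view src) {
    std::size_t n = 0;
    for (char c : src) n += static_cast<std::size_t>(c == '\n');
    return n;
}
```

For example, `scan_while("foo_1 = 2", 0, kIdentCont)` stops at the first space after the identifier. Whether the newline loop is actually vectorized depends on the compiler and flags, so it is worth checking the generated code.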