The jump table is interesting, although I guess a switch would perform similarly if the compiler optimizes it properly; I would not be able to tell without trying. Also, different compilers might take different approaches.
A few months ago I built a toy boolean expression parser as a weekend project. The main goal was simple: evaluate an expression and return true or false. It supported basic types like int, float, string, and arrays, as well as variables and even custom operators.
The syntax and grammar were intentionally kept simple. I wanted the whole implementation to be self-contained and compact, something that could live in just a .h and .cc file. Single pass for lexing, parsing, and evaluation.
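To illustrate what I mean by a single pass (this is not the actual tinyrulechecker code, just a minimal sketch with integer comparisons only and no error handling): lexing, parsing, and evaluation are interleaved over the raw input, so there is no token list or AST to build and free.

    // Sketch only: single-pass recursive descent over the input.
    // Each parse function consumes characters and returns a value
    // directly, so nothing is allocated along the way.
    #include <cstdio>
    #include <cstdlib>

    struct State { const char* p; };  // current read position in the input

    static bool parseExpr(State& s);  // forward declaration

    static void skipSpaces(State& s) { while (*s.p == ' ') ++s.p; }

    static long parseInt(State& s) {
      skipSpaces(s);
      char* end = nullptr;
      long v = std::strtol(s.p, &end, 10);  // literal converted in place
      s.p = end;
      return v;
    }

    // primary := '(' expr ')' | INT ('==' | '!=') INT
    static bool parsePrimary(State& s) {
      skipSpaces(s);
      if (*s.p == '(') {
        ++s.p;
        bool v = parseExpr(s);
        skipSpaces(s);
        if (*s.p == ')') ++s.p;
        return v;
      }
      long lhs = parseInt(s);
      skipSpaces(s);
      bool eq = (s.p[0] == '=');  // '==' or '!='
      s.p += 2;
      long rhs = parseInt(s);
      return eq ? (lhs == rhs) : (lhs != rhs);
    }

    // and := primary ('&&' primary)*
    static bool parseAnd(State& s) {
      bool v = parsePrimary(s);
      for (;;) {
        skipSpaces(s);
        if (s.p[0] != '&' || s.p[1] != '&') return v;
        s.p += 2;
        bool rhs = parsePrimary(s);  // always consume the right-hand side
        v = v && rhs;
      }
    }

    // expr := and ('||' and)*
    static bool parseExpr(State& s) {
      bool v = parseAnd(s);
      for (;;) {
        skipSpaces(s);
        if (s.p[0] != '|' || s.p[1] != '|') return v;
        s.p += 2;
        bool rhs = parseAnd(s);
        v = v || rhs;
      }
    }

    int main() {
      State s{"(1 == 2) || 3 == 3"};
      std::printf("%d\n", (int)parseExpr(s));  // prints 1
    }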
Once the first version was functional, I challenged myself to optimize it for speed. Here are some of the performance-related tricks I remember using:
- No string allocations: used the input *str directly, relying on pointer manipulation instead of allocating memory for substrings.
- Stateful parsing: maintained a parsing state structure passed by reference to avoid unnecessary copies or allocations.
- Minimized allocations: tried to avoid heap allocations wherever possible. Some were unavoidable during evaluation, but I kept them to a minimum.
- Branch prediction-friendly design: used lookup tables to assist with token identification (mapping the first character to token type and validating identifier characters).
- Inline literal parsing: converted integer and float literals to their native values directly during lexing instead of deferring conversion to a later phase.
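To make the lookup-table and zero-allocation points above more concrete, here is a rough sketch of the idea (not the project's actual tables): token types and identifier characters come from 256-entry arrays indexed by the byte, tokens keep only a pointer and length into the original string, and numeric literals are converted to their native value while the token is being scanned.

    // Sketch only: character-class lookup tables built once, so the lexer
    // classifies bytes with an array index instead of a chain of branches,
    // and literals are converted in place with no substring allocations.
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    enum TokType : uint8_t { T_END, T_IDENT, T_NUMBER, T_OP, T_LPAREN, T_RPAREN, T_BAD };

    struct Token {
      TokType     type;
      const char* start;  // points into the original input, no copy
      size_t      len;
      double      num;    // already-converted value for T_NUMBER
    };

    // 256-entry tables: first character -> token type, and "is identifier char".
    static TokType g_first[256];
    static bool    g_identChar[256];

    static void initTables() {
      for (int c = 0; c < 256; ++c) { g_first[c] = T_BAD; g_identChar[c] = false; }
      for (int c = 'a'; c <= 'z'; ++c) { g_first[c] = T_IDENT;  g_identChar[c] = true; }
      for (int c = 'A'; c <= 'Z'; ++c) { g_first[c] = T_IDENT;  g_identChar[c] = true; }
      for (int c = '0'; c <= '9'; ++c) { g_first[c] = T_NUMBER; g_identChar[c] = true; }
      g_first['_'] = T_IDENT; g_identChar['_'] = true;
      g_first['|'] = g_first['&'] = g_first['='] = g_first['!'] = g_first['.'] = T_OP;
      g_first['('] = T_LPAREN; g_first[')'] = T_RPAREN; g_first[0] = T_END;
    }

    // Lex one token starting at *p, advancing *p past it.
    static Token nextToken(const char** p) {
      while (**p == ' ') ++*p;  // skip whitespace
      const char* s = *p;
      Token t{g_first[(unsigned char)*s], s, 1, 0.0};
      switch (t.type) {
        case T_NUMBER: {          // convert the literal now, in place
          char* end = nullptr;
          t.num = std::strtod(s, &end);
          t.len = (size_t)(end - s);
          break;
        }
        case T_IDENT:             // scan with the table, no copy made
          while (g_identChar[(unsigned char)s[t.len]]) ++t.len;
          break;
        case T_END: t.len = 0; break;
        default: break;           // single-char operators kept simple here
      }
      *p += t.len;
      return t;
    }

    int main() {
      initTables();
      const char* p = "myfloat.eq(1.9999999)";
      for (Token t = nextToken(&p); t.type != T_END; t = nextToken(&p))
        std::printf("type=%d text=%.*s\n", (int)t.type, (int)t.len, t.start);
    }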
I think all of these tricks are mentioned in the article already. For what it's worth, here is the project:
https://github.com/pausan/tinyrulechecker
I used this expression to assess the performance on an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz (launched Q3 2018): myfloat.eq(1.9999999) || myint.eq(32)
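The measurement itself was just a tight loop doing a full evaluation each iteration, something along these lines (sketch only: evalExpression is a placeholder stub standing in for the real call, not tinyrulechecker's API), with MB/s computed as expression bytes times iterations over elapsed time:

    // Benchmark-loop sketch. evalExpression is a stub so this compiles;
    // the real evaluator (full lex + parse + eval per call) goes there.
    #include <chrono>
    #include <cstdio>
    #include <cstring>

    static bool evalExpression(const char* expr) {
      return expr[0] != '\0';  // placeholder, not the real evaluator
    }

    int main() {
      const char* expr = "myfloat.eq(1.9999999) || myint.eq(32)";
      const long n = 10000000;  // number of evaluations
      long hits = 0;            // keeps the call from being optimized away
      auto t0 = std::chrono::steady_clock::now();
      for (long i = 0; i < n; ++i)
        hits += evalExpression(expr) ? 1 : 0;
      double secs = std::chrono::duration<double>(
          std::chrono::steady_clock::now() - t0).count();
      std::printf("%ld true, %.1f ns/eval, %.1f MB/s\n",
                  hits, secs * 1e9 / n,
                  (double)std::strlen(expr) * n / secs / 1e6);
    }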
I know it is a simple expression, and a larger expression would likely perform worse due to variable lookups, ... I could get a throughput of 287MB/s, or 142ns per evaluation (about 7M evaluations per second). I was pleasantly surprised to reach those speeds given that one evaluation is a full cycle of lexing, parsing, and evaluating the expression.

The next step I considered was using SIMD for tokenizing, but I am not sure it would have helped much with the overall evaluation times; I seem to recall most of the time was spent in the parsing or evaluation phases anyway, not the lexer.
It was a fun project.