Though when we talk about JSON (5 bytes - 40 bits) the processing throughput is 20Gbps (this algorithm) vs 5Gbps (previous implementation, 4x slower).

I doubt CPU cycles are problem in that case.

I agree - I was more talking about naive JSON ser/deser that Lemire was not involved in.