I don't know about this specifically, but I've seen a lot of big data jobs where 99% of the CPU was spent in JSON ser/deser. This might be a reasonable chunk of it.

Though when we talk about JSON (5 bytes, i.e. 40 bits), the processing throughput is 20 Gbps (this algorithm) vs. 5 Gbps (the previous implementation, 4x slower).

I doubt CPU cycles are the problem in that case.
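
For scale, here is the arithmetic behind those figures as a rough sketch. It assumes the "5 bytes" is the size of one serialized value, which is my reading rather than something stated above; `values_per_second` is just an illustrative helper name.

```python
BITS_PER_VALUE = 5 * 8  # assumption: one serialized value occupies 5 bytes (40 bits)


def values_per_second(gbps: float, bits_per_value: int = BITS_PER_VALUE) -> float:
    """Convert a throughput in gigabits per second into values per second."""
    return gbps * 1e9 / bits_per_value


fast = values_per_second(20)  # "this algorithm"
slow = values_per_second(5)   # "previous implementation"

print(f"fast: {fast:,.0f} values/s")
print(f"slow: {slow:,.0f} values/s")
print(f"ratio: {fast / slow:.0f}x")
```

Under that assumption the two rates come to 500 million and 125 million values per second per core, so even the slower one is far above what most pipelines feed into a single core.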

I agree. I was talking more about naive JSON ser/deser, the kind Lemire was not involved in.

JSON ser/deser is usually dominated by floats rather than ints, and floats are more expensive to handle.
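
If you want to check the int vs. float gap on your own machine, a minimal sketch with Python's standard `json` module might look like the following. The workload size `N`, the value ranges, and the `bench` helper are all made up for illustration, and the numbers will depend heavily on which JSON library and runtime you use, so treat this as a way to measure rather than as evidence either way.

```python
import json
import random
import timeit

N = 100_000  # hypothetical workload size
rng = random.Random(0)  # fixed seed so runs are comparable
ints = [rng.randrange(-10**9, 10**9) for _ in range(N)]
floats = [rng.uniform(-1e9, 1e9) for _ in range(N)]


def bench(values, repeat=3):
    """Return (dumps_seconds, loads_seconds), each the best of `repeat` runs."""
    text = json.dumps(values)
    dump_t = min(timeit.repeat(lambda: json.dumps(values), number=1, repeat=repeat))
    load_t = min(timeit.repeat(lambda: json.loads(text), number=1, repeat=repeat))
    return dump_t, load_t


for name, values in (("ints", ints), ("floats", floats)):
    dump_t, load_t = bench(values)
    print(f"{name:6s} dumps {dump_t * 1e3:.1f} ms, loads {load_t * 1e3:.1f} ms")
```

Note that the float list also produces a longer JSON string than the int list, so part of any difference is simply more bytes to process, not only the cost of float formatting and parsing.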