Intriguing, but I wonder whether they've looked at whole-conversation token usage, and not just short tasks.
I just saw a [paper](https://arxiv.org/pdf/2602.05447) that investigated similar aspects of TOON (which aims to reduce JSON token counts). They found that even though TOON itself reduced the number of tokens, LLMs were less familiar with it, and so spent even more tokens trying to decipher it, or made mistakes (see section 4.5, figures 6 and 7).
From the paper:

> Unlike Markdown, where each grep hit simply returned more text, TOON's overhead was driven by a combination of output density and additional tool calls from pattern unfamiliarity
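To make the "fewer tokens on the wire" part concrete, here's a rough sketch of how you'd measure it yourself. It assumes `tiktoken` with the `cl100k_base` encoding and a simplified TOON-like tabular layout I wrote by hand (the real TOON spec may differ in details; none of this is from the paper):

```python
import json
import tiktoken  # pip install tiktoken

# Sample payload: a uniform array of objects, the case TOON targets.
records = [{"id": i, "name": f"user{i}", "active": True} for i in range(50)]

as_json = json.dumps(records)

# Simplified TOON-like rendering: one header declaring length and fields,
# then one CSV-style row per record. (Hypothetical; check the actual spec.)
header = "records[50]{id,name,active}:"
rows = "\n".join(f"  {r['id']},{r['name']},{str(r['active']).lower()}" for r in records)
as_toon = header + "\n" + rows

enc = tiktoken.get_encoding("cl100k_base")
print("JSON tokens:", len(enc.encode(as_json)))
print("TOON-ish tokens:", len(enc.encode(as_toon)))
```

The paper's point is that this per-payload saving is only half the accounting: the tokens the model then spends re-parsing an unfamiliar format, or issuing extra tool calls, don't show up in a comparison like this.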
----

There's a strangeness tax with LLMs, and it can be substantial.
I would not be surprised at all if this technique turned out to be only a local minimum: fewer tokens per payload, but detrimental effects on total token usage across the whole conversation.