You're exactly right. The llguidance library [1,2] seems to have emerged as the go-to solution for this, by virtue of being >10x faster than its competition. It's work from some past colleagues of mine at Microsoft Research, based on the theory of (regex) derivatives, which we previously used to ship a novel kind of regex engine for .NET. It's cool work and, AFAIK, should ensure full adherence to a JSON grammar.
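For anyone curious what "derivatives" means here: the Brzozowski derivative of a regex with respect to a character is another regex matching whatever may legally follow that character. Chaining derivatives gives you a matcher, and asking "which characters leave the derivative satisfiable?" is exactly the token-mask question constrained decoding has to answer at every step. A toy Python sketch of the idea (nothing like llguidance's actual optimized Rust implementation, just the concept):

    from dataclasses import dataclass

    # Toy regex AST.
    @dataclass(frozen=True)
    class Empty: pass          # matches nothing
    @dataclass(frozen=True)
    class Eps: pass            # matches the empty string
    @dataclass(frozen=True)
    class Char:
        c: str
    @dataclass(frozen=True)
    class Seq:
        a: object; b: object
    @dataclass(frozen=True)
    class Alt:
        a: object; b: object
    @dataclass(frozen=True)
    class Star:
        a: object

    def nullable(r) -> bool:
        """Does r match the empty string?"""
        match r:
            case Empty() | Char(): return False
            case Eps() | Star(): return True
            case Seq(a, b): return nullable(a) and nullable(b)
            case Alt(a, b): return nullable(a) or nullable(b)

    def deriv(r, c):
        """Brzozowski derivative: the regex matching the rest of
        the input after consuming character c."""
        match r:
            case Empty() | Eps(): return Empty()
            case Char(x): return Eps() if x == c else Empty()
            case Alt(a, b): return Alt(deriv(a, c), deriv(b, c))
            case Seq(a, b):
                head = Seq(deriv(a, c), b)
                return Alt(head, deriv(b, c)) if nullable(a) else head
            case Star(a): return Seq(deriv(a, c), Star(a))

    def nonempty(r) -> bool:
        """Can r still match *some* string? (emptiness check)"""
        match r:
            case Empty(): return False
            case Eps() | Char() | Star(): return True
            case Seq(a, b): return nonempty(a) and nonempty(b)
            case Alt(a, b): return nonempty(a) or nonempty(b)

    def matches(r, s: str) -> bool:
        for ch in s:
            r = deriv(r, ch)
        return nullable(r)

    def allowed_next(r, alphabet):
        """The 'token mask': characters that keep r satisfiable."""
        return {c for c in alphabet if nonempty(deriv(r, c))}

    ab_star = Star(Seq(Char("a"), Char("b")))   # (ab)*
    assert matches(ab_star, "abab")
    assert not matches(ab_star, "aba")
    assert allowed_next(deriv(ab_star, "a"), "ab") == {"b"}

llguidance scales this up from toy regexes to full grammars and computes that mask against a tokenizer vocabulary of ~100k tokens at every decoding step, which is where the "go brrr" part comes in [1].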
llguidance is used in vLLM, SGLang, internally at OpenAI, and elsewhere. At the same time, I also see a non-trivial JSON error rate from Gemini models in large-scale synthetic generations, so perhaps Google hasn't seen the "llight" yet and is using something less principled.
[1]: https://guidance-ai.github.io/llguidance/llg-go-brrr
[2]: https://github.com/guidance-ai/llguidance
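If you want to kick the tires on the vLLM integration mentioned above, it's exposed through vLLM's OpenAI-compatible server. A rough sketch, assuming a locally running server and vLLM's guided_json extra-body parameter (the structured-output knobs have moved around between versions, so check the docs for yours):

    from openai import OpenAI

    # Assumes a vLLM server started locally, e.g.:
    #   vllm serve meta-llama/Llama-3.1-8B-Instruct
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
    }

    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Invent a person; return JSON."}],
        # vLLM-specific extension: constrain decoding to this JSON schema.
        extra_body={"guided_json": schema},
    )
    print(resp.choices[0].message.content)  # decodes only schema-valid tokens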
Cool stuff! I don't get how all the open source inference frameworks have this down but the big labs don't...
Gemini [0] is falsely advertising this:
> This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.
[0]: https://ai.google.dev/gemini-api/docs/structured-output?exam...
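In practice you still have to treat the output as untrusted and validate it yourself. A sketch against the current google-genai Python SDK (the exact config spelling varies by SDK version, so treat the field names as illustrative):

    import json
    from google import genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    resp = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Invent a person; return JSON.",
        config={
            "response_mime_type": "application/json",
            "response_schema": {
                "type": "OBJECT",
                "properties": {
                    "name": {"type": "STRING"},
                    "age": {"type": "INTEGER"},
                },
                "required": ["name", "age"],
            },
        },
    )

    # If the advertised guarantee held, this parse could never fail.
    # At scale it occasionally does, hence: validate, then retry or repair.
    try:
        person = json.loads(resp.text)
    except json.JSONDecodeError:
        person = None  # retry / repair / log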