Seeing half of an AR LLM's output tokens go to generating a predefined JSON schema bothers me so much. I would love to have an option to use diffusion for infilling.

One trick I learned for this was to use CSV for LLM I/O and translate JSON <-> CSV at the boundary layer.
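
Rough sketch of what that boundary layer could look like, assuming the payload is a JSON array of flat objects that all share the same keys. The function names json_to_csv and csv_to_json are made up for illustration, and it only uses Python's standard csv and json modules:

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text (header + rows)."""
    records = json.loads(json_text)
    if not records:
        return ""
    # Assumption: every object has the same keys as the first one.
    fieldnames = list(records[0].keys())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (as returned by the model) back into a JSON array."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV carries no type information, so every value comes back as a string.
    return json.dumps(list(reader))
```

The keys then appear once in the header row instead of once per object, which is where the token savings would come from. The catch is that types are lost on the way back, so the boundary layer would need to re-cast numbers and booleans against the schema.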

Oh neat. So have the LLM output CSV instead of JSON and then convert it? How would you handle nested structures?

Depending on how it's nested, you could denormalize. Think of how you would denormalize a one-to-many SQL relationship.

So if you have a user that has many automobiles, maybe instead of Autos: [...] you could do Auto1Make, Auto2Make, etc.
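
Something like this, as a minimal sketch. It assumes a fixed cap on autos per user (max_autos) and that each auto only has a Make field. The Name key and the flatten_user / unflatten_user helpers are hypothetical, just to show the shape:

```python
def flatten_user(user: dict, max_autos: int = 2) -> dict:
    """Denormalize a user's Autos list into Auto1Make, Auto2Make, ... columns."""
    autos = user.get("Autos", [])
    if len(autos) > max_autos:
        # A fixed column layout cannot hold more autos than it has columns.
        raise ValueError(f"user has {len(autos)} autos, max is {max_autos}")
    row = {k: v for k, v in user.items() if k != "Autos"}
    for i in range(max_autos):
        # Pad missing autos with empty strings so every row has the same columns.
        make = autos[i].get("Make", "") if i < len(autos) else ""
        row[f"Auto{i + 1}Make"] = make
    return row


def unflatten_user(row: dict, max_autos: int = 2) -> dict:
    """Rebuild the nested Autos list from the flattened columns."""
    auto_keys = [f"Auto{i}Make" for i in range(1, max_autos + 1)]
    user = {k: v for k, v in row.items() if k not in auto_keys}
    # Empty columns are treated as "no auto in this slot".
    user["Autos"] = [{"Make": row[k]} for k in auto_keys if row.get(k)]
    return user
```

For example, flatten_user({"Name": "Ada", "Autos": [{"Make": "Honda"}]}) gives a row with Name, Auto1Make and an empty Auto2Make, and unflatten_user turns it back into the nested form. The tradeoff is the cap: if the list can grow without bound, this layout stops working and you would need one row per auto instead.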