I don't disagree, but I'm wondering if there's any evidence of this available.
> After all, LLMs have more natural language training data than JSON training data.
While that is true, data usually doesn't look like natural language either (e.g. a collection of financial records). And when it does (e.g. a collection of chat messages), I wonder if leaving it unstructured is actually more confusing, even for small amounts.
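To make the comparison concrete, here's a toy sketch (data and field names invented for illustration) of the same chat messages rendered as structured JSON versus flattened prose:

```python
import json

# Invented sample data: the same three chat messages in two formats.
messages = [
    {"sender": "alice", "timestamp": "2024-05-01T09:14:00", "text": "Did the invoice go out?"},
    {"sender": "bob", "timestamp": "2024-05-01T09:15:30", "text": "Yes, sent it this morning."},
    {"sender": "alice", "timestamp": "2024-05-01T09:16:02", "text": "Great, thanks!"},
]

# Structured: explicit fields, unambiguous boundaries between records.
structured_context = json.dumps(messages, indent=2)

# Unstructured: reads like natural language, but the sender/time/content
# boundaries are implicit and the model has to infer them.
unstructured_context = "\n".join(
    f'{m["sender"]} at {m["timestamp"]}: {m["text"]}' for m in messages
)
```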
I expect most frontier models to handle these cases just fine either way, so it may largely depend on context: specifically, how much there is, and where the attention shakes out. Ultimately, a claim one way or the other, for something this context-dependent, would have to be backed up by a lot of testing, and would probably only conclude that "in most cases, you should do this."
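That testing could look something like the sketch below: run the same question set against each format and compare accuracy. `call_model` is a placeholder for whatever LLM client you use, and the scoring is deliberately naive.

```python
def call_model(prompt: str) -> str:
    # Placeholder: swap in your actual LLM client call here.
    raise NotImplementedError

def format_accuracy(questions, build_prompt) -> float:
    """Fraction of questions answered correctly under one prompt format.

    `questions` is a list of (question, expected_answer) pairs;
    `build_prompt` embeds the context (JSON or prose) around the question.
    """
    correct = 0
    for question, expected in questions:
        answer = call_model(build_prompt(question))
        correct += expected.lower() in answer.lower()  # naive string match
    return correct / len(questions)

# Same questions, two formats; any real claim needs many runs and datasets:
# acc_json  = format_accuracy(qa_pairs, lambda q: f"{structured_context}\n\n{q}")
# acc_prose = format_accuracy(qa_pairs, lambda q: f"{unstructured_context}\n\n{q}")
```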