> LLMs return malformed JSON more often than you'd expect, especially with nested arrays and complex schemas. One bad bracket and your pipeline crashes.

This might be one reason why Claude Code uses XML for tool calling: repeating the tag name in the closing tag helps the model keep track of where it is during inference, so it is less error-prone.

Yeah that's a good observation. XML's closing tags give the model structural anchors during generation — it knows where it is in the nesting. JSON doesn't have that, so the deeper the nesting the more likely the model loses track of brackets.

We see this especially with arrays of objects where each object has optional nested fields. With complex nested objects, the model can get every item well formatted except one that has a field of the wrong type. That's why we put effort into the repair/recovery/sanitization layer: validate field by field and keep what's valid rather than throwing everything out.
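To make "validate field by field and keep what's valid" concrete, here is a minimal sketch in plain Python. The `SCHEMA` table and the `sanitize_item` / `sanitize_items` helpers are hypothetical names made up for illustration, not the project's actual code: a bad optional field is nulled out, and an object is dropped only when a required field is unusable.

```python
from typing import Any, Optional

# Hypothetical schema for illustration: field name -> (expected type, required?)
SCHEMA = {
    "name": (str, True),
    "price": (float, True),
    "notes": (str, False),
}

def sanitize_item(item: Any) -> Optional[dict]:
    """Return a cleaned copy of item, or None if a required field is invalid."""
    if not isinstance(item, dict):
        return None
    clean = {}
    for field, (expected, required) in SCHEMA.items():
        value = item.get(field)
        if isinstance(value, expected):
            clean[field] = value
        elif required:
            return None  # cannot recover: discard this one object
        else:
            clean[field] = None  # missing or wrong-typed optional field becomes null
    return clean

def sanitize_items(items: list) -> list:
    """Keep every item that survives sanitization, drop the rest."""
    cleaned = (sanitize_item(item) for item in items)
    return [item for item in cleaned if item is not None]

items = [
    {"name": "a", "price": 1.5, "notes": "ok"},
    {"name": "b", "price": 2.0, "notes": 42},  # optional field has the wrong type
    {"name": "c", "price": "free"},            # required field has the wrong type
]
result = sanitize_items(items)
```

Here the first two items survive (the second with `notes` set to `None`) and only the third is discarded, instead of the whole array being rejected.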

Unless I'm totally misunderstanding something, it's not XML but special tokens for the tokenizer. Someone smarter than me might know: https://medium.com/@nisarg.nargund/why-special-tokens-matter...

Not in Claude Code, where asking it to print the XML used for tool calling makes it accidentally trigger the tool call.

Hardly matters, this isn't a problem that you'd have these days with modern LLMs.

Also, a model can always use a proxy to turn your tool calls into XML

And feed you back JSON right away, and you wouldn't even know whether any transformation took place.

We do see fewer invalid JSON outputs from the latest, bigger LLMs, but it can still happen with smaller and cheaper models. There are also cases where the input is truncated or a required field is not found, which are inherently difficult.

On XML vs JSON, I think the goal here is to generate typed output, which is where JSON with zod shines: for example, the result can be type-checked and later inserted into typed database columns.

Thing is, even with XML the LLM will fail every now and then.

I've built an agent both with tool calling and by parsing XML.

You always need a self-correcting loop built in. If you are editing a file with an LLM, you need to provide hints so the LLM gets it right the second time, or the third, or the nth.

Just by switching to XML you won't get that.

I used to use XML; now I only use it for examples in the system prompt for the model to learn from. That's all.
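A rough sketch of the self-correcting loop described above, assuming the model is reachable through some callable that takes a prompt and returns text. The `ask_model` parameter and the `generate_with_retries` name are hypothetical; the point is only that each failed attempt feeds the parser's error back to the model as a hint.

```python
import json
from typing import Callable

def generate_with_retries(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, raw model text out
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Ask until the reply parses as a JSON object, passing the error back as a hint."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
            error = "expected a JSON object at the top level"
        except json.JSONDecodeError as exc:
            error = f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
        # Build the next prompt from the original request plus a concrete hint.
        current_prompt = (
            f"{prompt}\n\nYour previous reply was rejected: {error}.\n"
            f"Previous reply:\n{raw}\nReturn only corrected JSON."
        )
    raise ValueError(f"no valid JSON after {max_attempts} attempts")
```

The same shape works for XML or file edits: swap `json.loads` for whatever check applies, and put the specific failure into the hint rather than just asking again.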

Agreed. In this project I did a one-pass sanitization to recover invalid optional/nullable fields or discard invalid objects in nested arrays.

I know multi-pass LLM approaches exist, e.g. generating JSON patches:

https://github.com/hinthornw/trustcall
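For readers unfamiliar with the JSON-patch idea, here is a small illustration of applying a patch in the style of RFC 6902. This is not trustcall's API; `apply_patch` is a hypothetical helper that handles only `replace` and `remove` on non-root paths. It shows why a second pass can be cheap: the model emits one small operation instead of regenerating the whole document.

```python
import copy
import json

def apply_patch(doc, ops):
    """Apply a subset of RFC 6902 (replace/remove, non-root paths) to a copy of doc."""
    doc = copy.deepcopy(doc)
    for op in ops:
        # "/items/1/price" -> ["items", "1", "price"], unescaping ~1 and ~0.
        parts = [p.replace("~1", "/").replace("~0", "~") for p in op["path"].split("/")[1:]]
        parent = doc
        for key in parts[:-1]:
            parent = parent[int(key)] if isinstance(parent, list) else parent[key]
        last = int(parts[-1]) if isinstance(parent, list) else parts[-1]
        if op["op"] == "replace":
            parent[last] = op["value"]
        elif op["op"] == "remove":
            del parent[last]
        else:
            raise ValueError(f"unsupported op in this sketch: {op['op']}")
    return doc

doc = {"items": [{"name": "a", "price": 1.5}, {"name": "b", "price": "free"}]}
# A patch the model might return on a second pass after seeing the validation error.
patch = json.loads('[{"op": "replace", "path": "/items/1/price", "value": 2.0}]')
fixed = apply_patch(doc, patch)
```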
