The best performing isn't markdown tables, it's markdown key/value pairs:
## Record 1
```
id: 1
name: Charlie A0
age: 56
city: New York
department: Operations
salary: 67896
years_experience: 7
project_count: 1
```
Which makes sense to me because the problem with formats like CSV and regular markdown tables is that it is too easy for the model to mistakenly associate a value in a row with the wrong header.Explicit key/value formats like this or YAML or JSON objects make that a lot less likely.
I was surprised that XML (56%), with closing tags, wasn’t as good as YAMl/KV(60%), though line breaks perform the same kind of grouping function.
Then I realized from the table that XML used about 50% more tokens (~75K vs ~50K) for similar accuracy, and for the first time felt a kind of sympathy for the LLM…
Yeah that was my intuition as well. I think the KV-Markdown format gains additional advantage over JSON and YAML in the special syntax for headers helping to break up records.