I've found that XML is surprisingly good for LLMs when it comes to table extraction in production. I only found out when I sent the raw XML storage format to benchmark against various flavours of everything else. XML turns out to be the best format for tables that have more than three levels of nesting.
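
To make the nesting point concrete, here is a minimal sketch of what a deeply nested table looks like once serialized as XML. The tag names (`table`, `tr`, `td`), the list-of-rows input shape, and the helper names `build_table` and `table_to_xml` are assumptions for illustration only, not the actual storage format from the benchmark.

```python
import xml.etree.ElementTree as ET


def build_table(rows):
    """Build a <table> element from a list of rows.

    Each row is a list of cells. A cell that is itself a list is treated
    as a nested table (hypothetical input shape, chosen for illustration).
    """
    table = ET.Element("table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            td = ET.SubElement(tr, "td")
            if isinstance(cell, list):
                # Nested table: recurse and attach it inside this cell.
                td.append(build_table(cell))
            else:
                td.text = str(cell)
    return table


def table_to_xml(rows):
    """Serialize the nested rows to an XML string (no XML declaration)."""
    return ET.tostring(build_table(rows), encoding="unicode")


# Three levels of nesting: every level keeps explicit open/close tags.
nested = [["region", [["city", [["district", "42"]]]]]]
print(table_to_xml(nested))
```

Every nested table keeps its own opening and closing tags, so the boundaries of each level stay explicit however deep the nesting goes. That may be part of why it holds up on deeply nested tables, though that is a guess on my part rather than something the benchmark shows.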