I think this is great, and I was thinking about doing the same thing. You could use a few different LLMs to do a kind of adversarial check of the calories for a food, and use a load of other techniques to improve the dataset over time. It's hardly as if the dataset in something like MyFitnessPal is perfect. You could probably work out boundaries too: if we know roughly what 100g of a food can contain, and the protein/fat/carbs/sugar/fibre/water etc. content wildly exceeds those boundaries or totals, you could get the model to review its findings, ignore that LLM, fill in from a different food database, and so on.
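A minimal sketch of the boundary idea in Python. It assumes EU-style labelling, where carbohydrate excludes fibre, and the EU energy factors of 4 kcal/g for protein and carbohydrate, 9 for fat and 2 for fibre (alcohol, polyols and the rest are ignored). The `Per100g` record, the `plausibility_issues` function and the 15% energy tolerance are hypothetical choices for illustration, not what MyFitnessPal or any particular app does.

```python
from dataclasses import dataclass

# Energy factors in kcal per gram, assumed to follow EU labelling
# (carbohydrate 4, protein 4, fat 9, fibre 2); alcohol, polyols etc. are ignored.
KCAL_PER_G = {"protein": 4.0, "carbs": 4.0, "fat": 9.0, "fibre": 2.0}


@dataclass
class Per100g:
    """Hypothetical record for one food entry; all masses in grams per 100 g."""
    protein: float
    fat: float
    carbs: float   # assumed EU-style: carbohydrate excluding fibre
    sugar: float   # a subset of carbs
    fibre: float
    water: float
    kcal: float


def plausibility_issues(n: Per100g, kcal_tolerance: float = 0.15) -> list[str]:
    """Return reasons an entry looks implausible; an empty list means it passed."""
    issues: list[str] = []
    masses = {
        "protein": n.protein, "fat": n.fat, "carbs": n.carbs,
        "sugar": n.sugar, "fibre": n.fibre, "water": n.water,
    }
    # No single component can be negative or exceed the 100 g sample.
    for name, grams in masses.items():
        if not 0 <= grams <= 100:
            issues.append(f"{name} out of range: {grams} g")
    # Sugar is part of carbohydrate, so it cannot exceed it.
    if n.sugar > n.carbs:
        issues.append(f"sugar ({n.sugar} g) exceeds carbs ({n.carbs} g)")
    # The major components (ignoring ash, alcohol, etc.) must fit in 100 g.
    total = n.protein + n.fat + n.carbs + n.fibre + n.water
    if total > 100:
        issues.append(f"components sum to {total:.1f} g, more than 100 g")
    # Energy recomputed from the macros should roughly match the stated kcal.
    expected = sum(KCAL_PER_G[k] * masses[k] for k in KCAL_PER_G)
    if expected > 0 and abs(n.kcal - expected) / expected > kcal_tolerance:
        issues.append(f"stated {n.kcal} kcal vs {expected:.0f} kcal from macros")
    return issues
```

Anything that comes back with a non-empty list could then be sent to a second LLM for review, or replaced with the value from another food database, rather than trusted as-is.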
Incidentally, o3-mini-high got the fried breakfast I added to a tracking app this morning to within 50 calories!