This is a great example of the kind of question I'd love to be able to ask these documentation bots but that I don't trust them to be able to get right (yet):

> What’s the syntax, in Liquid, to detect whether an order in an email notification contains items that will be fulfilled through Shopify Collective?

I suspect the best possible implementation of a documentation bot with respect to questions like this one would be an "agent" style bot that has the ability to spin up its own environment and actually test the code it's offering in the answer before confidently stating that it works.

That's really hard to do - Robin in this case could only test the result by placing and then refunding an order! - but the effort involved in providing a simulated environment for the bot to try things out in might make the difference in terms of producing more reliable results.

get a second agent to validate the return from the first agent. but it might get it wrong because reasons, so you need a third agent just to make sure. and then a fourth. and so on. this is obviously not a working direction.

That's why you give them the ability to actually execute the code in a sandbox. Then it's not AI checking AI, you're mixing something deterministic into the loop.

That may certainly increase the agent's ability to get it right, but there will always be cases where the code it generates mimics the correct response, i.e. produces the output asked for, without actually working as intended, as LLMs tend to want to please as much as be correct.

However I think it would remove the case of the bit outright making up non-existent stuff. It could still always be just plain wrong, but in a more human sort of way. A real support person may be wrong about some precise detail of what they’re recommending, but unlikely to just make up something plausible.

Not much harm done. The end user sees the response and either spots that it's broken or finds out it's broken when they try to run it.

They take a screenshot and make fun of the rubbish bot on social media.

If that happens rarely it's still a worthwhile improvement over today. If it happens frequently then the documentation bot is junk and should be retired.

youre hand wavibng all the other million use cases where returning false information isnt OK.

the return may still not reflect the sandbox reality.