Curious if anyone has seen differences in how models handle conflicting tool descriptions — e.g., two tools with overlapping capabilities where the boundary isn't clear. In my experience that's where most bad tool calls come from, not from missing descriptions but from ambiguous overlap between tools.

That's actually interesting, thanks!

I wrote this post because of exactly these corner cases. If I'm building something that agents will use, how do I figure out which tool they'll actually choose?

For example, say you're building an image-generation API. There are thousands of them on the internet.

I wonder if there's a tool that would simulate an agent choosing between your product or service and your competitors'.