If they convinced me of their helpfulness, and their output is actually helpful in solving my problems.. well, if it walks like a duck and quacks like a duck, and all that.

if it walks like a duck and it quacks like a duck, then it lacks strong typing.

"Appears helpful" and "is helpful" are two very different properties, as it turns out.

Sometimes, but that's an edge case that doesn't seem to impact the productivity boosts from LLMs

It doesn't until it does. Productivity isn't the only or even the most important metric, at least in software dev.

Can you be more specific with like examples or something?