I may be showing my ignorance here, but wouldn't the ideal situation be for the service to use the same number of tokens no matter what client sent the query?

If the service uses more tokens to produce the same output from the same query, just because it arrived over a different protocol, then the service is a scam.

When you're using an agent, the "query" isn't just each bit of text you enter into the agent prompt. It's the whole conversation.
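A rough sketch of why that matters for token counts, assuming the client resends the full history on every request and using a whitespace word count as a crude stand-in for a real tokenizer. The function names and the sample conversation are made up for illustration, not taken from any particular agent:

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer (assumption): one token per word.
    return len(text.split())


def tokens_sent_per_turn(turns):
    """Input tokens sent on each request when the whole history is resent."""
    history = []
    sent = []
    for user_msg, assistant_reply in turns:
        history.append(user_msg)
        # The request carries every earlier message, not just the new one.
        sent.append(sum(count_tokens(m) for m in history))
        history.append(assistant_reply)
    return sent


# Hypothetical three-turn conversation: (user message, assistant reply).
turns = [
    ("list the files", "a.py b.py"),
    ("open a.py", "print('hi')"),
    ("add a test", "done"),
]

per_turn = tokens_sent_per_turn(turns)
typed_only = sum(count_tokens(user_msg) for user_msg, _ in turns)

print(per_turn)       # [3, 7, 11]
print(sum(per_turn))  # 21 tokens sent in total
print(typed_only)     # 8 tokens actually typed by the user
```

So two clients can show you the same prompts and still send very different payloads, depending on how much of the history each one keeps, trims, or summarizes.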

But I do wonder whether anyone has tested that the quality of subsequent responses stays the same with these tools.

That doesn't explain why the protocol matters. Surely for equivalent responses, you need to send equivalent payloads. You shouldn't be able to hack this from the client side.