I like this idea. This might be one of the more effective forms of social pressure available for getting inference providers to fix long-standing issues. AWS Bedrock, for example, has crippling defects in its serving stack for Kimi’s K2 and K2.5 models: 20-30% of attempts to emit a tool call silently end the conversation instead, with no token output. That makes AWS effectively irrelevant as a serious inference provider for Kimi, and conveniently pushes users onto Bedrock’s significantly more expensive Anthropic models for comparable performance on agentic tasks.
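For anyone who wants to check a provider for this failure mode themselves, here is a rough sketch of how the rate could be counted from logged responses. The `Response` record and its fields are hypothetical, not any provider's real schema, and this is not how any official verifier works; it only counts turns where a tool call was expected but the reply came back with no tool call and no text.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical log record for one completion (assumed shape)."""
    text: str = ""
    tool_calls: list = field(default_factory=list)
    expected_tool_call: bool = False  # set by whoever labels the test prompts


def is_silent_failure(r: Response) -> bool:
    # A tool call was expected, but the reply has no tool call and no text.
    return r.expected_tool_call and not r.tool_calls and not r.text.strip()


def silent_failure_rate(responses: list) -> float:
    # Only turns where a tool call was expected count as attempts.
    attempts = [r for r in responses if r.expected_tool_call]
    if not attempts:
        return 0.0
    return sum(is_silent_failure(r) for r in attempts) / len(attempts)


sample = [
    Response(expected_tool_call=True, tool_calls=[{"name": "search"}]),
    Response(expected_tool_call=True, tool_calls=[{"name": "search"}]),
    Response(expected_tool_call=True, tool_calls=[{"name": "search"}]),
    Response(expected_tool_call=True),  # empty reply: silent failure
    Response(text="plain chat turn"),   # no tool call expected, ignored
]
rate = silent_failure_rate(sample)
```

Running the same labeled prompts against two providers and comparing the two rates would give a like-for-like number.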

It's not new; Kimi has been doing this for months now.

https://github.com/MoonshotAI/K2-Vendor-Verifier

https://github.com/MoonshotAI/Kimi-Vendor-Verifier

Note that this started before K2.5 and K2.6 had even launched.