> but you have no idea what actually goes on behind the curtains, which quantization levels they use and so on.
That would take something close to a global conspiracy of every technologist lying continuously to keep the tweaks secret. If necessary, I personally will rent some servers and run a vanilla Kimi K2.6 deployment for people to use at reasonable prices. I don't expect to ever make good on that threat, because those would be grim times indeed if I'm the first person doing something AI-related, but the skill level required to load up a model behind an API is low.
So it isn't hard to see how unadulterated Kimi models will be available, and from there it is really, really straightforward to tell if someone is quantising a model: just run some benchmarks against two different providers who both claim to serve the same thing. If one is quantising and another isn't, there's a big difference in quality.
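The comparison described above can be sketched in a few lines. Everything here is invented for illustration: the benchmark questions, answers, and the dict-based provider stubs stand in for real API calls to the two providers.

```python
# Hypothetical benchmark: (prompt, expected substring in the answer).
BENCHMARK = [
    ("What is 17 * 24?", "408"),
    ("What is the capital of Australia?", "canberra"),
    ("Spell 'strawberry' backwards.", "yrrebwarts"),
]

def score(ask, benchmark):
    """Fraction of prompts whose expected answer appears in the response."""
    hits = sum(1 for q, a in benchmark if a in ask(q).lower())
    return hits / len(benchmark)

# Stubs for two providers that both claim to serve the same model.
provider_a = {q: a for q, a in BENCHMARK}          # answers everything
provider_b = dict(provider_a)
provider_b["Spell 'strawberry' backwards."] = "yrrebwrats"  # degraded answer

gap = score(provider_a.get, BENCHMARK) - score(provider_b.get, BENCHMARK)
print(f"quality gap: {gap:.2f}")
```

A real check would send identical prompts (at temperature 0 where possible) to both endpoints and compare aggregate scores; a persistent gap on the same benchmark is the tell.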
> If one is quantising and another isn't there's a big difference in quality.
Sure. But the problem is you have to do this continuously to have any measure of confidence, which is expensive. For example, a provider could at any point start routing some fraction of requests to a quantized model, whether due to a "routing error", as Anthropic called one of their model degradation events, or to improve their bottom line.
There's really no good way to detect this on a few-prompt level without overspending significantly, because they're all black boxes.