https://research.nvidia.com/labs/lpr/slm-agents/ - Distillation data is a natural byproduct of using these models, and there's no effective defence against it. Anthropic is degrading thinking blocks to summaries to slow it down and hide model internals, but in the end the math says you're SOL: distillation works well enough at MNC/large-corporate scale that the moment cost becomes a priority, you're left without the lock-in you need to keep customers paying.
Byproduct? It’s essentially the only part of an LLM that is useful, because it’s the WHOLE product!
It’s the same reason DRM for audio and video is a non-starter - if you want a person to see or hear something, eventually, at the end of the chain, it’s going to be converted to sound for the ear and light for the eyes, and that’s where you attach your tap.
Without a model generating tokens, what’s the point? So if Anthropic somehow disables quality token generation, what’s the point?
That's why the harness is moving server-side: because generating tokens is not the actual point of the model, not for the users. Especially with tool calling giving us agents that can act, most of the tokens generated are not, in themselves, critical to the end users. Specifically, a lot of tokens go into orchestrating the actual tool calls, and most "thinking tokens" are relevant to users only insofar as they help them keep track of and verify what the LLM is doing. So all those tokens can be hidden or replaced by partial summaries, all of that can happen server-side, and then there's very little left to distill from.
I haven't heard of this happening, do you have any links or explainers on this?
Claude on the web (which also includes at least the Android and Desktop apps) and the ChatGPT web app are two examples - they keep gaining agentic capabilities.
Perhaps the most striking example for me: I've been using Claude Code a lot in the past month, and most of it was through the web, Desktop app or phone interface, running the actual harness "remotely" (somewhere on Anthropic-controlled infra).
One way of looking at it: web surfaces are slowly catching up with (a fraction of the power of) agentic coding tools. But another way is that the major players are building up SaaS harnesses that start to compete with (their own) local ones. The reason may be ease of use, but the practical side effect is making it much harder to use their models to train competing models, as these SaaS harnesses create an abstraction layer on top of the LLMs that resides entirely in the vendor's cloud and therefore cannot be worked around.
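To make the abstraction-layer point a bit more concrete, here's a rough Python sketch of what a server-side harness could expose to a client. Everything in it is made up for illustration: Event, fake_model_trace, summarize and client_view are hypothetical names, the trace is hard-coded rather than coming from a real model, and none of this is meant to describe how Anthropic or OpenAI actually build their harnesses. It only shows the general shape: the full trace stays on the server, and the client gets truncated summaries plus the final answer.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "thinking", "tool_call", "tool_result", "answer" or "summary"
    text: str


def fake_model_trace(task: str) -> list[Event]:
    """Hypothetical stand-in for a real model + tool loop (hard-coded trace)."""
    return [
        Event("thinking", f"The user wants: {task}. List the files first, then read the config."),
        Event("tool_call", "list_files(path='.')"),
        Event("tool_result", "config.toml main.py README.md"),
        Event("thinking", "config.toml probably holds the port. Read it and report the value."),
        Event("tool_call", "read_file(path='config.toml')"),
        Event("tool_result", "port = 8080"),
        Event("answer", "The service listens on port 8080."),
    ]


def summarize(text: str, max_words: int = 6) -> str:
    """Crude 'partial summary': keep only the first few words."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def client_view(trace: list[Event]) -> list[Event]:
    """What leaves the server: summaries of thinking and the final answer."""
    visible = []
    for ev in trace:
        if ev.kind == "thinking":
            visible.append(Event("summary", summarize(ev.text)))
        elif ev.kind == "answer":
            visible.append(ev)
        # tool_call and tool_result events are dropped and never leave the server
    return visible


def word_count(events: list[Event]) -> int:
    """Rough proxy for how much text is available to someone collecting outputs."""
    return sum(len(ev.text.split()) for ev in events)


if __name__ == "__main__":
    trace = fake_model_trace("find the port the service listens on")
    view = client_view(trace)
    for ev in view:
        print(f"[{ev.kind}] {ev.text}")
    print("words on server:", word_count(trace))
    print("words sent to client:", word_count(view))
```

The user still gets the answer and enough of a trail to follow along, but the tool orchestration and most of the reasoning text never cross the wire, which is the part a would-be distiller would want.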