[flagged]

If you think statefull LLMs would be easier to handle then stateless... Then I think you haven't done a lot of software engineering

Maybe a charitable reading of the parent comment, but my interpretation of it was that while the _models_ are stateless, modern deployments of these models for inference rely on state.

For example, tiered pricing for cached context relies on state, even if the models don’t.

For that matter, the agent harness accumulating "chat history" is state.

Of course you can pass in your own state, but I always wondered about an LLM that has conversation context stay resident in GPU memory somehow.

Or maybe this already effectively covered by context caching and the gains would be minimal (stateless, but if you pass in the same context or the same head context, it’s already in GPU memory and doesn’t need to be loaded?).

This. lol. If you think state makes things easier you're in for a big surprise.

That does not seem to be related to llms? It is more about the harness that utilizes them, right?

The parent comment is AI generated drivel, that’s why. Incredible that it has generated such a lively discussion.

It costs tokens, so it helps the business model, so it’s not a bug but a feature.