Can you explain why everyone thinks we should use new tools to deploy agents instead of our existing infra?
eg. I already run Kubernetes
I think people pick new tooling not because k8s lacks horsepower, but because running per-user filesystem-backed agents on k8s forces you to build and maintain a surprising amount of glue code. Newer platforms bundle versioned mounts, local-first dev cycles, secure ephemeral runtimes, and opinionated deployment so teams can focus on agent logic instead of writing Helm charts and doing CSI gymnastics.
If you repurpose k8s with ephemeral volumes (emptyDir) and a sidecar, you'll likely get predictable ops and avoid vendor lock-in. But expect more operator work, fragile debugging across PVCs and sidecars, and the need to invest in local emulation or a Firecracker or gVisor sandbox if you want anything like laptop parity.
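For a sense of what the emptyDir-plus-sidecar route looks like, here is a rough sketch that builds the Pod manifest as a Python dict. The function name, the labels, the image names, and the "gvisor" RuntimeClass are all assumptions for illustration; that RuntimeClass would have to exist in your cluster already.

```python
import json


def agent_pod(user_id: str, agent_image: str, sidecar_image: str) -> dict:
    """Build a Pod manifest with a per-user scratch workspace shared by two containers."""
    # Both containers mount the same emptyDir, so the sidecar can sync files in and out.
    workspace_mount = {"name": "workspace", "mountPath": "/workspace"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{user_id}",
            "labels": {"app": "agent", "user": user_id},  # hypothetical labels
        },
        "spec": {
            "restartPolicy": "Never",
            # Assumption: a RuntimeClass named "gvisor" is installed in the cluster.
            "runtimeClassName": "gvisor",
            # emptyDir lives and dies with the Pod; sizeLimit caps the scratch space.
            "volumes": [{"name": "workspace", "emptyDir": {"sizeLimit": "1Gi"}}],
            "containers": [
                {"name": "agent", "image": agent_image, "volumeMounts": [workspace_mount]},
                {"name": "sync", "image": sidecar_image, "volumeMounts": [workspace_mount]},
            ],
        },
    }


# kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
print(json.dumps(agent_pod("u123", "example.com/agent:dev", "example.com/sync:dev"), indent=2))
```

Everything the workspace needs beyond the Pod's lifetime (versioning, getting files back to the user) is still on you, which is the glue code in question.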
I think there are some primitives for agents that need to be built out for better security and being able to reason about them.
Agents run on infra, they have network connectivity, they have ACLs and permissions that let them read+write+execute on resources, they can interact with other agents.
To manage them from both an infra and security perspective, we can use the existing underlying primitives, but it's also useful to build abstractions around them for management, kind of like how microservices encapsulate compute+storage+network together.
I think of agents as basically microservices that can act in non-deterministic ways, and the potential "blast radius" of their actions is very wide. So you need to be able to map what an agent can do, and it's much easier to do that if there are abstractions or automatic groupings instead of doing this all ourselves.
The "non-deterministic microservices" framing is exactly right and I think most infra teams underestimate how much that changes things. With a normal service, you can map inputs to expected outputs and write tests. With agents, the blast radius is probabilistic and context-dependent.
The monitoring problem alone is closer to fraud detection than traditional APM. You're not looking for "is this thing up," you're looking for "is this thing subtly wrong in a way that compounds over the next 10 steps."
I'd argue it's both. You also want to know when your agent has collapsed and is burning tokens and your budget.
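A minimal sketch of the "collapsed and burning tokens" check, assuming you can hook into each agent step and see the action it took and the tokens it spent. The class name, thresholds, and the "same action N times in a row" heuristic are my own assumptions, not something from any particular framework.

```python
from collections import deque
from typing import Optional


class RunawayGuard:
    """Flags an agent run that blows its token budget or keeps repeating itself."""

    def __init__(self, max_tokens: int, repeat_window: int = 3) -> None:
        self.max_tokens = max_tokens
        self.repeat_window = repeat_window
        self.used_tokens = 0
        # Only the last `repeat_window` actions are kept.
        self.recent_actions = deque(maxlen=repeat_window)

    def record(self, action: str, tokens: int) -> None:
        """Call once per agent step with the action taken and the tokens it cost."""
        self.used_tokens += tokens
        self.recent_actions.append(action)

    def stop_reason(self) -> Optional[str]:
        """Return why the run should be halted, or None if it looks healthy."""
        if self.used_tokens > self.max_tokens:
            return "token budget exceeded"
        window_full = len(self.recent_actions) == self.repeat_window
        if window_full and len(set(self.recent_actions)) == 1:
            return "same action repeated"
        return None


guard = RunawayGuard(max_tokens=1000)
for action, tokens in [("search docs", 200), ("search docs", 200), ("search docs", 200)]:
    guard.record(action, tokens)
print(guard.stop_reason())  # prints "same action repeated"
```

This only catches the loud failures. The "subtly wrong over the next 10 steps" case needs something smarter than counters.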
Right, those abstractions and controls already exist in the Kubernetes ecosystem. I can use one set of abstractions for everything, as opposed to having something separate for agents. They are not that different; the tooling I have covers it. There are also CRDs and operators to extend for a more DSL-like experience.
tl;dr, I don't think the shovel analogy holds up for most of the AI submissions and products we see here.
Unrelated but your comments on https://news.ycombinator.com/item?id=44736176 related to the Terminal agents coding craze have helped me feel less crazy. People using GitHub Copilot CLI and Claude Code, they either never review the code or end up opening up an IDE to review the code, and I'm sitting here like, why don't you use the terminal in your favorite IDE? You're using a Terminal as a chat interface, so why not just use a chat interface? Or use the terminal in VS Code which actually now integrates very well with Claude Code and GitHub Copilot CLI so you can see what's going on across the many files this thing is editing?
The hype is so large with the CLI coding tools I got FOMO, but as you were saying in that thread, I see no tangible improvement to the value I get out of AI coding tools by using the CLI alone. I use the CLI in VS Code, and I use the chat panel, and the only thing that seems to actually make a difference is the "context engineering" stuff of custom instructions, agent skills, prompt files, hooks, custom agents, all that stuff, which works no matter which interface you use to kick off your AI coding instructions.
Would be curious to hear your thoughts on the topic all these months later.
Glad to find camaraderie! I've since started a CLI interface for my custom agent, lol
The reasons are (1) it's faster to do admin work like naming or deleting old sessions (2) I have not gotten the remote setup to work yet (haven't tried) but I do want to use it somewhere
But yeah, it's gotten worse, the latest I recall is a new diff viewer for AI in the terminal (I already have git and lazygit)
It's hilarious to me how we are recreating decades of IDE advancements such that they work on the terminal, only for us to end up with what is essentially an IDE.
I was doing that with (neo)vim and reached the point that I wanted to stop having to maintain a sorta-IDE. I'm now doing the same with agents (custom vscode extension), but I find this different for a number of reasons, primarily that I don't want Big AI deciding how I can interact with and use AI.
One thing I took from ATProto is a strong belief that user agency and choice are the ultimate design criteria. To those ends, I think that any agentic tooling needs to support the majority of users' choices about how to interact with it (SDK, API, CLI, TUI, IDE, and Web). My custom agent is headed that way anyhow, because there are times where I do want to reach for one of them, and it's easier to make it so with agents working on their own codebase (minus vscode, because I haven't figured out the testing/feedback yet).
Honest answer: the problems start when you're running 50+ agents across 3 different model providers and the failure modes aren't "pod crashed" anymore. They're "model returned confidently wrong output and the next 4 steps ran on garbage."
K8s is great at keeping things alive. It's not built to reason about whether the thing that's alive is actually working correctly. Agent infra needs to handle rollback at the logic level, not just the container level.
Yup! And this is a genuinely hard problem when you try to apply agents to domains other than coding. With coding, you can easily roll back. But in other domains, you take action in the real world and that's not easy to roll back.
We're thinking a lot about how we could provide a "Convex"-like experience where we guide your coding agents to set up your agents in a way that maximizes the ability to roll back. For example, instead of continuously taking action, it's better that agents gather all required context, do the work needed to make a decision (research, synthesize, etc.), and then only take action in the real world at the end. If an agent did bad work, then this makes it easy to roll back to the point where the agent gathered all the context, correct its instructions, and try again.
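To make the gather, decide, then act ordering concrete, here is a rough sketch. It is not how any real product implements this; `Run`, `gather`, `decide`, `commit`, and `retry_from_context` are hypothetical names, and the planner and executor are plain callables you would supply.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Run:
    instructions: str
    context: List[str] = field(default_factory=list)   # filled by the read-only phase
    plan: List[str] = field(default_factory=list)      # proposed actions, nothing executed yet
    executed: List[str] = field(default_factory=list)  # actions actually taken


def gather(run: Run, sources: List[str]) -> None:
    """Read-only phase: collect context. This is the checkpoint to roll back to."""
    run.context = list(sources)


def decide(run: Run, planner: Callable[[str, List[str]], List[str]]) -> None:
    """Turn instructions plus context into a list of proposed actions. No side effects."""
    run.plan = planner(run.instructions, run.context)


def commit(run: Run, execute: Callable[[str], None]) -> None:
    """The only phase that touches the real world, and it runs last."""
    for action in run.plan:
        execute(action)
        run.executed.append(action)


def retry_from_context(run: Run, new_instructions: str) -> Run:
    """Roll back to the gathered context, swap the instructions, and start over."""
    return Run(instructions=new_instructions, context=list(run.context))


run = Run(instructions="draft replies")
gather(run, ["ticket 1", "ticket 2"])
decide(run, lambda instr, ctx: [f"{instr} for {c}" for c in ctx])
# The plan can be reviewed here; if it looks wrong, retry before anything is sent.
run = retry_from_context(run, "draft polite replies")
```

Once `commit` has run, you are back to the hard problem: the sketch only keeps everything before that point cheap to redo.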
Given what OP describes
> Our biggest pain point with hosting agents was that you'd need to stitch together multiple pieces: packaging your agent, running it in a sandbox, streaming messages back to users, persisting state across turns, and managing getting files to and from the agent workspace.
The k8s ecosystem already handles most of this, and your agent framework handles the agent specifics. What you are talking about is valid, though a different axis imo. Quality and guardrails are important, but not discussed by OP.
I think Kubernetes is a good candidate to run these sandboxes. It is just that you have to do a lot of annotations, node group management, pod security policies, etc., to name a few. Apply the principle of least privilege for access to mitigate risk.
I think Kata containers with Kubernetes is an even better sandboxing option for these agents to run remotely.
Shameless plug here, but we at Adaptive [1] do something similar.
[1] https://adaptive.live
We already do those things with k8s, so it's not an issue
The permissions issues you mention are handled by SA/WIF and the ADK framework.
Same question to OP, why do you think I need a special tool for this?
We have also built something custom ourselves (with modal.com serverless containers), running thousands of on-demand coding agents each day and already the assumptions that Terminal Use is making (about using the file system and coding agent support) would not work for our use case.
Curious to hear why we wouldn't work! I'd love to understand what assumptions we're making that won't work for your use case, and what we could work to improve on
It seems like so many of the AI "solutions" are hallucinating the problems. I either don't have them, because I use better AI frameworks, or I have tools at hand that solve them nicely.
We don't need to rebuild everything just for agents, except that people think they can make money by doing so. YC has disappointed me of late with the lack of diversity in their companies. I suspect the change in leadership is central to this.
At least on K8s you can control the network policy. That's the harder problem to solve. I suspect we'll see a lot of exfiltration via prompt injection in the next few years.
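As a sketch of what that control looks like, here is a NetworkPolicy built as a Python dict that limits agent Pods to cluster DNS plus one allowlisted CIDR on port 443. The function name, the `app: agent` label, and the example CIDR are assumptions, and enforcement depends on your CNI plugin actually supporting NetworkPolicy.

```python
import json


def agent_egress_policy(namespace: str, allowed_cidr: str) -> dict:
    """Egress allowlist for Pods labeled app=agent: cluster DNS and one HTTPS destination."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress-allowlist", "namespace": namespace},
        "spec": {
            # Hypothetical label; it must match whatever your agent Pods carry.
            "podSelector": {"matchLabels": {"app": "agent"}},
            # Listing Egress here means anything not matched below is dropped.
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # DNS, only to Pods in kube-system.
                    "to": [{"namespaceSelector": {
                        "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}}}],
                    "ports": [{"protocol": "UDP", "port": 53},
                              {"protocol": "TCP", "port": 53}],
                },
                {
                    # The one external destination the agent is allowed to reach.
                    "to": [{"ipBlock": {"cidr": allowed_cidr}}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                },
            ],
        },
    }


# 203.0.113.0/24 is a documentation range, used here as a stand-in for a model API.
print(json.dumps(agent_egress_policy("agents", "203.0.113.0/24"), indent=2))
```

This narrows where an injected prompt can send data, but it does nothing about exfiltration through the destinations you do allow.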