On-prem economics don't work because you can't batch requests, unless you are able to run 100 agents at the same time all the time.
> unless you are able to run 100 agents at the same time all the time
Except that newer "agent swarm" workflows do exactly that. Besides, batching requests generally comes with a sizeable increase in memory footprint, because each concurrent request carries its own KV cache, and memory is often the main bottleneck, especially with the larger contexts that are typical of agent workflows. If you have plenty of agentic tasks that are not especially latency-critical and don't need the absolute best model, it makes plenty of sense to schedule them to run locally.
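To put rough numbers on the memory point, here is a back-of-the-envelope sketch. It uses the usual KV-cache size estimate for a standard transformer decoder (keys plus values, per layer, per token). The model shape below (32 layers, 8 KV heads, head dimension 128, fp16 cache, 32k-token context) is a made-up example rather than any particular model, and the estimate ignores weights, activations, and tricks like paged or quantized caches.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens,
                   batch_size, bytes_per_value=2):
    """Approximate KV-cache size in bytes for a standard transformer decoder."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * batch_size


GIB = 1024 ** 3

# Hypothetical model shape, chosen only for illustration.
cfg = dict(layers=32, kv_heads=8, head_dim=128, context_tokens=32_768)

for batch in (1, 10, 100):
    size_gib = kv_cache_bytes(batch_size=batch, **cfg) / GIB
    print(f"batch={batch:>3}: ~{size_gib:.0f} GiB of KV cache")
```

Under these assumptions the cache alone is about 4 GiB per 32k-token request, so 100 concurrent agents at full context would need on the order of 400 GiB before counting the weights. Real numbers depend heavily on the model and the serving stack, but the linear growth with batch size and context length is the point.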