At some point remote inference becomes more expensive than just buying the hardware locally, even for server-grade components. A GB200 is ~$60-70k and will run for multiple years. If inference costs continue to scale, at some point it just makes more sense to run even the largest models locally.
OSS models are only ~1 year behind SOTA proprietary, and we're already approaching a point where models are "good enough" for most usage. Where we're seeing advancements is more in tool calling, agentic frameworks, and thinking loops, all of which are independent of the base model. It's very likely that local, continuous thinking on an OSS model is the future.
Maybe $60-70k nominally, but where can you get one that isn't part of an entire rack configuration?
Fair, but even if you budget an additional $30k for a self-contained small-unit order, you're only at the equivalent of one year of the proposed inference spend.
My point is that at $100k/yr/engineer in inference spend, your options widen considerably.
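The break-even math in this thread can be sketched in a few lines. All figures are the thread's own assumptions (~$60-70k for a GB200, ~$30k premium for a standalone unit, $100k/yr/engineer in inference spend), not vendor quotes:

```python
# Break-even sketch using the numbers quoted in the thread.
hardware_cost = 70_000        # GB200, upper end of the quoted ~$60-70k range
standalone_premium = 30_000   # assumed extra for a self-contained small-unit order
annual_inference = 100_000    # proposed per-engineer yearly inference spend

total_local = hardware_cost + standalone_premium
breakeven_years = total_local / annual_inference
print(f"Local hardware pays for itself in {breakeven_years:.1f} year(s)")
```

On these assumptions the hardware pays for itself in about a year; any additional years of use (and any drop in the standalone premium) tilt the comparison further toward local.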