I love everything about this direction except the insane inference costs. I don’t mind the training costs, since models are commoditized as soon as they’re released. Although I do worry that if inference costs drop, the companies training the models will have no incentive to publish their weights, because inference revenue is where they recoup the training cost.
Either way… we badly need more innovation in inference price-performance, on both the software and hardware sides. It would be great if software innovation unlocked inference on commodity hardware. That’s unlikely to happen, but today’s bleeding-edge hardware is tomorrow’s commodity hardware, so maybe it will happen in some sense.
If Taalas can pull off burning models into hardware with a two-month lead time, that will be huge progress, but still wasteful, because then we’ve just shifted the problem to a hardware bottleneck. I expect we’ll see something akin to Game Boy cartridges that are cheap to produce and can plug into base models to augment specialization.
But I also wonder if anyone is pursuing some more insanely radical ideas, like reverting back to analog computing and leveraging voltage differentials in clever ways. It’s too big brain for me, but intuitively it feels like wasting entropy to reduce a voltage spike to 0 or 1.
> I love everything about this direction except for the insane inference costs.
If this direction holds true, the ROI math works out cheaper.
Instead of employing 4 people (Customer Support, PM, Eng, Marketing), you will have 3–5 agents, and the whole ticket flow might cost you ~$20.
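As a rough sketch of how that ~$20 figure could come about (every token count and price below is an illustrative assumption, not a real quote from any provider):

```python
# Back-of-envelope estimate of the cost of one ticket flow handled by agents.
# All numbers are illustrative assumptions, not measured or published figures.

PRICE_PER_M_INPUT = 3.00    # assumed $/1M input tokens
PRICE_PER_M_OUTPUT = 15.00  # assumed $/1M output tokens

def ticket_cost(agents, input_tokens_each, output_tokens_each):
    """Total API cost for one ticket handled by `agents` cooperating agents."""
    input_cost = agents * input_tokens_each / 1e6 * PRICE_PER_M_INPUT
    output_cost = agents * output_tokens_each / 1e6 * PRICE_PER_M_OUTPUT
    return input_cost + output_cost

# e.g. 4 agents, each reading ~500k tokens of context and writing ~100k tokens
cost = ticket_cost(agents=4, input_tokens_each=500_000, output_tokens_each=100_000)
print(f"${cost:.2f}")  # $12.00 under these assumed prices and token counts
```

Under those assumptions a ticket lands in the ballpark the comment describes; the real number obviously swings with model pricing and how chatty the agents are.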
But I hope we won't go this far, because when things fail, every customer will be impacted, and there will be no one left who understands the system well enough to fix it.
Inference costs at least seem like the thing that is easiest to bring down, and there's plenty of demand to drive innovation. There's a lot less uncertainty here than with architectural/capability scaling. To your point, tomorrow's commodity hardware will solve this for the demands of today at some point in the future (though we'll probably have even more inference demand then).
I worry about the costs from an energy and environmental impact perspective. I love that AI tools make me more productive, but I don't like the side effects.
The environmental impact of AI is greatly overstated. The average person would make a bigger positive impact on the environment by reducing their meat intake by 25% than by giving up flying and AI use combined.
This is the wrong way to see it. If a technology gets cheaper, people will use more and more and more of it. If inference costs drop, you can throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.
> throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.
But this just isn't true; otherwise, companies that can already afford such high prices would have outpaced their competitors by now.
No company at the moment has enough money to operate with 10x the reasoning tokens of their competitors, because they're bottlenecked by GPU capacity (or other physical constraints). Maybe in lab experiments, but not for generally available products.
And I sense you would have to throw orders of magnitude more tokens at a problem to get meaningfully better results. (If anyone has access to experiments with GPT-5-class models geared up to use marginally more tokens with good results, please call me out, though.)
I mean, theoretically, if there are many competitors, the cost of the product should generally drop because of competition.
Sadly, I have not seen this happening in a long time.