My bet is that prices will crash once the OpenAI (and/or Anthropic) IPOs have happened.

Right now the biggest threat to their IPOs is that people realize that local models are good enough for whatever they're peddling. And what's the most important factor in running good-enough models at all? RAM, since you want the models in memory so they're not total slogs.
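To put rough numbers on the "RAM is the limiting factor" point, here is a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight), and the 2 GiB allowance for KV cache and runtime buffers is my own placeholder, not a measured value. The 5.5 bits per weight for a 5-bit quantization is also an approximation I'm assuming; real files vary.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         ram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus a fixed overhead allowance fit in ram_gib.

    overhead_gib is a hypothetical allowance for KV cache and buffers;
    it grows with context length in practice.
    """
    return weights_gib(params_billion, bits_per_weight) + overhead_gib <= ram_gib


if __name__ == "__main__":
    # A 27B-parameter model at a few precisions (bits per weight are assumptions).
    for bits in (16, 8, 5.5, 4):
        size = weights_gib(27, bits)
        print(f"27B @ {bits} bpw: {size:.1f} GiB, fits in 24 GiB: {fits(27, bits, 24)}")
```

Under these assumptions a 27B model needs about 50 GiB at 16 bits per weight but only around 17 GiB at roughly 5.5, which is why quantization plus a couple dozen GiB of fast memory is the threshold people keep talking about.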

But remember that markets can stay irrational longer than anyone can hold his breath. If they get more funding there's a good chance they'll invest more in the destruction of the remaining production capacity. Admitting that with normal pricing anyone could have a decent AI-machine for 2K is hard - prices for acceptable AI-machines most likely will go >10K first.

You are saying there will be even more demand for RAM and that will cause the prices to crash?

The chance is that they cash out during the IPO and lose interest in increasing capacity; some or many contracts will be canceled, and demand will drop.

Perhaps all hobbyist developers interested in local AI combined represent less demand than AI companies hoarding parts. That would make demand decrease.

If local models are good enough, doesn't that increase demand for DRAM as everyone buys DRAM for their poorly utilized local machines?

Surely it is a more efficient use of DRAM to run inference on shared hardware with large batch sizes and more utilization.

Luckily, very few people are both able to configure local models and interested in doing so. But your nearby datacenter running Chinese open-weight models is also good enough.

My point is that DRAM demand is mostly orthogonal to whether everyone is using open-weight models or secret-weight models. Heavy demand for local models (whether secret or open weight) will require even more aggregate DRAM than shared inference would.
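A toy model of the aggregate-DRAM argument, for anyone who wants to play with it. Every number and name here is hypothetical: I'm assuming each local user holds a full copy of the weights plus their own KV cache, while a shared provider holds one copy of the weights per batch of concurrently active users. Real serving stacks are more complicated than this.

```python
import math


def local_total_gib(users: int, weights_gib: float, kv_gib_per_user: float) -> float:
    """Every user keeps a private copy of the weights plus their own KV cache."""
    return users * (weights_gib + kv_gib_per_user)


def shared_total_gib(active_users: int, weights_gib: float,
                     kv_gib_per_user: float, batch_size: int) -> float:
    """One weight copy per replica; each replica serves batch_size active users.

    active_users is the peak number of users generating at the same time,
    which is usually far below the total user count.
    """
    replicas = max(1, math.ceil(active_users / batch_size))
    return replicas * weights_gib + active_users * kv_gib_per_user


if __name__ == "__main__":
    # Hypothetical: 1000 users, 50 active at peak, 17 GiB weights, 2 GiB KV each.
    print("local :", local_total_gib(1000, 17, 2), "GiB")
    print("shared:", shared_total_gib(50, 17, 2, 16), "GiB")
```

With those made-up inputs the local setup needs 19000 GiB in total and the shared one 168 GiB. The exact ratio is meaningless, but the direction is the point: weights are duplicated per machine locally and amortized across a batch when shared.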

Demand will only go down if people reduce their use of these AI tools. Given how much folks here complain about quotas, I'm very skeptical that will happen willingly.

Open weight models allow for repurposing existing hardware locally, and there's a lot of it around - far more than the amount of new RAM being supplied. So they add some short-term downward pressure to the price. (But not very much, since these datacenter builds are long-term investments that are targeted at eventually running far larger models.)

If regular people can repurpose old hardware, so can shared providers, who can extract more value from the hardware and thus afford to pay more.

In a constrained market, supply and demand favor whoever can most efficiently extract rent. Local models only make sense in a world with abundant compute and energy.

This...

Right now the biggest threat to their IPOs is that people realize that local models are good enough for whatever they're peddling...

...plus the recent price increases by AI companies, made me actually think the opposite: that there might be another "run" on memory and/or GPUs.

Therefore, yesterday I decided to order an additional RTX 5060 with 16 GiB VRAM for the ~$500 that I saved over the last few months (to be added to the RTX 5070 12 GiB that I bought last year to play games in 4K, plus my old RTX 3060 12 GiB, which I recycled a few months ago after noticing how nice it is to run llama.cpp locally without having to worry about subscription costs).

The original 24 GiB of VRAM was actually quite enough for some of the stuff that I do (e.g. transcribing text from image scans of old magazines, coding with Aider, etc. - I usually use Q5_K_M quantizations of Qwen & Gemma by Bartowski, as lower ones sometimes delivered weird results and/or looped forever in "thinking" mode), but I guess that with 40 GiB I should be bullet-proof for my pessimistic view of our future :o)

My bet is that we're not gonna see any adjustments in RAM pricing until one of the planned data center projects collapses in a spectacular way.

One theory: they will need to throw all these Nvidia cards in the trash at some point, right?

Because what do you do with power-hungry, outdated hardware, let's say 5 years from now?

They will need new RAM.

I wonder.

I’d gladly take a few of these self-contained rack-clusters off their hands when they do.

I’d even get a house with a garage or something just for that.

The billionaires locked in a race to spend effectively unlimited funds on AI CapEx will have to be convinced by markets and/or their advisers that there aren't enough profits and that cutting losses (like with Metaverse) in their quixotic quest is necessary.

And honestly, we will have much bigger problems if that bubble pops in a spectacular fashion.

Which problems would those be? The Nasdaq crashing by a few percent and a major player going under? That seems almost inevitable at some point.

And taking everyone's 401(k)s down with them because of the idiotic rule changes to accommodate SpaceX? It's going to be the largest transfer (theft) of wealth since 2008. AI is the only thing driving any kind of growth in the market at all now. If that pops, it's going to bring the entire economy down with it.

The vast majority of index funds are float-adjusted. SpaceX will not have that many shares available relative to the total value of the company. It's a negligible percentage of the overall market cap of, say, VTI. This commonly repeated trope is misinformation. The change in rules is more of a problem than the actual value.
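For anyone unfamiliar with float adjustment, the arithmetic looks roughly like this. The figures below are placeholders I made up to show the mechanism, not real numbers for SpaceX, VTI, or any index, and the function names are mine.

```python
def float_adjusted_cap(market_cap: float, float_fraction: float) -> float:
    """Only the publicly tradable share of the company counts toward the index."""
    return market_cap * float_fraction


def index_weight(new_cap: float, new_float_fraction: float,
                 existing_float_cap: float) -> float:
    """Weight of a newly added company in a float-adjusted, cap-weighted index."""
    added = float_adjusted_cap(new_cap, new_float_fraction)
    return added / (existing_float_cap + added)


if __name__ == "__main__":
    # Hypothetical: a 1000 (billion) company with 5% float joins a 60000 (billion) index.
    adjusted = index_weight(1000, 0.05, 60000)
    unadjusted = index_weight(1000, 1.0, 60000)
    print(f"float-adjusted weight: {adjusted:.4%}")
    print(f"full-cap weight:       {unadjusted:.4%}")
```

With these invented inputs the float-adjusted weight comes out under a tenth of a percent, versus well over one percent if the full market cap counted. A small float keeps the index exposure small no matter how large the headline valuation is.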

> that local models are good enough for whatever they're peddling

They are not. Unless you are satisfied with plausible, but mostly garbage output.

They are actually quite a bit better than you might think. Qwen3.6 27B is pretty capable at coding.

For non-coding work, they are more than good enough. A lot of the ways my non-technical family members have interacted with AI would be perfectly served by using a local model.

After all, people were more than satisfied with the results from GPT-3. That has long since been surpassed by open-weight models.

I'm sure there are things local models are good enough at in non-coding work, but for anything complex I do not find this to be the case.

I'd say local models are fairly capable of even somewhat complex coding execution. For complex non-coding work (research, in-depth analysis, assembly of complex info-dense documents) I'd rather do it by hand than switch from Opus 4.7 to anything I could even theoretically run locally.

I don't know what kind of coding you mean, but in my case it's been useless. Non-working code almost every time. It's much quicker to just write it by hand than to use that model.

I've been experimenting with Qwen3 Coder Next and Copilot for a little Rust toy project and it's been trucking along. It does require a fair bit of hand-holding (or perhaps I just don't trust it to give it larger tasks), but it works alright.

Give 3.5 or 3.6 a shot. 3.5 has smaller models that you could leverage. If you can swing it, though, 3.6 27B is quite good.

I get results comparable to the SaaS. Maybe Anthropic sold you too many crack tokens.

I suspect that a lot of people are ignoring the shift to a model where you feed entire specifications to a group of models in differing configurations controlling different aspects of the development task. In that model it doesn't really matter how fast the tokens are generated, only that they are eventually generated and that the assembly is good enough. It's a specification compiler at that point.

Multi-agent bread and butter.

Honestly, that's the output I get from non-local models, anyway. If I'm going to get plausible nonsense either way, I may as well run it on my own hardware.

Isn't that literally all the output an LLM generates?
