> If AI makes software easier to create, that will drive the price down.
Supposedly AI drives down the cost of producing software, not the "price".
> How are software companies going to make enough revenue to pay for AI, when the amount of money being spent on AI is already multiples of the current total global expenditure on software?
Currently, the cost of AI is between $20 and around $200 per month per developer.
I think the huge billions you're seeing in the news are investments in AI companies, which are burning through cash to build out compute infrastructure for both training and serving users.
> This demand for RAM is built on a foundation of sand, there will be a glut of capacity when it all shakes out.
Who knows? What I know is that I need >64 GB of RAM to run local models, and that means most people will need to upgrade from their 8 GB/16 GB setups to do the same. Graphics cards follow mostly the same pattern.
You need >64 GB of DRAM to run local models fast.
You can run huge local models slowly with the weights stored on SSDs.
Nowadays many computers can take, e.g., two PCIe 5.0 SSDs, which together allow a read throughput of 20 to 30 GB/s, depending on the SSDs (or one PCIe 5.0 plus one PCIe 4.0 SSD, for a throughput in the 15-20 GB/s range).
There are still many improvements that can be made to inference back-ends like llama.cpp to reach the inference-speed limit set by SSD throughput.
It seems possible to reach inference speeds in the range from a few seconds per token to a few tokens per second.
That may be too slow for a chat, but it should be good enough for an AI coding assistant, especially if many tasks are batched, so that they can progress simultaneously during a single read pass over the SSD data.
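The "seconds per token to tokens per second" range above follows from simple arithmetic: if the SSD is the bottleneck, the token rate is roughly read bandwidth divided by the bytes of weights streamed per token. A rough sketch, with illustrative (not measured) parameter counts and bandwidths:

```python
# Back-of-the-envelope estimate of SSD-bound inference speed.
# Assumption: every token requires streaming the model's active
# weights from the SSD once, and the SSD read bandwidth is the
# only bottleneck. All numbers below are illustrative.

def tokens_per_second(ssd_gb_per_s: float,
                      active_params_billion: float,
                      bytes_per_param: float = 1.0) -> float:
    """Tokens/s when reads of the active weights dominate."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return ssd_gb_per_s * 1e9 / bytes_per_token

# A dense 70B model at 8-bit weights over a 25 GB/s SSD pair:
print(tokens_per_second(25, 70))       # ~0.36 tokens/s
# A sparse MoE with ~13B active params at 4-bit (0.5 byte/param):
print(tokens_per_second(25, 13, 0.5))  # ~3.8 tokens/s
```

So a dense model lands around "seconds per token", while a sparse MoE with few active parameters can plausibly reach "a few tokens per second" on the same hardware.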
You can do that, but you're going to have rather low throughput unless you have lots of PCIe lanes to attach storage to. That's going to require either a HEDT or some kind of compute cluster.
Batching inferences doesn't necessarily help that much, since as models get sparser the individual inferences share fewer experts. It does always help with the shared routing layers, of course.
I got a 128 GB MBP, and the current models are capable enough to manage my calendar or do research on the web (very slowly), but not useful enough to be the coding companions I had hoped for.
> Who knows? What I know is that I need >64 GB of RAM to run local models, and that means most people will need to upgrade from their 8 GB/16 GB setups to do the same. Graphics cards follow mostly the same pattern.
Depends on how big the models are, how fast you want them to run, and how much context your usage needs. If you're okay with running only smaller models (which are still very capable in general; their main limitation is world knowledge) making very simple inferences at low overall throughput, you can just repurpose the RAM, CPUs/iGPUs and storage in the average setup.