GPU compute per dollar has been on a pretty steady curve of around 10x per decade. In ML for computer vision, we were also able to make models roughly 10x more efficient per decade. Combined, those two factors compound to about 100x per decade, and mapped to LLMs, I think we will be able to match the performance of, say, Sonnet 4 on a 2000 USD workstation well within 10 years from today.
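To make the compounding explicit: 10x from hardware and 10x from model efficiency per decade work out to about 1.58x per year combined. A minimal sketch of the arithmetic, where the 100x starting gap is my hypothetical placeholder, not a measured figure:

```python
import math

# Back-of-envelope sketch of the compounding argument above.
# Assumed trends (mine, not measured): hardware cost-efficiency ~10x per
# decade, model efficiency ~10x per decade, compounding smoothly per year.
HW_GAIN_PER_DECADE = 10.0
MODEL_GAIN_PER_DECADE = 10.0
ANNUAL_GAIN = (HW_GAIN_PER_DECADE * MODEL_GAIN_PER_DECADE) ** (1 / 10)

# Hypothetical starting point: suppose Sonnet-4-class inference needs
# ~100x more hardware than a 2000 USD workstation can offer today.
reduction_needed = 100.0

years_needed = math.log(reduction_needed) / math.log(ANNUAL_GAIN)
print(f"annual gain: {ANNUAL_GAIN:.2f}x, parity in {years_needed:.1f} years")
# -> annual gain: 1.58x, parity in 10.0 years
```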

Even with 10x more efficient models and 10x cheaper GPU compute, hundreds of GB of VRAM will still cost on the order of tens of thousands of USD for the foreseeable future.

How long is the foreseeable future? Within 10 years, I think an LLM accelerator (GPU/NPU/etc.) with 100 GB of VRAM will cost under 2000 USD.
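If that's the target, it's easy to check what price trajectory it implies. A quick sketch, taking the "tens of thousands" figure from the comment above as a ~20,000 USD starting price; that starting price is an assumption, not a quote:

```python
# What annual price decline does "100 GB under 2000 USD in 10 years" imply?
# Assumed starting price: ~20,000 USD (the "tens of thousands" figure above).
price_now = 20_000.0
price_target = 2_000.0
years = 10

annual_decline = 1 - (price_target / price_now) ** (1 / years)
print(f"requires ~{annual_decline:.1%} price decline per year")
# -> requires ~20.6% price decline per year
```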

VRAM prices have remained flat for the last decade, so there's no evidence of that coming.

Beyond that, running inference on the equivalent of a 2025 SOTA model within 100 GB of VRAM is very unlikely. One consistent property of transformer models has been that smaller and quantized variants are fundamentally unreliable, even though high-quality training data and RL can raise the floor of their capabilities.
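To make the quantization point concrete, here's a toy sketch (symmetric round-to-nearest quantization on random Gaussian "weights", not any real model) showing how reconstruction error grows as bit width shrinks:

```python
import numpy as np

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # toy stand-in for weights

for bits in (8, 4, 2):
    err = np.abs(quantize(w, bits) - w).mean()
    print(f"int{bits}: mean abs reconstruction error {err:.4f}")
# The step size (and thus error) roughly doubles per bit removed; real
# transformers also have activation outliers that make low-bit worse.
```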

GDDR6 8Gb spot price (DRAMExchange) is now around 2.6 USD, down from 3.5 USD in summer 2023 and 6 USD in summer 2022. The last year has been pretty flat, though!
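Flat last year notwithstanding, those spot figures imply a fairly steep decline over the full window. A quick check of the implied annual rate, assuming "now" is roughly summer 2025, i.e. about 3 years after summer 2022:

```python
# Implied annual decline from the GDDR6 8Gb spot prices quoted above.
# Assumption: "now" ~ summer 2025, i.e. about 3 years after summer 2022.
price_2022 = 6.0   # USD per 8Gb chip, summer 2022
price_now = 2.6    # USD per 8Gb chip, now
years = 3

annual_decline = 1 - (price_now / price_2022) ** (1 / years)
print(f"~{annual_decline:.0%} per year over the window")  # -> ~24% per year
```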