Even with 10x more efficient models and GPU compute, hundreds of GB of VRAM will still be on the order of tens of thousands of USD for the foreseeable future.
How long is the foreseeable future? In 10 years, I think an LLM accelerator (GPU/NPU/etc.) with 100 GB of VRAM will cost under 2,000 USD.
VRAM prices have remained flat for the last decade, so there's no evidence of that coming.
Beyond that, running inference on the equivalent of a 2025 SOTA model in 100 GB of VRAM is very unlikely. One consistent property of transformer models has been that smaller and quantized models are fundamentally less reliable, even though high-quality training data and RL can raise the floor of their capabilities.
GDDR6 8Gb spot (DRAMExchange) is now around 2.6 USD, down from 3.5 USD in summer 2023 and 6 USD in summer 2022. The last year has been pretty flat, though!
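For what those spot prices imply, here's a back-of-envelope sketch of the raw DRAM cost of 100 GB at each quoted price. This counts chips only, not the GPU die, PCB, or the rest of a card's BOM, and it assumes standard 8 Gb (= 1 GB) GDDR6 parts:

```python
GB_PER_CHIP = 1  # an 8 Gb GDDR6 chip holds 1 GB

def dram_cost(target_gb, price_per_chip_usd):
    """Raw chip cost to reach target_gb of VRAM at a given spot price."""
    chips_needed = target_gb / GB_PER_CHIP
    return chips_needed * price_per_chip_usd

# Spot prices quoted in the comment above
for when, price in [("summer 2022", 6.0), ("summer 2023", 3.5), ("now", 2.6)]:
    print(f"{when}: 100 GB ~ {dram_cost(100, price):.0f} USD in chips")
```

So the memory itself is already only a few hundred USD; the cost of a 100 GB accelerator is dominated by everything else on the card.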