How long is the foreseeable future? In 10 years I think an LLM accelerator (GPU/NPU/etc.) with 100 GB of VRAM will cost under 2,000 USD.

VRAM prices have remained roughly flat for the last decade, so there's no evidence of that coming.

Beyond that, running inference on the equivalent of a 2025 SOTA model with 100 GB of VRAM is very unlikely. One consistent property of transformer models has been that smaller and quantized variants are fundamentally less reliable, even though high-quality training data and RL can raise the floor of their capabilities.

GDDR6 8 Gb spot (DRAMExchange) is now around 2.6 USD, down from 3.5 USD in summer 2023 and 6 USD in summer 2022. The last year has been pretty flat, though!
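For a rough sense of what those spot prices imply, here's a back-of-envelope sketch (an 8 Gb chip holds 1 GB, so 100 GB takes 100 chips; this is raw DRAM cost only, ignoring the GPU die, board, packaging, and vendor margin, which dominate the final price):

```python
# Back-of-envelope: raw DRAM cost for 100 GB of VRAM at various
# GDDR6 8 Gb spot prices. Prices are the DRAMExchange figures
# quoted above; everything else on a real card is excluded.
def dram_cost_usd(capacity_gb: int, price_per_chip_usd: float) -> float:
    chips = capacity_gb  # one 8 Gb (= 1 GB) chip per GB of capacity
    return chips * price_per_chip_usd

for label, price in [("summer 2022", 6.0), ("summer 2023", 3.5), ("now", 2.6)]:
    print(f"{label}: 100 GB of GDDR6 ~ ${dram_cost_usd(100, price):.0f} in raw DRAM")
```

So the memory chips themselves are already only a few hundred dollars at spot; whether a 100 GB accelerator lands under 2,000 USD depends far more on everything around them.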