Just look at DeepSeek V4: this preview model uses only 8 GB for a 1M-token KV cache (the context). It's insanely efficient already. Most models coming out are barely catching up with these technical breakthroughs; DeepSeek are pioneers.
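To see why 8 GB for 1M tokens is striking, here's a rough back-of-the-envelope sketch. The model numbers below (80 layers, 8 KV heads via GQA, head dim 128, fp16 cache) are purely illustrative assumptions for a generic 70B-class transformer, not DeepSeek's actual configuration:

```python
# Rough KV-cache sizing sketch. All model numbers here are
# illustrative assumptions, not DeepSeek's actual configuration.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    # Standard attention caches one K and one V vector
    # per KV head, per layer, per token.
    return tokens * layers * 2 * kv_heads * head_dim * bytes_per_elem

# Hypothetical 70B-class model with GQA (8 KV heads), fp16 cache:
standard = kv_cache_bytes(1_000_000, 80, 8, 128, 2)
print(f"{standard / 2**30:.0f} GiB")  # hundreds of GiB at 1M tokens

# By contrast, 8 GB for 1M tokens works out to roughly 8 KB per token:
per_token = 8 * 10**9 / 1_000_000
print(f"{per_token:.0f} bytes/token")
```

Under those assumptions a conventional cache lands in the hundreds of GiB at 1M tokens, so getting down to ~8 KB per token takes aggressive compression of the cached keys and values rather than incremental tweaks.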
Unfortunately, V4 is not trained for most real-world usage; it is mainly aimed at general world knowledge.