Cool hack, but 0.5 tok/s on a 70B is a tough sell when a 7B does 30+ tok/s on the same card. NVIDIA's own research says 40-70% of agentic tasks could run on sub-10B models, and the quality gap has closed fast.

[flagged]

Can we not? Make a valiant effort to rephrase.