Well, I've seen one suggestion for a probable near future of AI: stop at roughly the GPT-4 capability level, but distill the models and apply optimizations like switching to FP8 for faster inference.
So the basic idea is a model with the very same capabilities, but distilled and optimized to have a smaller size and faster inference.
I can't promise a 100x speedup, but I think 10x is very realistic.
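To get a feel for what "switch to FP8" actually does to a weight, here's a toy sketch that rounds a float to the nearest value representable in the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, max finite value 448, per the common OCP spec). This is just an illustration of the precision loss, not a real inference kernel; real speedups come from hardware FP8 matmul units, not from Python.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (sketch; ignores NaN/inf)."""
    if x == 0.0:
        return 0.0
    s = math.copysign(1.0, x)
    a = abs(x)
    if a > 448.0:                 # saturate at the largest finite E4M3 value
        return s * 448.0
    if a < 2.0 ** -6:             # subnormal range: fixed spacing of 2^-9
        step = 2.0 ** -9
        return s * round(a / step) * step
    _, e = math.frexp(a)          # a = m * 2**e with m in [0.5, 1)
    step = 2.0 ** (e - 1 - 3)     # spacing: 3 mantissa bits after the implicit 1
    return s * round(a / step) * step

# A weight like 0.3 lands on the nearest representable value, 0.3125:
print(quantize_e4m3(0.3))
# Values past the max finite value saturate:
print(quantize_e4m3(1000.0))
```

The takeaway: each weight keeps only about 2 decimal digits of precision but shrinks to 1 byte, halving memory traffic versus FP16, which is where much of the inference speedup comes from.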