Pure performance isn't necessarily everything. Context window, speed, and local use are just some of the upsides this model may have. We still know next to nothing, so anything is possible, but if it is an MoE at 120B, that could enable some interesting local use cases even if it is less capable than, e.g., DeepSeek V3, simply by running on more consumer hardware and at higher tokens/sec (see the back-of-envelope sketch below).

GPT-4.1's code focus has also shown that OpenAI has a knack for models with a narrower use case, so maybe this one will do well in specific tasks. Especially since GPT-4.1 turned out that much better than the massive GPT-4.5, I am cautiously optimistic.
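To put the MoE speed argument in rough numbers: autoregressive decoding is mostly memory-bandwidth-bound, so tokens/sec scales with the bytes of weights read per token, i.e. with active rather than total parameters. A quick Python sketch; the bandwidth figure and the active-parameter count are pure assumptions on my part, since nothing is confirmed:

    # Rough decode-speed estimate: generation is roughly memory-bandwidth-bound,
    # so tokens/sec ~= memory bandwidth / bytes of weights read per token.
    # Every number here is an illustrative assumption, not a confirmed spec.
    bandwidth_gb_s = 273       # assumed unified-memory bandwidth (Apple M-class)
    bytes_per_param = 0.5      # 4-bit quantized weights

    configs = {
        "dense 120B (all weights read per token)": 120e9,
        "MoE 120B, ~12B active (hypothetical)": 12e9,
    }
    for label, active_params in configs.items():
        gb_per_token = active_params * bytes_per_param / 1e9
        print(f"{label}: ~{bandwidth_gb_s / gb_per_token:.0f} tok/s")
    # -> roughly 5 tok/s dense vs roughly 46 tok/s for the MoE, same machine

Same total parameter count, but close to an order of magnitude difference in interactive speed, which is why an MoE at 120B could be interesting locally even if it loses on benchmarks.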

Even if it does poorly across the board (like Llama 4 [0]), there is still a lot the community and industry can learn from an uncompetitive model.

[0] Llama 4 technically has a massive 10M-token context as a differentiator; however, in my experience it is not reliably usable beyond 100k.