1. Context is all you need... They are heavily investing in getting better context (especially for coding tasks). This will disproportionately advantage smaller models (and benefit everyone).

Today, a smaller model given good context can outperform a model with 100x more parameters that is given bad or diluted context.

2. MoE (already widespread) + MLA (mostly memory efficiency, not quality) + Medusa (speed, not quality) + GRAM (5,000-10,000x better reasoning in an extremely small model) + 1.58b, i.e. 1.58-bit ternary weights (unclear whether it will have the impact Microsoft first claimed, but possibly 5x).
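
For anyone wondering where the "1.58" comes from, here is a rough sketch. It assumes "1.58b" means ternary weights as in Microsoft's BitNet b1.58, where each weight is one of {-1, 0, 1}, so it carries log2(3) ≈ 1.58 bits. The function names and sample weights are made up for illustration, and the absmean scaling is a simplified per-tensor version, not the actual training recipe.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Simplified absmean quantization (assumption: one scale for the whole tensor).

    Divide by the mean absolute weight, round to the nearest integer,
    then clip to the ternary set {-1, 0, 1}.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def bits_per_weight(num_levels):
    """Information content of a weight that can take num_levels distinct values."""
    return math.log2(num_levels)

# Hypothetical sample weights, only for illustration.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
q, scale = ternary_quantize(weights)

print(q)                              # ternary codes
print(round(bits_per_weight(3), 3))   # bits carried by one ternary weight
```

Storage per weight drops from 16 bits (FP16) to about 1.58 bits in the ideal case. Whether quality holds up at scale is the open question the point above is about.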