There's no way Sonnet 4 or Opus 4 are dense models.
Citation needed
Common sense:
- The compute requirements would be massive compared to the rest of the industry, since every parameter of a dense model is active on every token
- No large open-source lab has trained anything over 32B dense in the recent past
- There is considerable crosstalk between researchers at large labs; notice how they all seem to move in similar directions at the same time. If dense models at this scale actually provided a benefit over MoE, the info would've spread like wildfire.
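To put a rough number on the first bullet: using the common ~6·N·D approximation for training FLOPs (N = active parameters per token, D = training tokens), you can compare a dense model against an MoE with the same token budget. The model sizes below are purely hypothetical, picked only to illustrate the scale of the gap:

```python
# Back-of-envelope training-compute comparison using the common
# ~6 * N * D FLOPs approximation, where N is the number of parameters
# active per token and D is the number of training tokens.
# All sizes below are hypothetical illustrations, not real model specs.

def training_flops(active_params: float, tokens: float) -> float:
    """Approximate total training FLOPs as 6 * N * D."""
    return 6 * active_params * tokens

TOKENS = 15e12  # assumed 15T-token training run

dense_flops = training_flops(200e9, TOKENS)  # hypothetical 200B dense model
moe_flops = training_flops(30e9, TOKENS)     # hypothetical MoE, 30B active params

print(f"dense / MoE compute ratio: {dense_flops / moe_flops:.1f}x")
# → dense / MoE compute ratio: 6.7x
```

At the same token count, the dense model costs as many times more compute as the ratio of active parameters, which is why a frontier-scale dense run would stand out against the rest of the industry.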