> “if we are constrained by performance and price, what architectural tradeoffs do we need to make?” a question that so far has not really been asked in the mad token gold rush.

To be fair, I think the labs are also interested in this (e.g OpenAI parameter golf). But the incentives are tricky. When the subsidies and tokenmaxxing era ends, local models will be essential.