If this is accurate it raises the question: why is this model so expensive? DeepSeek v4 Flash is 284B total/13B active, FP4/FP8 mixed, and only costs $0.14/$0.28 - even less from OpenRouter. Of course Gemini 3.5 Flash is most likely a better product, and therefore it can command a higher price from an economics perspective, but does this imply Google is taking roughly a 90% profit margin on inference? If so they're either very compute-limited or confident in the model and wanting to recoup training/fixed costs (or both).
Well, we use flash models extensively (both 2.5 and 3.1) and I cannot overstate this, google cannot fucking serve them without 503s 70% of the time on most days
I think it’s pure economics. Flash models are OP for the price, leads to too much demand, google cannot serve it. This is likely expensive to reduce load and hey, if it still makes money just keep the margin.
This is the reality of the premiums available from being in the lead by ~8 months on model building technicals.
Rumor is that GCP was happily selling compute to competitors. After all, under the hood, Google is closer to a federation than a corporation. The state of GCP doesn't care about the state of Gemini.
> Rumor is
It’s not a rumor - there are many public announcements about $B deals around compute for other Ai companies