> Rumors are worth squat
You can make some educated guesses and put bounds on inference cost by looking at third-party providers on platforms like OpenRouter. That gives you a median cost per token for a given model size. Then make some educated guesses about SotA model sizes, and you get an estimate of the pure cost of serving a model. Error bars and all that, of course. But it's still a range, with some limits.
No, you can't really make educated guesses unless people start opening their books. Especially in an industry where the vast majority of firms make up valuations out of thin air and not based on any reproducible insights.
Opening their books would let you know things like profitability. I'm talking about cost per token, model development and human costs being irrelevant.
Yeah, take the GPU rental cost, what model it can run, and how many tokens per second come out, and you can see the true rate per token. Plus the margin on the harness special sauce.
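For what it's worth, here's a rough sketch of that back-of-envelope math in Python. Everything in it is an assumption for illustration: the function names, the `utilization` knob, and especially the numbers in the example, which are placeholders and not real prices or measured throughput.

```python
def cost_per_million_tokens(gpu_hourly_usd, num_gpus, tokens_per_second, utilization=1.0):
    """Serving cost in USD per 1M output tokens from rental price and throughput.

    gpu_hourly_usd    -- rental price of one GPU per hour (assumed input)
    num_gpus          -- GPUs needed to host one replica of the model
    tokens_per_second -- aggregate tokens/s across all concurrent requests
    utilization       -- fraction of rented time actually spent serving (0 < u <= 1)
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd * num_gpus / tokens_per_hour * 1_000_000


def price_with_margin(cost, margin):
    """Apply a markup on top of raw cost, e.g. margin=0.5 means +50%."""
    return cost * (1 + margin)


if __name__ == "__main__":
    # Placeholder numbers only: 8 GPUs at $2/h each, 2000 tok/s, half-utilized.
    raw = cost_per_million_tokens(2.0, 8, 2000, utilization=0.5)
    print(f"raw cost: ${raw:.2f} per 1M tokens")
    print(f"with 50% margin: ${price_with_margin(raw, 0.5):.2f} per 1M tokens")
```

The estimate swings a lot with the throughput and utilization you plug in, so it's better read as a range than a single number.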