The interesting thing about models this small is that they should fit on a single Taalas chip (the HC1 already runs a Llama 3.1 8B model). We're already at the point where half-decent reasoning could run on an ASIC (and at mind-boggling speeds).

Yeah, if they can fit an 8B model that's really good at improving its output by thinking, running it at 16K tok/s on Taalas would be mind-blowing.

Given this and the quality of open models, is there really a future for Anthropic et al.?

Packaging a capability into a consumable form will still be a business.

It's like web hosting: all the open source tools are there and free, and yet website tools, hosts, etc. flourish.

It’s true, but hosting prices are still within spitting distance of rolling it yourself.

SOTA providers are expecting some level of margin. Companies everywhere are keeping a close eye on their AI bills right now.

The motivation to run it yourself is there if the models get good enough, even if it’s more painful.