Yeah, it's pretty clear you're a loudmouthed, aggressively arrogant know-it-all (or at least you think you are). You keep moving the goalposts too: first you act like they can't run models that don't fit in 44GB (or 4x44GB), then you say they can "only" run a larger model at 500 tps but that it wouldn't be profitable. Lol.

Cerebras CURRENTLY serves GLM-4.7; I've used it through their API. Look up how big that model is. They run it at 1,000-1,700 tps. https://www.cerebras.ai/blog/glm-4-7

Not interested in further conversation, so have a nice day! You can go ahead and get in the last word though.