Hacker News

wmf a day ago

Good models will require multiple Taalas chips, but Groq and Cerebras also require a lot of chips, and that hasn't stopped them.

ipdashc 6 hours ago

> Good models will require multiple Taalas chips

I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?

wmf 4 hours ago

Using multiple chips seems to work fine for Cerebras and Groq, so it should also work for Taalas. It does sound challenging to reach >10K tok/s, but chip-to-chip latency could be below 1 us, which is a small part of the per-token time budget (100 us per token at 10K tok/s).
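
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The 10K tok/s and 1 us figures come from the comment above; the function names and the `hops_per_token` parameter are made up for illustration, since the thread doesn't say how many chip-to-chip crossings a token would actually need.

```python
def per_token_budget_us(tokens_per_second: float) -> float:
    """Time available per token, in microseconds, at a given throughput."""
    return 1_000_000 / tokens_per_second


def hop_fraction(tokens_per_second: float,
                 hop_latency_us: float,
                 hops_per_token: int = 1) -> float:
    """Fraction of the per-token budget spent on chip-to-chip hops.

    hops_per_token is an assumption, not a known figure for any real system.
    """
    total_hop_latency_us = hops_per_token * hop_latency_us
    return total_hop_latency_us / per_token_budget_us(tokens_per_second)


if __name__ == "__main__":
    budget = per_token_budget_us(10_000)
    print(f"budget per token: {budget:.0f} us")
    for hops in (1, 10):  # hypothetical hop counts
        share = hop_fraction(10_000, 1.0, hops)
        print(f"{hops} hop(s) at 1 us each: {share:.0%} of the budget")
```

With one hop per token the latency is about 1% of the budget; even with a hypothetical 10 hops it would be about 10%, assuming tokens are generated strictly one after another for a single stream.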