Good models will require multiple Taalas chips but Groq and Cerebras also require a lot of chips and that hasn't stopped them.

> Good models will require multiple Taalas chips

I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?

Using multiple chips seems to work fine for Cerebras and Groq so it should also work for Taalas. It does sounds challenging to reach >10K tok/s but latency could be below 1 us which is a small part of the token budget.