Actually, I find the idea of using Cerebras etc. for /training/ (not just inference) surprising: I have not come across much data or discussion about "super-CPUs" in that area, where Nvidia (with the tooling built around it) has a long-established edge...
Edit: for context,
> Jalapeño is specifically designed for inference