We do. The Cerebras line of Wafer Scale Engines is exactly an entire wafer of cores running in parallel with fast memory next to each one. It's intended for very high throughput LLM inference. https://www.cerebras.ai/chip
We do. The Cerebras line of Wafer Scale Engines is exactly an entire wafer of cores running in parallel with fast memory next to each one. It's intended for very high throughput LLM inference. https://www.cerebras.ai/chip