Yeah, I wrote the blog to wrap my head around the questions "how would someone even print weights onto a chip?" and "how would you even start thinking in that direction?".
I didn't explore the actual manufacturing process.
You should add an RSS feed so I can follow it!
I don't post blog entries often, so I haven't added RSS there, but I will. I mostly post to my linkblog[1], which does have RSS.
[1] https://www.anuragk.com/linkblog
Frankly, the most critical question is whether they can really take shortcuts on design verification (DV) etc., which is the main reason nobody else tapes out a new chip for every model. Note that their current architecture only allows some LoRA-adapter-based fine-tuning; even a model with just an updated knowledge cutoff would require new masks. Which is kind of insane, but props to them if they can make it work.
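To make the LoRA point concrete: the base weights are baked into silicon and can never change, so the only post-tape-out knob is a low-rank additive correction. Here's a minimal NumPy sketch of that split (shapes and names are hypothetical, just to illustrate the math, not their actual design):

```python
import numpy as np

# Hypothetical tiny shapes; a real layer would be far larger.
d, r = 8, 2  # hidden size, LoRA rank (r << d)

rng = np.random.default_rng(0)

# Base weights: fixed at tape-out. Changing these means new masks.
W_frozen = rng.standard_normal((d, d))

# LoRA adapters: the only parameters that remain loadable after fab.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))  # standard LoRA init: B = 0, so the adapter starts as a no-op

x = rng.standard_normal(d)

# Effective forward pass: y = (W + B @ A) @ x.
# The fixed silicon computes W_frozen @ x; the programmable part only
# adds the cheap low-rank correction B @ (A @ x).
y = W_frozen @ x + B @ (A @ x)

# With B = 0 the output matches the baked-in model exactly.
assert np.allclose(y, W_frozen @ x)
```

The key asymmetry: the adapter adds only `2*d*r` tunable parameters against `d*d` frozen ones, which is why it can live in ordinary programmable memory while the bulk of the model cannot be updated at all.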
Judging by announcements from two years ago, they seem to have missed their initial schedule by a year, if that's indicative of anything.
For their hardware to make sense, a few things would need to be true:
1. A model is good enough for a given use case that there is no need to update or change it for 3-5 years. Note that they need to redo their hardware pipeline if even the weights change.
2. The application is highly latency-sensitive and benefits from power efficiency.
3. The application is large enough in scale to warrant doing all this instead of running on last-gen hardware.
Maybe some edge-computing and non-civilian use cases fit that, but given the lifespan of models, I suspect most companies would consider something like this too high-risk.
But maybe some non-text applications, like TTS or audio/video generation, might actually be a good fit.
TTS, speech recognition, OCR/document parsing, vision-language-action models, vehicle control: things like that do seem to be the ideal applications. Latency constraints limit the utility of larger models in many of those domains.