I'm wondering about something different. An FPGA seems ideal for an AI chip because you can simply flash the latest model. The downsides are low density and low clock speed. It seems you can only fit 100-300M parameters in even a very large FPGA, but that seems like it would be enough for most fine-tuning.
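
As a rough sanity check on that budget, here's a back-of-the-envelope count, assuming the fine-tuning is done with low-rank (LoRA-style) adapters. The model shape below (hidden size, layer count, which matrices get adapted, rank) is purely hypothetical and not taken from any particular chip or model. The only fixed fact is that a rank-r adapter on a d_in x d_out weight matrix adds r * (d_in + d_out) parameters.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical model shape, chosen only for illustration.
D_MODEL = 4096          # hidden size (assumption)
N_LAYERS = 32           # transformer layers (assumption)
ADAPTED_PER_LAYER = 4   # e.g. the four attention projections (assumption)
RANK = 16               # LoRA rank (assumption)

per_matrix = lora_param_count(D_MODEL, D_MODEL, RANK)
total = N_LAYERS * ADAPTED_PER_LAYER * per_matrix

print(f"{per_matrix:,} parameters per adapted matrix")  # 131,072
print(f"{total:,} adapter parameters in total")         # 16,777,216
```

Under those assumptions the adapters come to about 17M parameters, well under the 100-300M figure, though a higher rank or more adapted matrices would scale that up linearly.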

I'm thinking of a situation where you do the initial model calculations in hardware on the Taalas chip, then hand off to the FPGA, which does the LoRA subset of the calculations in hardware and can be continuously re-tuned to keep the model up to date. This would probably reduce throughput (or at least increase latency), but it would save tons of money by letting you use the chips longer.
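
To make the split concrete, here's a minimal pure-Python sketch of one adapted layer. `base_layer` stands in for the frozen weights on the fixed chip and `lora_delta` for the reflashable low-rank part on the FPGA. Both function names and the tiny matrices are made up for illustration. This is just the standard LoRA identity y = Wx + scale * B(Ax), not a description of how any real chip does it.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def base_layer(W, x):
    """Stand-in for the fixed-function chip: W is frozen."""
    return matvec(W, x)

def lora_delta(A, B, x, scale=1.0):
    """Stand-in for the FPGA: low-rank update B(Ax); A and B can be swapped out."""
    return [scale * y for y in matvec(B, matvec(A, x))]

def adapted_layer(W, A, B, x, scale=1.0):
    """Sum the frozen output and the low-rank correction."""
    return [b + d for b, d in zip(base_layer(W, x), lora_delta(A, B, x, scale))]

# Toy example: 2x2 frozen weights with a rank-1 adapter.
W = [[1, 0], [0, 1]]   # frozen base weights
A = [[1, 1]]           # rank x d_in
B = [[2], [3]]         # d_out x rank
x = [1, 2]

print(adapted_layer(W, A, B, x))             # [7.0, 11.0]
print(adapted_layer(W, A, [[0], [0]], x))    # [1.0, 2.0], same as the base model
```

Both halves need the same input `x` and their outputs are summed, so presumably the activations would have to cross between the two chips at every adapted layer, which is where I'd expect the extra latency to come from.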