They use machine learning to optimize general-purpose chips. I'm proposing that you would train an LLM AND the ultra-optimized hardware that can only run that LLM at the same time, so that both the LLM and the Verilog design of the hardware to run it come out of the training.
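Very roughly, one way the joint optimization could look - purely a toy sketch under made-up assumptions (the two-layer model, the {2,4,8}-bit menu, the "bits x params" cost proxy, and the lambda weight are all placeholders, not anyone's actual method). The point is just that the hardware parameters get gradients alongside the model weights, and whatever they converge to is what you'd feed into a (hypothetical) Verilog generator:

    # Toy sketch: jointly train model weights and per-layer precision choices,
    # penalized by a differentiable hardware-cost proxy. All numbers are placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    BITS = torch.tensor([2.0, 4.0, 8.0])  # candidate per-layer bit-widths

    class CoDesignModel(nn.Module):
        def __init__(self, d_in=16, d_hidden=32, d_out=4):
            super().__init__()
            self.layers = nn.ModuleList([nn.Linear(d_in, d_hidden),
                                         nn.Linear(d_hidden, d_out)])
            # Learnable "hardware" parameters: a distribution over bit-widths per layer.
            self.bit_logits = nn.Parameter(torch.zeros(len(self.layers), len(BITS)))

        def forward(self, x):
            total_cost = x.new_zeros(())
            for i, (layer, logits) in enumerate(zip(self.layers, self.bit_logits)):
                probs = F.softmax(logits, dim=-1)
                exp_bits = (probs * BITS).sum()          # expected precision of this layer
                w = layer.weight
                step = w.abs().max() / (2 ** exp_bits)
                w_q = w + torch.randn_like(w) * step     # crude stand-in for quantization error
                x = F.linear(x, w_q, layer.bias)
                if i < len(self.layers) - 1:
                    x = F.relu(x)
                # Cost proxy: multiplier area grows with bits * number of weights.
                total_cost = total_cost + exp_bits * w.numel()
            return x, total_cost

    model = CoDesignModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    lam = 1e-5  # trade-off between task loss and hardware cost

    for _ in range(200):
        x = torch.randn(64, 16)
        y = torch.randint(0, 4, (64,))                   # toy labels
        logits, hw_cost = model(x)
        loss = F.cross_entropy(logits, y) + lam * hw_cost
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The "hardware output" of training: one precision per layer, which a
    # (hypothetical) generator could turn into an RTL/Verilog description.
    print(BITS[model.bit_logits.argmax(dim=-1)])

In a real version the cost proxy would come from synthesis estimates rather than a toy formula, and the discrete choices would cover datapath widths, memory layout, dataflow, etc., not just bit-widths - but the "both things fall out of one training run" shape would be the same.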
Can't find the reference now, but I remember reading an article on evolving FPGA designs. The evolved optimum, however, only worked on the specific FPGA it was evolved on, since the algorithm had started to exploit some out-of-spec "features" of that particular chip. Obviously that can be fixed with proper constraints, but it seems like a trap that could be stepped into again - i.e. the LLM is now really fast, but only on GPUs that come from the same batch of wafers.
I think that's basically what Nvidia and the competing AI chip makers do now?
https://www.researchgate.net/publication/2737441_An_Evolved_...