They use machine learning to optimize general-purpose chips. What I'm proposing is training an LLM and the ultra-optimized hardware that can only run that LLM at the same time, so that both the LLM weights and the Verilog design of the hardware that runs them are outputs of the training process.
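
To make the idea concrete, here's a minimal sketch of the simplest version of that joint training loop, in PyTorch. The "hardware" is reduced to one learnable knob per layer (weight bit-width), trained with a straight-through estimator so the same gradient signal shapes both the weights and the hardware spec; a real version would search over datapaths, memory layouts, etc., and a code generator would turn the learned configuration into Verilog. All names here (`QuantLinear`, `hw_cost`, `LAMBDA`) are hypothetical, not from any existing library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantLinear(nn.Module):
    """Linear layer whose weight precision is itself a trainable parameter."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.linear = nn.Linear(in_f, out_f)
        self.bits = nn.Parameter(torch.tensor(8.0))  # continuous; rounded in forward

    def forward(self, x):
        # Straight-through estimator: round the bit-width in the forward pass,
        # let the gradient flow through the continuous value in the backward pass.
        b = torch.clamp(self.bits, 2.0, 16.0)
        b = b + (torch.round(b) - b).detach()
        levels = 2.0 ** b - 1
        w = self.linear.weight
        # Quantization step size depends on the bit-width, so the task loss
        # pushes bits up (less rounding error) while hw_cost pushes them down.
        s = w.abs().max().detach() / (levels / 2) + 1e-8
        wq = s * (w / s + (torch.round(w / s) - w / s).detach())
        return F.linear(x, wq, self.linear.bias)

    def hw_cost(self):
        # Crude area/energy proxy: parameter count times bit-width.
        return self.linear.weight.numel() * torch.clamp(self.bits, 2.0, 16.0)

model = nn.Sequential(QuantLinear(64, 128), nn.ReLU(), QuantLinear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
LAMBDA = 1e-6  # trade-off between task accuracy and hardware cost

for step in range(200):
    x = torch.randn(32, 64)              # stand-in for real training data
    y = torch.randint(0, 10, (32,))
    task_loss = F.cross_entropy(model(x), y)
    hw_loss = sum(m.hw_cost() for m in model if isinstance(m, QuantLinear))
    loss = task_loss + LAMBDA * hw_loss  # one loss, two artifacts being optimized
    opt.zero_grad()
    loss.backward()
    opt.step()

# The learned bit-widths are the "hardware half" of the output: a code
# generator would translate this spec into fixed-point Verilog datapaths.
for i, m in enumerate(model):
    if isinstance(m, QuantLinear):
        print(f"layer {i}: {int(torch.round(m.bits))}-bit weights")
```

Obviously this is nowhere near emitting a full chip design, but it shows the core mechanic: a single differentiable objective where shrinking the hardware and fitting the data trade off against each other, and the final checkpoint is a (weights, hardware spec) pair rather than weights alone.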