I also wouldn't be surprised if they used AI to assist themselves in small ways.
You're just moving the goalposts & not addressing the question I asked. Why isn't AI optimizing the kernels in its own code the way the humans in the posted paper did?
They do?
https://www.deeplearning.ai/the-batch/alphatensor-for-faster...
https://deepmind.google/blog/alphaevolve-a-gemini-powered-co...
https://www.rubrik.com/blog/ai/25/teaching-ai-to-write-gpu-c...
It will, right after it reads the paper.
I read the paper. All the prerequisites are already available in the existing literature: they basically profiled the kernels, found the bottlenecks, & optimized around them to avoid pipeline stalls, using instructions that keep the available tensor & CUDA cores busy. Seems like something these super duper AIs that don't get tired should be able to do pretty easily.
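For anyone wondering what "instructions that utilize the tensor cores" actually looks like, here's a minimal sketch using CUDA's wmma API — one warp computing a single 16x16 output tile of C = A*B. To be clear, this is my own illustrative example, not the paper's kernel; the function name, tile sizes, and layouts are assumptions:

    // Sketch only: one warp, one 16x16 tile of C = A(16xK) * B(KxN).
    // Launch as wmma_tile<<<1, 32>>>(A, B, C, K, N). Needs sm_70+.
    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    __global__ void wmma_tile(const half *A, const half *B, float *C,
                              int K, int N) {
        // Operand fragments live in registers; accumulate in fp32.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
        wmma::fill_fragment(acc, 0.0f);

        // One 16x16x16 tensor-core MMA per step along the K dimension.
        for (int k = 0; k < K; k += 16) {
            wmma::load_matrix_sync(a, A + k, K);     // A tile, leading dim K
            wmma::load_matrix_sync(b, B + k * N, N); // B tile, leading dim N
            wmma::mma_sync(acc, a, b, acc);          // HMMA on tensor cores
        }
        wmma::store_matrix_sync(C, acc, N, wmma::mem_row_major);
    }

As written, the tensor cores sit idle while each load_matrix_sync waits on global memory. A tuned kernel stages those loads through shared memory and double-buffers them so compute overlaps the loads — which is exactly the stall-avoidance work described above.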