Hacker News

> propose, implement, measure, keep the wins

That's pretty much what I did to let Codex with gpt5.4xhigh improve my fairly complex CUDA kernel, which resulted in a 20x throughput improvement.

hackyhacky 2 days ago

Concretely, what interesting changes did it make to achieve such a significant improvement?

osti 2 days ago

A lot of it was beyond me, but these are the branch names for everything it tried, most of it unsuccessful of course. About 10x of the perf improvement came from architectural changes, and then another 2x from micro-optimizations.

https://pastebin.com/eac0SAYg
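
For anyone skimming, the "propose, implement, measure, keep the wins" loop can be sketched roughly as below. This is a toy illustration, not what Codex actually ran: the function `keep_the_wins`, the candidate names, and the multipliers are all hypothetical, and the lambda stands in for "check out the branch, build it, and benchmark it". It mainly shows why independent wins compound (10x times 2x gives 20x) while failed attempts are simply discarded.

```python
from typing import Callable, List, Tuple

# Each candidate is (branch_name, change), where change maps the current
# throughput to the throughput measured with that change applied.
# In a real setup this would build and benchmark the branch instead.
Candidate = Tuple[str, Callable[[float], float]]


def keep_the_wins(baseline: float, candidates: List[Candidate]) -> Tuple[float, List[str]]:
    """Greedy loop: try each candidate on top of the current best, keep it only if it measures faster."""
    best = baseline
    kept: List[str] = []
    for name, change in candidates:
        measured = change(best)  # "implement" + "measure" (hypothetical stand-in)
        if measured > best:      # "keep the wins"
            best = measured
            kept.append(name)
        # otherwise the branch is dropped and the next idea starts from `best`
    return best, kept


# Hypothetical candidates; the numbers are made up to mirror the 10x and 2x split.
candidates: List[Candidate] = [
    ("arch-change", lambda t: t * 10.0),  # architectural win
    ("failed-idea", lambda t: t * 0.8),   # regression, gets discarded
    ("micro-opt", lambda t: t * 2.0),     # micro-optimization win
]

baseline = 100.0
final, kept = keep_the_wins(baseline, candidates)
speedup = final / baseline
print(speedup, kept)
```

In practice the measuring step is the hard part: a noisy benchmark would need repeated runs and some threshold before a change counts as a win.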