You mean autotuning? I think 10 minutes is pretty normal; torch.compile(model, mode="max-autotune") can be much slower than that for large models.
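On top of that, the tuning only needs to be done once, by the developers, for each major hardware target before distribution. The resulting configs are saved and shipped, and the client side just selects the one that matches its hardware.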
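
To make the "tune once, ship the configs, select on the client" idea concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the names (CANDIDATES, time_once, autotune, save_configs, select_config), the block_size parameter, and the JSON file layout are assumptions, not the API of torch.compile or of any real autotuner.

```python
import json
import time
from pathlib import Path

# Hypothetical search space; a real autotuner would sweep tile sizes, warps, etc.
CANDIDATES = [{"block_size": b} for b in (32, 64, 128)]


def time_once(config, workload):
    """Wall-clock time of a single run of workload(config)."""
    start = time.perf_counter()
    workload(config)
    return time.perf_counter() - start


def autotune(workload, candidates=CANDIDATES, timer=time_once):
    """Developer side: return the candidate with the lowest measured time."""
    return min(candidates, key=lambda cfg: timer(cfg, workload))


def save_configs(best_by_hardware, path):
    """Developer side: write {hardware name: best config} to a JSON file."""
    Path(path).write_text(json.dumps(best_by_hardware, indent=2))


def select_config(path, hardware_name, default):
    """Client side: look up the shipped config, no tuning involved."""
    shipped = json.loads(Path(path).read_text())
    return shipped.get(hardware_name, default)
```

The slow part (autotune) runs only on the developer's machines, once per hardware target. The client only pays for a file read and a dictionary lookup, and falls back to a default config if its hardware was never tuned.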