Isn't training cost a function of inference cost? From what I gathered, they reduced both.

I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.)

They're typically somewhat related, but the gap between training cost and inference cost can vary greatly, so I guess the answer is no.

They did reduce both, though, mostly through reduced precision.