By now, model lifetime inference compute is >10x model training compute, for mainstream models. Further amortized by things like base model reuse.
By now, model lifetime inference compute is >10x model training compute, for mainstream models. Further amortized by things like base model reuse.