That means their paper is actually worse than SOTA, which trains natively in fp4 rather than keeping full-precision [0] weights around for QAT.

[0] "full precision" in ML usually means 16 bit floats like bfloat16

I wouldn't say "worse". It focuses on inference cost and leaves training at the default precision for now.