One-shotting is useful to test, but only with a huge prompt (e.g., build something according to this spec).

I agree that generating millions of tokens from a handful of input tokens doesn't convey anything meaningful to me.