The amount of video/imagery to make a million tokens vs the amount of text to do the same is a surprisingly low ratio. Did they have the same intuition?
The amount of video/imagery to make a million tokens vs the amount of text to do the same is a surprisingly low ratio. Did they have the same intuition?