Genuine question: doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?

Where are you getting the 1 Gigabyte number from?

Their 1-bit quantized Diffusion Transformer is just under 1 GB. You also need the text-encoder (4-bit quantized) and VAE (unquantized) for inference and their combined weight is ~3.42 GB.

TBF, even at that size it's no less mind blowing.

Yeah, it's pretty incredible. And I guess that's mostly what's behind the question: whether this is more of an impressive research/technique demonstrator, or a real product advancement solving a need.

[dead]

> doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?

I can make this into a 5-lines Python program. I’m not saying the images will match the description, but that isn’t part of your spec ;)