Where are you getting the 1 GB number from?
Their 1-bit quantized Diffusion Transformer is just under 1 GB. You also need the text encoder (4-bit quantized) and the VAE (unquantized) for inference, and their combined weight is ~3.42 GB.
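For a rough sense of why the bit width matters so much, here's a back-of-the-envelope sketch of weight storage as parameters × bits ÷ 8. The parameter count is a made-up round number picked to keep the arithmetic easy, not the actual size of any of these models, and the estimate ignores quantization scales, metadata, and any layers kept at higher precision.

```python
def weight_size_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, for illustration only (not taken from the model card).
HYPOTHETICAL_PARAMS = 8_000_000_000

for bits in (16, 4, 1):
    size = weight_size_gb(HYPOTHETICAL_PARAMS, bits)
    print(f"{bits:>2}-bit: {size:.1f} GB")
```

Same parameter count, 16x less storage going from 16-bit to 1-bit, which is why the unquantized and 4-bit pieces end up dominating the total.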
TBF, even at that size it's no less mind-blowing.