You don't even need megabytes of training data for some ML applications. AI is the sexy thing nowadays, but neural networks (Torch is a NN library) are generally useful even for small regression and classification problems.
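To put numbers on it, here's a minimal sketch (toy data, arbitrary layer sizes, everything made up for illustration) of a PyTorch classifier whose entire training set is 40 points:

```python
# Minimal sketch: a tiny PyTorch classifier trained on a handful of points.
# The data, layer sizes, and epoch count are all illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(40, 2)                    # 40 points, 2 features: hardly "big data"
y = (X[:, 0] + X[:, 1] > 0).long()        # simple linearly separable labels

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print("train accuracy:", (model(X).argmax(dim=1) == y).float().mean().item())
```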
For some problems you might even be able to get away with a single-digit number of training points (the classic example of this regime being Physics-Informed Neural Networks, where the governing equations supply most of the supervision).
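Roughly, the trick is that the differential equation does most of the work, so you need almost no labeled data. A hedged sketch for du/dt = -u, where the only actual data point is the initial condition u(0) = 1 (network size, collocation grid, and step count are all arbitrary choices of mine):

```python
# Sketch of the PINN idea: one data point (the initial condition) plus a
# physics residual enforced at collocation points via autograd.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_colloc = torch.linspace(0, 2, 50).reshape(-1, 1).requires_grad_(True)
t0 = torch.zeros(1, 1)  # the single "training point": u(0) = 1

for step in range(3000):
    opt.zero_grad()
    u = net(t_colloc)
    du_dt = torch.autograd.grad(u, t_colloc, torch.ones_like(u), create_graph=True)[0]
    physics_loss = ((du_dt + u) ** 2).mean()   # residual of du/dt = -u
    data_loss = ((net(t0) - 1.0) ** 2).mean()  # fit the one known point
    (physics_loss + data_loss).backward()
    opt.step()

# The trained net should approximate u(t) = exp(-t):
print(net(torch.tensor([[1.0]])).item(), "vs", torch.exp(torch.tensor(-1.0)).item())
```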
Yeah, we just commit our handful of models to the git repo; they're usually only a few MB each.
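In case it helps anyone doing the same: saving just the state_dict, rather than pickling the whole module (or the optimizer state along with it), keeps the checkpoint small. The filename here is illustrative:

```python
# Sketch: save only the weights, which for a net this size is a few KB.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
torch.save(model.state_dict(), "model.pt")

# Loading later, forcing CPU so it works on machines without a GPU:
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
```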
The image still ends up being like 6-8 GiB though. IIRC the default PyTorch wheel had a hard dependency on the CUDA libs, which pulled in a bunch of hardware-specific kernel binaries. The models ran on CPU and didn't even need CUDA, but it was incredibly hard to remove them: some PyTorch init code expected the CUDA stuff to exist even in CPU-only deployments.
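For what it's worth, PyTorch publishes CPU-only wheels at https://download.pytorch.org/whl/cpu, and pointing pip at that index keeps the CUDA libraries out of the image entirely. A sketch of a slimmer build (base image, file names, and layout are assumptions on my part):

```dockerfile
# Sketch: install PyTorch from the CPU wheel index so the multi-GB CUDA
# libraries never enter the image. Base image and app files are illustrative.
FROM python:3.11-slim
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
COPY model.pt app.py /app/
CMD ["python", "/app/app.py"]
```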