Yeah, our handful of models we just commit to the git repo--usually only a few MB.
Image still ends up being like 6-8Gi tho. iirc pytorch had a hard dependency on CUDA libs which pulled in a bunch of different hardware-specific kernel binaries. The models ran on CPU and didn't even need CUDA but it was incredibly hard to remove them--there was some pytorch init code that expected the CUDA crap to exist even on CPU-only.