Oh, very glad to see this; the ML applications mentioned in it are exactly why I thought this was such a disastrous change.
However, the tedium of the reply chain reminds me why I tend to focus most of my energy on internal projects rather than external open source...
Docker may have been built for a specific type of use case that most developers are familiar with (e.g. web apps backed by a DB container), but containerization is useful across many areas of computing that look very different. Something that seems trivial in the python/DB space, having one or two small duplicated OS layers, becomes very different once you have 30 containers for different models+code, plus ~100 more dev containers lying around as build artifacts from building, pushing, and pulling, each at ~10GB. At that scale the inefficient new system is just painful.
The smallest PyTorch container I ever built was 1.8GB, and that was just for some CPU-only inference endpoints; it took several hours of yak shaving to achieve, and after a month or two of development it had ballooned back to 8GB. Containers with CUDA, or with other significant AI/ML libraries, get really big. YAGNI is a great principle for your own code when writing from scratch, but it gets dangerous once an entire ecosystem has been built on your product and things are getting rewritten from scratch, because the "you" is far larger than the developer making the change. Docker's core feature has always been reusable and composable layers, so seeing it abandoned suggests somebody took YAGNI way too far in their own corner of the computing world.
Docker(Hub) just isn't built for this use case. I've built https://clipper.dev to better handle ML/large images. It consists of a registry plus a pull client that breaks layers apart and content-addresses individual chunks by their _uncompressed_ hash, so that content can be shared more effectively. The pull client parallelizes better and wastes much less bandwidth. It annoys the heck out of me when I change one file in a layer and have to redownload bytes my device already has. By sharing chunks across layers I've seen 80-90% improvements in pull times for "patches".
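To make the chunk-sharing idea concrete, here's a minimal sketch of dedup by uncompressed chunk hash. This is not Clipper's actual code; the fixed 4 MiB chunk size, the on-disk store layout, and the fetch_chunk callback are all placeholders for illustration (a real implementation would more likely use content-defined chunking so one edit doesn't shift every later chunk boundary):

    import hashlib
    from pathlib import Path

    CHUNK_SIZE = 4 * 1024 * 1024  # placeholder fixed-size chunks for illustration

    def chunk_digests(uncompressed_layer: bytes) -> list[str]:
        """Split an uncompressed layer into chunks and return their SHA-256 digests."""
        return [
            hashlib.sha256(uncompressed_layer[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(uncompressed_layer), CHUNK_SIZE)
        ]

    class ChunkStore:
        """Local content-addressed store shared by every layer on the machine."""
        def __init__(self, root: Path):
            self.root = root
            self.root.mkdir(parents=True, exist_ok=True)

        def has(self, digest: str) -> bool:
            return (self.root / digest).exists()

        def put(self, digest: str, data: bytes) -> None:
            (self.root / digest).write_bytes(data)

        def get(self, digest: str) -> bytes:
            return (self.root / digest).read_bytes()

    def pull_layer(manifest: list[str], fetch_chunk, store: ChunkStore) -> bytes:
        """Rebuild a layer from its chunk manifest, downloading only missing chunks.
        fetch_chunk(digest) -> bytes stands in for the registry round trip."""
        for digest in manifest:
            if not store.has(digest):                   # skip bytes we already have,
                store.put(digest, fetch_chunk(digest))  # even if another layer pulled them
        return b"".join(store.get(d) for d in manifest)

With a store like this, changing one file only invalidates the chunks it touches; every other chunk is served from local disk, regardless of which layer or image originally pulled it.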
I'm also in the process of building a BuildKit builder, and I'm seeing large improvements in image export speed. The same image that takes Docker >3 minutes to export and push takes me under a minute. https://github.com/clipper-registry/benchmarks/actions/runs/...
> ...containerization is useful across many areas of computing that look very different...
So much so that containerization in general predates Linux, and even UNIX, going all the way back to System/360.
It also made its way into Tru64, HP-UX, BSD, and Solaris before landing in Linux.