> The main downside of this was memory usage. You would have to load all of your code and libraries N types and in-process caches would become less effective.

You can load modules and then fork child processes. Children will share memory with each other (if they need to modify any shared memory, they get copy-on-write pages allocated by the kernel) and you'll save quite a lot on memory.

Yes, this can help a lot, but it definitely isn't perfect. Especially since CPython uses reference counting it is likely that many pages get modified relatively quickly as they are accessed. Many other GC strategies are also pretty hostile to CoW memory (for example mark bits, moving, ...) Additionally this doesn't help for lazy loaded data and caches in code and libraries.