I think the idea is that the images are decrypted by the client. See how Ente does it: https://ente.com/architecture

Of course, this sacrifices quite a bit of functionality, since more or less every feature that needs to look at the pixels (thumbnails, search, face grouping) has to run client-side. But to be fair, the client is part of the "app", so it's not "just" encrypted storage.
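
To make the split concrete, here is a rough sketch of the general pattern. It is not Ente's actual scheme; their architecture page describes the real one. Everything here is made up for illustration: `client_encrypt`, `client_decrypt` and `Server` are hypothetical names, and the cipher is a toy (a SHA-256 counter keystream plus HMAC) standing in for a proper authenticated cipher, so don't use it for real data. The point is only that the key stays on the client and the server holds bytes it can't interpret.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode). Assumption: a stand-in for a
    # real authenticated cipher, chosen only to keep the sketch stdlib-only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def client_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Runs on the client: the key never leaves the device.
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def client_decrypt(key: bytes, blob: bytes) -> bytes:
    # Runs on the client: verify the tag first, then recover the pixels.
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("blob was tampered with or the key is wrong")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class Server:
    """Hypothetical storage backend: holds opaque blobs, has no key."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, name: str, blob: bytes) -> None:
        self.blobs[name] = blob

    def get(self, name: str) -> bytes:
        return self.blobs[name]


if __name__ == "__main__":
    key = secrets.token_bytes(32)          # lives only on the client
    photo = b"pretend these are pixels" * 4
    server = Server()
    server.put("cat.jpg", client_encrypt(key, photo))
    # The server can store and return the blob, but cannot read it.
    print(client_decrypt(key, server.get("cat.jpg")) == photo)
```

Anything that wants the pixels has to go through `client_decrypt`, which is exactly why those features end up in the client.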