It's pretty good! I peeked under the hood a little, and most of the size goes to the 3D models, followed by the sound effects. The largest single asset is the looping music track, at 2.3 MB.

The 3D models are compressed with Draco. For example, the largest model is one of the entire world, at 333 kB. It consists of 81k vertices. Unoptimized, that would mean at least three 32-bit floats per vertex for positions alone, which comes to at least 972 kB uncompressed. So Draco is doing a pretty good job of getting the model sizes down.
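Here is that back-of-the-envelope math as a quick sketch. It only counts vertex positions (no normals, UVs, or indices), and it treats "81k" as exactly 81,000 and a kB as 1,000 bytes, which are my assumptions; the function name is just for illustration.

```python
# Rough estimate only: positions alone, no normals, UVs, or index buffer.
VERTEX_COUNT = 81_000      # "81k vertices", assumed to be exactly 81,000
FLOATS_PER_VERTEX = 3      # x, y, z
BYTES_PER_FLOAT = 4        # one 32-bit float
COMPRESSED_KB = 333        # Draco-compressed size of the world model


def raw_position_bytes(vertex_count: int,
                       floats_per_vertex: int = FLOATS_PER_VERTEX,
                       bytes_per_float: int = BYTES_PER_FLOAT) -> int:
    """Size of an uncompressed position buffer in bytes."""
    return vertex_count * floats_per_vertex * bytes_per_float


raw_kb = raw_position_bytes(VERTEX_COUNT) / 1000   # using 1 kB = 1000 bytes
ratio = raw_kb / COMPRESSED_KB

print(f"{raw_kb:.0f} kB raw, about {ratio:.1f}x smaller compressed")
# prints: 972 kB raw, about 2.9x smaller compressed
```

Since a real mesh also carries normals, UVs, and indices, the actual uncompressed size would be larger and the real ratio better than this lower bound suggests.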

I'd have to dig into it more, but I suspect they could have trimmed the size down quite a bit if they wanted to. It just may not be worth the effort. They could have reused more 3D models, but instead it looks like they went for a more artist-driven look where most of the world is uniquely modeled.

It also looks like they're sending a bunch of lower-res levels of detail per model over the network. That's a tradeoff too. Still, with network speeds as they are nowadays, it may be faster and simpler to send those over the wire each time than to regenerate the levels of detail.

The textures are a similar story. They're well compressed, but they're sending procedural noise textures over the network. Those could be generated instead, but it's also trivial to just send them.
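To give a rough idea of what "generated" could look like, here is a minimal sketch of a tileable grayscale value-noise texture built from nothing but a seed. I don't know what kind of noise they actually use, so the function name, parameters, and the choice of value noise are all hypothetical.

```python
import random


def value_noise_texture(size: int = 64, cells: int = 8, seed: int = 0) -> bytes:
    """Return size*size grayscale bytes of tileable value noise (illustrative)."""
    rng = random.Random(seed)  # same seed always gives the same texture
    # Coarse grid of random values; indices wrap so the texture tiles.
    lattice = [[rng.random() for _ in range(cells)] for _ in range(cells)]

    def smooth(t: float) -> float:
        # Smoothstep easing hides the grid edges between lattice points.
        return t * t * (3 - 2 * t)

    pixels = bytearray(size * size)
    for y in range(size):
        fy = y * cells / size
        y0 = int(fy)
        y1 = (y0 + 1) % cells
        ty = smooth(fy - y0)
        for x in range(size):
            fx = x * cells / size
            x0 = int(fx)
            x1 = (x0 + 1) % cells
            tx = smooth(fx - x0)
            # Bilinear blend of the four surrounding lattice values.
            top = lattice[y0][x0] + (lattice[y0][x1] - lattice[y0][x0]) * tx
            bottom = lattice[y1][x0] + (lattice[y1][x1] - lattice[y1][x0]) * tx
            pixels[y * size + x] = int((top + (bottom - top) * ty) * 255)
    return bytes(pixels)


texture = value_noise_texture(size=64, cells=8, seed=42)
```

The point is that the only things that would need to go over the wire are the seed and a couple of parameters. Whether that beats just shipping a well-compressed image depends on how closely the generated result has to match what the artists made.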