I feel like the "models have no moat" paradigm died when a single model expanded past the memory of a single GPU slice. The moat is hosting the model. Even paying a server host to run a rack of GPUs has an immense upfront cost, and then you're still struggling to compete on the add-ons built on top of the model (prompts, validation loops, etc.). You can only throw so much money at a problem.

Many different companies host the open-source models. Where's the moat there?

I mean, you got me there. There will be places that do have the means to build up massive GPU servers. There's just a lot more to it, and I don't know if we're going to pinpoint an exact catch-all moat.