> But, no, I'm given to understand they couldn't run e.g. the DeepSeek 3.2 model at full size because there still simply isn't enough GPU RAM.
My RTX 4080 only has 16 GB of VRAM, and gpt-oss 120b is roughly 4x that size. It looks like Ollama is actually running ~80% of the model off the CPU. I was led to believe this would be unbearably slow, but it really isn't, at least with my CPU.
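For what it's worth, the ~80% figure lines up with simple arithmetic. A rough sketch (the ~64 GB weight size is my assumption for the released gpt-oss 120b checkpoint, and this ignores the KV cache and runtime overhead that also eat VRAM):

    # Back-of-envelope split between GPU and CPU memory
    model_gb = 64      # assumption: gpt-oss 120b weights, roughly 4x a 16 GB card
    vram_gb = 16       # RTX 4080
    gpu_share = min(vram_gb, model_gb) / model_gb
    print(f"~{gpu_share:.0%} of the model fits in VRAM, ~{1 - gpu_share:.0%} spills to system RAM")
    # -> ~25% in VRAM, ~75% in system RAM, in the same ballpark as what Ollama reports

(`ollama ps` will show the actual CPU/GPU split it settled on.)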
I can't run the full-sized DeepSeek model because I don't have enough system memory. That would be relatively easy to rectify.
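To put a number on "full sized": a minimal sketch, assuming the ~671B total parameters published for DeepSeek-V3 (which the 3.x line shares) and native FP8 weights, not counting KV cache or activations:

    # Rough system-memory requirement just for the weights
    total_params_b = 671   # assumption: DeepSeek-V3-class total parameter count, in billions
    bytes_per_param = 1    # FP8
    print(f"~{total_params_b * bytes_per_param} GB of RAM to hold the weights")
    # -> roughly 670+ GB, i.e. workstation/server-class memory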
> And once it does add up, and these models can be reasonably run on lower-end hardware... then the moat ceases to exist and there'll be dozens of providers.
This is a good point and perhaps the bigger problem.