I got a personal Mac Studio M4 Max with 128 GB RAM as a silent, relatively power-efficient yet powerful home server. It runs Ollama + Open WebUI with GPT-OSS 120B as well as GLM-4.5-Air (default quantisations). I rarely use ChatGPT anymore and love that all data stays at home. I connect remotely only via VPN (my phone enables this automatically via Tasker).
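
Roughly, talking to it from another machine over the VPN looks like the sketch below. This goes against Ollama's standard REST API on its default port; the address is just a placeholder for whatever your VPN hands out:

```python
import requests

# Assumption: the Mac Studio is reachable at this VPN address;
# 11434 is Ollama's default port.
OLLAMA_URL = "http://192.168.1.50:11434/api/chat"

def ask(prompt: str, model: str = "gpt-oss:120b") -> str:
    """Send a single chat turn to the Ollama server and return the reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # one complete JSON response instead of a token stream
        },
        timeout=300,  # large models can take a while on first load
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask("Summarise the trade-offs of running LLMs on Apple Silicon."))
```

Open WebUI sits in front of the same server, so the browser and the API share one model instance.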

I'm 50% brainstorming ideas with it, asking critical questions, and learning something new. The other half is actual development, where I describe very clearly what I know I'll need (usually as TODOs in comments) and it writes those snippets, which is my preferred form of AI assistance. I stay in the driver's seat; the model is the copilot. Human-in-the-loop and such. This has worked really well for my website, other personal projects, and even professionally (my work laptop has its own Open WebUI account for separation).
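
A minimal sketch of what that workflow can look like. The function and TODO wording here are my own illustration, not taken from any real project: you write the signature and the TODO, the model fills in the body underneath.

```python
from collections import Counter

def top_words(path: str, n: int = 10) -> list[tuple[str, int]]:
    # TODO (for the model): read the file at `path`, lowercase the text,
    # split on whitespace, strip surrounding punctuation, and return the
    # `n` most common words with their counts.

    # ...and this is the kind of snippet the model writes back:
    with open(path, encoding="utf-8") as f:
        words = (w.strip(".,;:!?\"'()") for w in f.read().lower().split())
        return Counter(w for w in words if w).most_common(n)
```

The nice part is that writing the TODO forces you to pin down inputs, outputs, and edge cases before any code exists.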

I like your method of adding TODOs in your code and then using a model; I'm going to try that. I only have a 32 GB M2 Mac, so I have to use Ollama Cloud to run some of the larger models. That said, I'm surprised by what I can do "all local", and it really is magical running everything on my own hardware when I can.
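
In practice the split can be a one-liner, as in this sketch with the `ollama` Python client. The cloud model name is an assumption: as I understand it, after `ollama signin` cloud-hosted models are addressable by tags like `gpt-oss:120b-cloud` through the same client, but check the docs for your Ollama version.

```python
import ollama  # pip install ollama

# Rough routing idea: small models run fully locally on the 32 GB Mac,
# anything bigger is offloaded to Ollama Cloud under the same API.
LOCAL_MODEL = "gpt-oss:20b"
CLOUD_MODEL = "gpt-oss:120b-cloud"  # assumed naming, verify against your setup

def ask(prompt: str, hard: bool = False) -> str:
    model = CLOUD_MODEL if hard else LOCAL_MODEL
    resp = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]
```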

The TODOs really help me get my logic sorted out first in pseudocode. Glad to inspire someone else with it!

I've read that GPT-OSS 20B is still a very capable model; I believe it fits in your Mac's RAM as well and should be quite fast to output. For me personally, only the more complex questions need a better model than the local ones, and even then I often wonder whether an LLM is the right tool for that kind of complexity.
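
For a rough sense of why it fits, here's a back-of-envelope sketch. The numbers are approximations (gpt-oss-20b has about 21B total parameters stored mostly in ~4-bit MXFP4 precision), not measurements:

```python
# Back-of-envelope memory estimate for gpt-oss-20b, all figures rough.
params = 21e9                  # ~21B total parameters
bits_per_param = 4.25          # ~4-bit weights plus scaling metadata, an assumption
weights_gb = params * bits_per_param / 8 / 1e9
kv_cache_gb = 2.0              # rough allowance for KV cache and runtime buffers
print(f"~{weights_gb + kv_cache_gb:.1f} GB")   # roughly 13 GB, well under 32 GB
```

That leaves plenty of headroom on a 32 GB machine for the OS and whatever else is running.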