antirez's ds4-agent works quite well. It runs on any Apple Silicon device with 96GB of RAM or more.

I wonder how many years it'll take for the API token cost to exceed the money spent on RAM.
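For a rough feel of that break-even point, here's a back-of-the-envelope sketch. Every number in it is a made-up placeholder, not a real price or usage figure, and `breakeven_years` is just a hypothetical helper name. It also ignores electricity, resale value, and price changes over time, so treat it as arithmetic, not a forecast.

```python
def breakeven_years(hardware_cost_usd, tokens_per_month, api_price_per_million_usd):
    """Years until cumulative API spend equals the up-front hardware cost.

    Assumes constant monthly usage and a constant API price (both are
    simplifying assumptions).
    """
    monthly_api_cost = tokens_per_month / 1_000_000 * api_price_per_million_usd
    if monthly_api_cost <= 0:
        raise ValueError("monthly API cost must be positive")
    months = hardware_cost_usd / monthly_api_cost
    return months / 12


# Placeholder inputs only: plug in your own hardware price, usage, and API rate.
years = breakeven_years(
    hardware_cost_usd=4000,          # hypothetical price of the extra RAM / machine
    tokens_per_month=50_000_000,     # hypothetical monthly token usage
    api_price_per_million_usd=1.0,   # hypothetical blended API price
)
print(f"break-even after about {years:.1f} years")
```

The answer scales linearly with hardware cost and inversely with usage, so heavy agentic use shortens the payback period a lot compared to occasional chat.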

The DS4 folks are unofficially testing ways to run the model on lower-RAM machines at reduced performance. Similar efforts are going on with llama.cpp. The results are a bit of a challenge so far: prefill time tends to explode, which is a real limitation if you care about agentic workflows.