It’s relatively simple to use llama.cpp/server to spin up a local LLM that works with Claude Code or Codex-CLI. The required llama server settings are often scattered all over, so I maintain a set of instructions here for several popular open LLMs:
https://pchalasani.github.io/claude-code-tools/integrations/...
Do you use that as a daily driver? Claude Code's prompt is huge, so with local models you spend a long, long time on prompt processing and then run out of context shortly after.
Yes, the CC prompt can be ~30K tokens. I definitely do not use this as a daily driver. I did use it a few times for sensitive document work with Qwen3.6 MOE.
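To put rough numbers on that, here is a back-of-the-envelope sketch. Only the ~30K prompt size comes from this thread; the prefill speed and context window below are made-up placeholders, so plug in whatever your own hardware and model actually give you.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time spent processing the prompt before the first output token."""
    return prompt_tokens / tokens_per_second


def remaining_context(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the conversation after the prompt is loaded."""
    return max(context_window - prompt_tokens, 0)


CC_PROMPT_TOKENS = 30_000      # approximate figure quoted in the thread
ASSUMED_PREFILL_TPS = 500.0    # hypothetical prompt-processing speed
ASSUMED_CONTEXT = 65_536       # hypothetical context window setting

wait = prefill_seconds(CC_PROMPT_TOKENS, ASSUMED_PREFILL_TPS)
left = remaining_context(ASSUMED_CONTEXT, CC_PROMPT_TOKENS)
print(f"prefill wait: {wait:.0f} s, context left: {left} tokens")
```

With these placeholder values, the first turn waits about a minute and nearly half the window is gone before any work starts. Prompt caching on later turns, if your server has it enabled, would reduce the repeated wait but not the context used.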