I built a custom code-completion server: <https://github.com/toogle/mlx-dev-server>.

The key advantage is that it cancels generation when you continue typing, so invalidated completions don’t waste time. This makes completion latency predictable (about 1.5 seconds for me).
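
For illustration, here is a minimal sketch of the cancel-on-new-request idea using `asyncio`. It is not the server's actual code: `CompletionSession`, `fake_generate`, and `demo` are hypothetical names, and the fake generator stands in for real model inference. With a real model, where generation is blocking compute, the same effect would probably need a cancellation check between tokens.

```python
import asyncio


async def fake_generate(prompt):
    """Hypothetical stand-in for a model: yields one 'token' per word."""
    for word in prompt.split():
        await asyncio.sleep(0.01)  # simulated per-token latency
        yield word + " "


class CompletionSession:
    """Keeps at most one generation task alive at a time (illustrative only)."""

    def __init__(self, generate):
        self._generate = generate
        self._task = None

    async def _run(self, prompt):
        tokens = []
        async for token in self._generate(prompt):
            tokens.append(token)
        return "".join(tokens)

    async def complete(self, prompt):
        # A new request invalidates whatever is still generating.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        task = asyncio.create_task(self._run(prompt))
        self._task = task
        try:
            return await task
        except asyncio.CancelledError:
            # This request was superseded by a newer one.
            return None


async def demo():
    session = CompletionSession(fake_generate)
    first = asyncio.create_task(session.complete("def add(a"))
    await asyncio.sleep(0.005)  # the user keeps typing mid-generation
    second = asyncio.create_task(session.complete("def add(a, b):"))
    return await first, await second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, the first request is dropped as soon as the second arrives, so only the latest prompt is generated to completion.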

My setup:

- MacBook Pro (M3 Max)
- Neovim
- <https://github.com/huggingface/llm.nvim>

Models I typically use:

- mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx
- mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit