I use a 500 million parameter model for editor completions because I want those to nearly instantaneous and the plugin makes 50+ completion requests every session.

What editor do you use, and how did you set it up? I've been thinking about trying this with some local models and also with super low-latency ones like Gemini 2.5 Flash Lite. Would love to read more about this.

Neovim with the llama.cpp plugin and heavily quantized qwen2.5-coder with 500 (600?) million parameters. It's almost plug and play although the default ring context limit is way too large if you don't have a GPU.

Can you share which model you are using?

Which model and which plugin, please?