Hacker News

mathiaspoint 3 days ago

I use a 500 million parameter model for editor completions because I want those to be nearly instantaneous and the plugin makes 50+ completion requests every session.

ghxst 3 days ago

What editor do you use, and how did you set it up? I've been thinking about trying this with some local models and also with super low-latency ones like Gemini 2.5 Flash Lite. Would love to read more about this.

mathiaspoint 3 days ago

Neovim with the llama.cpp plugin and a heavily quantized qwen2.5-coder with 500 (600?) million parameters. It's almost plug and play, although the default ring context limit is way too large if you don't have a GPU.
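
For anyone wanting to poke at this outside the editor, here is a rough sketch of the kind of fill-in-the-middle request such a plugin could send to a local llama.cpp server. This is not the plugin's actual code. The address and port, the helper names (`build_infill_payload`, `request_completion`), and the line caps are assumptions for illustration; the `/infill` endpoint with `input_prefix`, `input_suffix` and `n_predict` is what llama.cpp's server documents, but check it against your build. Capping how much of the buffer gets sent is one simple way to keep prompts small on a CPU-only machine.

```python
import json
import urllib.request

# Assumed address of a locally running llama-server; adjust to your setup.
INFILL_URL = "http://127.0.0.1:8012/infill"


def build_infill_payload(lines, row, col,
                         max_prefix_lines=64, max_suffix_lines=16,
                         n_predict=32):
    """Split the buffer at the cursor (row, col) into a prefix and a suffix.

    Only a limited window of lines around the cursor is kept so the
    prompt stays short (hypothetical limits, tune to taste).
    """
    start = max(0, row - max_prefix_lines)
    end = min(len(lines), row + 1 + max_suffix_lines)
    # Text before the cursor: earlier lines plus the left part of the current line.
    prefix = "\n".join(lines[start:row] + [lines[row][:col]])
    # Text after the cursor: right part of the current line plus following lines.
    suffix = "\n".join([lines[row][col:]] + lines[row + 1:end])
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "n_predict": n_predict,  # keep generations short for low latency
    }


def request_completion(payload, url=INFILL_URL, timeout=2.0):
    """POST the payload to the server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # The server is assumed to return the generated text under "content".
    return body.get("content", "")
```

With a server running, `request_completion(build_infill_payload(buffer_lines, row, col))` would return the suggested text for the cursor position; lowering `max_prefix_lines` and `n_predict` trades suggestion quality for speed.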

badlogic 3 days ago

Can you share which model you are using?

myflash13 3 days ago

Which model and which plugin, please?