Since you've got the aider hack session going...
One thing I've had in the back of my brain for a few days is the idea of LLM-as-a-judge over a multi-armed bandit, testing out local models. Locally, if you aren't too fussy about how long things take, you can spend all the tokens you want. Running head-to-head comparisons is slow, but with a MAB you're not doing so for every request. Nine times out of ten it's the normal request cycle. You could imagine having new models get mixed in as and when they become available, able to take over if they're genuinely better, entirely behind the scenes. You don't need to manually evaluate them at that point.
I don't know how well that gels with aider's modes; it feels like you want to be able to specify a judge model but then have it control the other models itself. I don't know if that's better within aider itself (so it's got access to the added files to judge a candidate solution against, and can directly see the evaluation) or as an API layer between aider and the vllm/ollama/llama-server/whatever service, with the complication of needing to feed scores out of aider to stoke the MAB.
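The core loop described above — serve the current best model most of the time, occasionally run a head-to-head that a judge model scores, and feed those scores back into the bandit — could be sketched roughly like this. This is a minimal epsilon-greedy sketch, independent of aider; all names (`JudgedBandit`, `pick`, `record`) are hypothetical, and the judge call itself is left out:

```python
import random
from collections import defaultdict

class JudgedBandit:
    """Epsilon-greedy bandit over local models. Most requests go to the
    current best model; a small fraction are answered by two models and
    scored by an LLM judge. Hypothetical sketch, not aider's API."""

    def __init__(self, models, epsilon=0.1):
        self.models = list(models)
        self.epsilon = epsilon
        self.wins = defaultdict(float)
        self.trials = defaultdict(int)

    def rate(self, model):
        # Optimistic prior: unseen models score 1.0, so a newly added
        # model gets tried "as and when it becomes available".
        if self.trials[model] == 0:
            return 1.0
        return self.wins[model] / self.trials[model]

    def best(self):
        return max(self.models, key=self.rate)

    def pick(self):
        """Nine times out of ten (for epsilon=0.1) this is the normal
        request cycle; otherwise return a pair for the judge."""
        if random.random() < self.epsilon and len(self.models) > 1:
            champ = self.best()
            challenger = random.choice(
                [m for m in self.models if m != champ])
            return ("judge", champ, challenger)
        return ("serve", self.best())

    def record(self, winner, loser):
        """Feed the judge's verdict back into the bandit."""
        self.trials[winner] += 1
        self.trials[loser] += 1
        self.wins[winner] += 1.0

    def add_model(self, model):
        # New models mix in behind the scenes via the optimistic prior.
        self.models.append(model)
```

Whether this lives inside aider or in an API shim in front of vllm/ollama is then mostly a question of where `record` gets called from — inside aider it can see the judged files directly, whereas a proxy layer needs the score fed out to it.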
You could extend the idea to generating and comparing system prompts. That might be worthwhile but it feels more like tinkering at the edges.
Does any of that sound feasible?
It's funny you say this! I was adding a tool just earlier (that I haven't yet pushed) that allows the model to... switch model.
Aider can also have multiple models active at any time (the architect, editor and weak model are the standard set) and use them for different aspects. I could definitely imagine switching one model whilst leaving another active.
So yes, this definitely seems feasible.
Aider had a fairly coherent answer to this question, I think: https://gist.github.com/tekacs/75a0e3604bc10ea88f9df9a909b5d...
This was navigator mode + Gemini 2.5 Pro's attempt at implementing it, based only on pasting in your comment:
https://asciinema.org/a/EKhno9vQlqk9VkYizIxsY8mIr
https://github.com/tekacs/aider/commit/6b8b76375a9b43f9db785...
I think it did a fairly good job! It took just a couple of minutes and it effectively switches the main model based on recent input, but I don't doubt that this could become really robust if I had poked or prompted it further with preferences, ideas, beliefs and pushback! I imagine that you could very quickly get it there if you wished.
It's definitely not showing off the most here, because it's almost all direct-coding, very similar to ordinary Aider. :)