I just finished updating the aider polyglot leaderboard [0] with GPT-4.1, mini and nano. My results basically agree with OpenAI's published numbers.

Results, with other models for comparison:

    Model                       Score   Cost

    Gemini 2.5 Pro Preview 03-25 72.9%  $ 6.32
    claude-3-7-sonnet-20250219   64.9%  $36.83
    o3-mini (high)               60.4%  $18.16
    Grok 3 Beta                  53.3%  $11.03
  * gpt-4.1                      52.4%  $ 9.86
    Grok 3 Mini Beta (high)      49.3%  $ 0.73
  * gpt-4.1-mini                 32.4%  $ 1.99
    gpt-4o-2024-11-20            18.2%  $ 6.74
  * gpt-4.1-nano                  8.9%  $ 0.43

Aider v0.82.0 is also out with support for these new models [1]. Aider wrote 92% of the code in this release, a tie with v0.78.0 from 3 weeks ago.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html

Did you benchmark the combo DeepSeek R1 + DeepSeek V3 (0324)? There is a combo in 3rd place, DeepSeek R1 + claude-3-5-sonnet-20241022, and the new V3 also beats Claude 3.5, so in theory R1 + V3 should land as high as 2nd place. Just curious if that would be the case.
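
To be concrete about what I mean by "combo": it's aider's architect mode, where one model plans the change and a second, cheaper model writes the actual edits. A rough sketch via aider's Python scripting API is below; the DeepSeek model names and the editor_model / edit_format arguments are my guesses, so check the scripting docs for the exact API in your version.

    # Rough sketch of an R1 (architect) + V3 (editor) pairing.
    # Model names and the editor_model / edit_format arguments are assumptions;
    # see aider's scripting docs for the exact API.
    from aider.coders import Coder
    from aider.models import Model

    # R1 does the reasoning/planning; V3 (0324) turns the plan into edits.
    architect = Model(
        "deepseek/deepseek-reasoner",           # assumed name for R1
        editor_model="deepseek/deepseek-chat",  # assumed name for V3 (0324)
    )

    coder = Coder.create(
        main_model=architect,
        edit_format="architect",   # two-model architect/editor flow
        fnames=["example.py"],     # files the pair is allowed to edit
    )
    coder.run("refactor the parsing logic into its own function")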

What model are you personally using for your aider coding? :)

Mostly Gemini 2.5 Pro lately.

I get asked this often enough that I have a FAQ entry with automatically updating statistics [0].

  Model               Tokens     Pct

  Gemini 2.5 Pro   4,027,983   88.1%
  Sonnet 3.7         518,708   11.3%
  gpt-4.1-mini        11,775    0.3%
  gpt-4.1             10,687    0.2%

[0] https://aider.chat/docs/faq.html#what-llms-do-you-use-to-bui...