Hacker News

Subjectively I find Kimi is far "smarter" than the benchmarks imply, maybe because they game them less than US labs

I like Kimi too, but they definitely have some benchmark contamination: the blog post shows a substantial comparative drop in SWE-bench Verified vs open tests. I throw no shade - releasing these open weights is a service to humanity; really amazing.

rubymamis 5 months ago

My impression as well!