Hacker News

Alifatisk 5 months ago

Can't wait for the Artificial Analysis benchmarks. I'm still waiting on them to add Qwen3-max thinking; it will be interesting to see how these two compare to each other.

huey77 5 months ago

The analysis is up! Impressive: https://artificialanalysis.ai/models/kimi-k2-thinking

Alifatisk 5 months ago

Wow, these numbers are insane! I tried it yesterday and it worked beautifully. It also responded the way I wanted every time; I didn't have to spend time prompting it on how to respond properly (unlike Grok 4 expert, which tends to yap a lot). It just knew.

Today's models have gotten so good that at this point, whatever I run just works and helps me with whatever I need. Maybe I should start noting down prompts that some models fail at.

osti 5 months ago

Qwen 3 max has been getting rather bad reviews around the web (both on Reddit and Chinese social media), and that matches my own experience with it. So I wouldn't expect this to be worse.

SamDc73 5 months ago

My experience with it wasn't that good either, but it was looking good on benchmarks.

It seems like benchmark-maxing. Is that what you do when you're out of tricks?

Alifatisk 5 months ago

Ohhh, so Qwen3 235B-A22B-2507 is still better?

osti 5 months ago

I wouldn't say that, just that Qwen 3 max thinking definitely underperforms relative to its size.

htrp 5 months ago

Did the ArtificialAnalysis team get bored or something? What makes a model worthy of benchmark inclusion?