Hacker News

> So it’s just like, your opinion, man?

Yes.

That is how you empirically evaluate tools; not by reading stupid benchmarks. By actually using the tools, for hours and hours. Doing real work.

Did you try using it? For hours? Do you use qwen?

How about you tell us about your experience with your great 8B models that you use daily. What coding agent harness do you have them hooked up to? What context size can you get before they lose track of what's happening? Do you swap between models for different coding tasks?

Or have you not, actually, even tried any of this stuff yourself?