Hacker News

postalcoder 2 days ago

I don’t see how you can make these claims without having your own evals and running these models yourself. The gpt-oss results I’m getting for my use case (agentic task execution across a wide variety of tasks on my local device) are spectacular, even more so when you stack them up against every other model in the 20B weight class.

jermaustin1 2 days ago

That's what I've been feeling too. But it is just a feeling. I'm not running any benchmarks.

My agentic coding "app" (basically just a tool "server" around dotnet/git/fs commands with a kanban board) seems to be able to spit out quick SPAs with little additional prompting.
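A tool "server" like the one described above can be sketched roughly as follows. This is a hypothetical Python sketch, not the commenter's actual app (whose code isn't shown): the tool names, the JSON call shape `{"name": ..., "arguments": {...}}`, and the allow-list are all assumptions, and the kanban board is left out. The idea it illustrates is exposing a fixed set of dotnet/git/fs operations to the model instead of a raw shell.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical allow-list: each tool name maps to a fixed command prefix.
# The model can only append arguments; it never chooses the executable.
COMMAND_TOOLS = {
    "git_status": ["git", "status", "--short"],
    "git_diff": ["git", "diff"],
    "dotnet_build": ["dotnet", "build"],
    "dotnet_test": ["dotnet", "test"],
}


def build_command(name, extra_args=()):
    """Return the argv for an allow-listed command tool (KeyError if unknown)."""
    if name not in COMMAND_TOOLS:
        raise KeyError(f"unknown command tool: {name}")
    return COMMAND_TOOLS[name] + [str(a) for a in extra_args]


def run_command_tool(name, extra_args=(), cwd="."):
    """Run an allow-listed command and return exit code plus output as text.

    subprocess.run raises TimeoutExpired if the command exceeds the timeout.
    """
    proc = subprocess.run(
        build_command(name, extra_args),
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=300,
    )
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


def _resolve(root, path):
    """Resolve path under root, refusing anything that escapes the workspace."""
    base = Path(root).resolve()
    target = (base / path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"path escapes workspace: {path}")
    return target


def fs_read(root, path):
    """Return the text contents of a file inside the workspace."""
    return _resolve(root, path).read_text()


def fs_write(root, path, content):
    """Write text to a file inside the workspace, creating parent dirs."""
    target = _resolve(root, path)
    target.parent.mkdir(parents=True, exist_ok=True)
    n = target.write_text(content)
    return f"wrote {n} characters to {path}"


def dispatch(call_json, root="."):
    """Route one JSON tool call from the model to the matching handler."""
    call = json.loads(call_json)
    name, args = call["name"], call.get("arguments", {})
    if name == "fs_read":
        return fs_read(root, args["path"])
    if name == "fs_write":
        return fs_write(root, args["path"], args["content"])
    # Anything else must be an allow-listed command tool.
    return run_command_tool(name, args.get("args", []), cwd=root)
```

Pinning the executable per tool name and confining file paths to the workspace root is one simple way to limit what a local model can touch; a fuller setup might add a confirmation step before writes or builds.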