Hacker News

Same here. My project has a step that selects among many options when a user is trying to complete a task. On the test set for the workflow behind that step, gpt-4.1-mini has a success rate about 7% higher than gpt-5 and gpt-5-mini (with minimal thinking).
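
For anyone wanting to run the same kind of comparison, here is a rough sketch of how a gap like that could be computed from per-case pass/fail results. The function names and the outcome lists are made up for illustration and are not from my actual test set. It reports both the absolute gap (percentage points) and the relative gap, since "about 7%" can be read either way.

```python
def success_rate(outcomes):
    """Fraction of test cases that passed; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("no test outcomes")
    return sum(outcomes) / len(outcomes)


def compare(baseline, candidate):
    """Return (gap in percentage points, relative gap in percent)."""
    b = success_rate(baseline)
    c = success_rate(candidate)
    points = (c - b) * 100
    # Relative gap is undefined when the baseline never succeeds
    relative = (c - b) / b * 100 if b else float("inf")
    return points, relative


# Hypothetical outcomes, illustration only (not real results)
model_a = [True] * 80 + [False] * 20
model_b = [True] * 75 + [False] * 25

points, relative = compare(model_b, model_a)
print(f"{points:.1f} points, {relative:.1f}% relative")
```

With a small test set, a gap of this size can be within noise, so it helps to rerun the set a few times before drawing conclusions.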