Hacker News

Good call! The latest forge version has per-model parameter configs taken from official sources (they can be overridden). That's what I'll use for evals, and each eval set will be paired with a commit hash. I'll also make sure to call out where the params live and maybe highlight some for the popular models.
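To make the "defaults plus overrides, pinned to a commit" idea concrete, here's a rough sketch. This is not forge's actual code or config format: `OFFICIAL_DEFAULTS`, `resolve_params`, `EvalRun`, the model name and the parameter values are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-model defaults. Real values would come from each
# model's official recommendations; these numbers are placeholders.
OFFICIAL_DEFAULTS = {
    "example-model": {"temperature": 0.7, "top_p": 0.9},
}


def resolve_params(model, overrides=None):
    """Start from the per-model defaults, then apply any user overrides."""
    params = dict(OFFICIAL_DEFAULTS.get(model, {}))  # copy, don't mutate defaults
    params.update(overrides or {})
    return params


@dataclass
class EvalRun:
    """One eval set, recorded with the commit hash and the exact params used."""
    model: str
    commit: str
    params: dict


run = EvalRun(
    model="example-model",
    commit="<commit hash>",  # placeholder, filled in at eval time
    params=resolve_params("example-model", {"temperature": 0.2}),
)
```

Recording the resolved params next to the commit hash means anyone rerunning an eval can see exactly which settings produced a score, including any overrides.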

The paper is more academic in nature, so there I wanted to separate the model-performance variable from guardrail lift. The delta mattered more than the final score, which is why everyone got temp=0.7. That was intentional.
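In case "delta over final score" is unclear, here's a tiny sketch of the comparison. The function name and the task counts are placeholders I made up for illustration, not results from the paper.

```python
def guardrail_lift(passed_without, passed_with):
    """Lift = tasks passed with guardrails minus tasks passed without.

    Both runs are assumed to use the same model and the same sampling
    settings (e.g. temp=0.7), so the difference reflects the guardrails
    rather than the parameters.
    """
    return passed_with - passed_without


# Placeholder counts out of the same task set, purely illustrative.
lift = guardrail_lift(passed_without=12, passed_with=17)
```

Holding the temperature fixed across models is what makes the lifts comparable, even if it leaves some models short of their best absolute score.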

As for Qwen3.6, it's really solid. It'll do really well on forge, I can call that now. When I pushed it into agentic coding specifically, using the eval suite I have for that (separate from forge), even it needed help on long-running tasks - but it's definitely a top model right now.

That said, it's entirely possible there are better settings than the "official recommendations" I found, which would be a neat finding in itself.