Adding to my own comment now that I've read the announcement in a little more detail: I find the assertion that the model's coding performance surpasses their own flagship 397B model from last generation fairly convincing.
This sounds like a significant, genuine gain unless one of the following is true, and both seem really unlikely:
1. They somehow managed to benchmaxx every coding benchmark way harder than their own last generation.
2. They held back the coding performance of their last generation 397B model on purpose to make this 3.6 Qwen model look good. (Basically a tinfoil hat theory, as it would require 4D chess and deliberate self-sabotage to pull off.)
So, it's pretty safe to say that we actually have a competent agentic coding model we can leave running on a prosumer laptop overnight to create real software for almost zero token costs.