A few observations from a quick test using some of my standard test prompts: 1) the trace was very verbose and written in an odd style, reminiscent of chat transcripts or those annoying one-sentence-per-paragraph LinkedIn posts; 2) token output rate is very high on the hosted version; 3) conformance to instructions and output quality were better than most of the leading models I've tested (e.g. Opus 4.5).