This benchmark outcome is actually really impressive given the difficulty of this task. It shows that this particular model manages to "think" coherently and maintain useful information in its context for what has to be an insane overall amount of tokens, likely across parallel "thinking" chains. Likely also has access to SVG-rendering tools and can "see" and iterate on the result via multimodal input.