Surprised Gemini 3.1 Pro beat Claude in your evals for code-gen. Any intuition why - spatial reasoning, or just cleaner OpenSCAD output?
Gemini 3.1, whilst not the best agentic coding model, has extremely strong vision, which lets it reason spatially very well.
Fable 5 was top for a brief moment, whilst it was around!