There's a table here showing "Overall" and "Median" scores, but no context on what exactly was tested. It appears to be in the same ballpark as the latest models, with some cost advantages, but with the downside of being just as slow as the original R1 (likely lots of thinking tokens). https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
It’s appeared on the LiveCodeBench leaderboard too, with performance on par with o4-mini: https://livecodebench.github.io/leaderboard.html