compared to your test with GLM 5.1, this indeed looks off

https://xcancel.com/simonw/status/2041646779553476801

Yeah GLM 5.1 did an outstanding job on the possum - better than Opus 4.7 or GPT-5.4 and I think better than Gemini 3.1 Pro too.

But GLM 5.1 is a 1.51TB model, the Qwen 3.6 I used here was 17GB - that's 1/88 the size.
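For what it's worth, the 1/88 figure checks out as a quick sketch (assuming the usual decimal convention of 1 TB = 1000 GB for reported file sizes):

```python
# Sanity-check the size ratio between the two model files.
# Assumes decimal units (1 TB = 1000 GB), as download sizes
# are typically reported that way.
glm_gb = 1.51 * 1000   # GLM 5.1: 1.51 TB expressed in GB
qwen_gb = 17           # the Qwen 3.6 file used here: 17 GB

ratio = glm_gb / qwen_gb
print(f"GLM is ~{ratio:.1f}x the size of the Qwen file")  # ~88.8x
```

So "1/88" is the truncated ratio; 88.8x is the more precise figure.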

The point is the relative difference between the pelican test and the "other" test for each model, which suggests the pelican is being treated specially these days (possibly for a reason as simple as it being common in recent training data). It's not about the relative difference between the models on the "other" case in isolation.

Hoping this doesn't turn into a pelican-SVG back-and-forth: yesterday's GPT Image 2 thread ended up as three screenfuls of "I tried the prompt too" replies, with nothing about the model until you scrolled past them. I appreciate the testing, and I know this sounds like fun police, but there's a pattern where a well-known commenter plus a one-off vibe test plus 1:1 sub-threads eats the whole discussion. The fact that it's fun makes it hard to push back on without looking picky.

You can collapse the pelican thread with the little [-] toggle at the top.

Why would you though?

And by the way: Thanks for relentlessly holding new models’ feet to the pelican SVG fire.

Because I want to read about Qwen, not someone's one-off vibe test followed by 1:1 conversations. (A case in miniature right here: what's the last comment in this thread that actually says something about Qwen? The root post. Is that fun policing? Yes, apologies.)

There's a bunch of useful information in my comment that's independent of the fact that it drew a pelican:

1. You can run this on a Mac using llama-server and a 17GB downloaded file

2. That version does indeed produce output (for one specific task) of good enough quality to make this model worth spending more time on

3. It generated 4,444 tokens in 2min 53s, which is 25.57 tokens/s
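The throughput in point 3 is reproducible from the other two numbers (a quick sketch; the small gap between 25.69 and the reported 25.57 tokens/s presumably comes from sub-second timing hidden by rounding the elapsed time to whole seconds):

```python
# Recompute tokens/s from the reported totals.
tokens = 4444
seconds = 2 * 60 + 53  # 2 min 53 s = 173 s

print(f"{tokens / seconds:.2f} tokens/s")  # 25.69 with the rounded time
```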

Right, that is exactly what I meant by "the root post [had info about Qwen]" - you shouldn't feel I'm being critical of you or asking you to do anything differently, at all. I admire you deeply and feel humbled* by interacting with you, so I really want that to be 100% clear, because this is the 2nd time I'm reading that it might be personal.

* er, that probably sounds strange, but I did just spend 6 weeks integrating the Willison Trifecta into an app I've been building for 2.5 years, and I considered it a release blocker. It's a simple mental model, and a significant UX accomplishment IMHO.

I like the pelican-bicycle test because it's pretty predictive of how the model does helping me with TikZ. And I hate writing TikZ.

Somewhat ironically, as of this writing, this tangent is dominating the size of this topic.

I understand your reasoning and it's valid, but I think the best you can do is indeed collapse the thread (not sure if any mobile clients do better than that?)

It's perhaps not a serious test (it isn't to me), but on the edges of the pelican jokes there are usually some useful things said by people smarter than me, and if providers are spending time making pelicans or SVG output look better, that benefits all of us.

So, no hard feelings, you're understood (and I'm not trying to be patronising, I'm just awkward with the language), but pelicans are here to stay: the consensus seems to be that they're beneficial and on topic.

All the best!

[deleted]

I think it's to help drive traffic to his blog now that he's accepted sponsors in the header of every page. I do see this pelican thing come up from him on every model post that gets released.

The traffic I get from a comment with a link to a pelican is pretty tiny.

"Create me an SVG to drive MAXIMUM ENGAGEMENT for my sponsors".

Missing an opportunity here, lol.