The pelican has looked very same-y across all frontier models: same-color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

I really don't understand what's interesting about this test and why it's always on top.

It's funny.

It really is lol

As often happens with random oddball things that become traditions in web communities, the replies asking what it is or complaining about it begin to gain their own humor value.

Same reason you would always see the same top comments on reddit during a certain era.

That’s what I think too, but we should actively push back against that culture here, because HN is not Reddit.

It basically is at this point, if you haven’t noticed. Complete with the same America bad, Elon bad, democrats good midwit progressive politics.

Elon does suck. Objectively.

Don't forget EU bad! Because they won't let Apple screw over consumers.

Is this a straw man and an ad hominem?

Because you still can't ask LLMs to port DOOM to hardware X or Y.

It's a meme, and HN loves upvoting memes. Just like Reddit!

The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!

Do you seriously have a dedicated “bad takes on AI” HN account?

yeah, although I do combine it with "replies to snarky questions" for efficiency

True that

The "big beak!" comment in the SVG source makes me think it's definitely a gamed "benchmark" at this point.

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

SVG generation is a good test because the output is extremely easy to assess subjectively through visual reasoning, which is where humans are strong. However, the pelican on a bike specifically may be overused at this point.

Do you think the models are ready for the next level? I believe that would be: a pelican feeding spaghetti to Will Smith.

I'd be very surprised if this is in the training data, given that most models still mess it up to this day. E.g., look at the ones from Opus.

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.

It's more an example of gaming (the HN system) than a meme.