Hacker News

sempron64 17 hours ago

The pelican has looked very same-y across all frontier models: same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways that mirror existing AI pelicans on the internet.

tripleee 17 hours ago

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

yreg 16 hours ago

I really don't understand what's interesting about this test or why it's always on top.

simonw 16 hours ago

It's funny.

girvo 11 hours ago

It really is lol

mrandish 4 hours ago

As often happens with random oddball things that become traditions in web communities, the replies asking what it is or complaining about it begin to gain their own humor value.

depr 16 hours ago

Same reason you would always see the same top comments on reddit during a certain era.

yreg 8 hours ago

That’s what I think too, but we should actively push back against that culture here, because HN is not Reddit.

gunsle 5 hours ago

It basically is at this point, if you haven’t noticed. Complete with the same America bad, Elon bad, democrats good midwit progressive politics.

replwoacause 5 hours ago

Elon does suck. Objectively.

anhner 4 hours ago

Don't forget EU bad! Because they won't let Apple screw over consumers.

ankit_mishra 4 hours ago

Is this a straw man and an ad hominem?

luqtas 10 hours ago

because you still can't ask LLMs to port DOOM to hardware X or Y

WithinReason 15 hours ago

It's a meme, and HN loves upvoting memes. Just like Reddit!

port11 15 hours ago

The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!

scrollaway 16 hours ago

Do you seriously have a dedicated “bad takes on AI” hn account?

tripleee 16 hours ago

yeah, although I do combine it with "replies to snarky questions" for efficiency

jurgenaut23 16 hours ago

True that

Fuzzwah 7 hours ago

The "big beak!" comment in the SVG source makes me think it's definitely a gamed "benchmark" at this point.

h4ny 16 hours ago

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

fwipsy 16 hours ago

SVG generation is a good test because the output is extremely easy to assess subjectively through visual reasoning, where humans are strong. However, the pelican on a bike specifically may be overused at this point.

kayge 14 hours ago

Do you think the models are ready for the next level? I believe that would be: Pelican feeding spaghetti to Will Smith.

stratos123 13 hours ago

I'd be very surprised if this is in the training data given that most models mess it up to this day. E.g. look at the ones from Opus.

quantumwoke 16 hours ago

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.

brazukadev 15 hours ago

it is more an example of gaming the HN system than a meme.