> and there’s zero chance any AI lab would train a model for such a ridiculous task.

I'm not sure that's true anymore, considering how popular Simon's blog is.

> So maybe the AI labs have been paying attention after all!

> I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.

As acknowledged in the article.

Gemini 3.1 basically takes it home on that benchmark anyway; it's done.

Simon mentions further along in his article that, given Jeff Dean’s post referencing the pelican-riding-a-bike task (and how good current models are at doing it), it’s no longer a great benchmark to use. Enter the opossum riding an e-scooter!

Banana man on the Segway

That bit probably works better in the talk; it was a setup for a joke later on.