OK that is recognizably a pelican, pretty great!
This feels like the best pelicanbike yet. The singularity might be closer than we imagine.
Time for a leaderboard?
Ask and you'll receive: https://pelicans.borg.games/
It would be interesting to have two generations per model, without cherry-picking, so that the Elo estimate can come with an easy-to-compute standard deviation.
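For what it's worth, here is a rough sketch of how that could work. It assumes each generation is rated as its own "player" keyed by a hypothetical `(model, generation_index)` tuple, and that votes arrive as `(winner, loser)` pairs. The K-factor of 32 and the starting rating of 1500 are conventional defaults, not anything the leaderboard is known to use.

```python
import math
from collections import defaultdict


def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(votes, k=32.0, start=1500.0):
    """Run sequential Elo updates over (winner, loser) votes.

    Each player is a (model, generation_index) tuple (an assumption
    made for this sketch).
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_w)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def per_model_stats(ratings):
    """Collapse per-generation ratings into (mean, sample std) per model."""
    by_model = defaultdict(list)
    for (model, _gen), rating in ratings.items():
        by_model[model].append(rating)

    stats = {}
    for model, rs in by_model.items():
        mean = sum(rs) / len(rs)
        if len(rs) > 1:
            # Sample standard deviation (n - 1 in the denominator).
            # With exactly two generations this is |r1 - r2| / sqrt(2).
            var = sum((r - mean) ** 2 for r in rs) / (len(rs) - 1)
            std = math.sqrt(var)
        else:
            std = float("nan")  # one sample gives no spread estimate
        stats[model] = (mean, std)
    return stats
```

With only two samples per model the standard deviation is a very noisy estimate, and sequential Elo also depends on vote order, so treat the number as a rough error bar rather than a confidence interval.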
Nice! Is there a way I can click on the leaderboard items so I can view them?
Added
Lol, can you add a "both of these are terrible" option?
That's what "Reload Pelicans" is for!
I think they (the LLM providers) are manually tuning for these cases/examples.
Pelican on a bike -> some dude (from these labs) creates it, and it becomes part of the training data.