What is your setup for drawing the pelican? Do you ask the model to check the generated image, find issues, and iterate on it, which would demonstrate the model's real abilities?
It's generally one-shot-only - whatever comes out the first time is what I go with.
I've been contemplating a fairer version where each model gets 3-5 attempts and can then select which rendered image is "best".
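That best-of-N idea could be sketched roughly like this. Everything here is hypothetical: `generate_svg` stands in for a real model call that produces an SVG drawing, and `score` stands in for a judge (e.g. rendering the image and asking a vision model to rate it); neither reflects any actual benchmark implementation.

```python
import random

def generate_svg(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a model call; a real setup would
    # invoke an LLM here to produce an SVG drawing.
    random.seed(seed)
    return f"<svg><!-- attempt {seed}, quality {random.random():.2f} --></svg>"

def score(svg: str) -> float:
    # Hypothetical judge; a real setup might render the SVG and
    # have a vision model rate the result.
    return float(svg.split("quality ")[1].split(" ")[0])

def best_of_n(prompt: str, n: int = 3) -> str:
    # Generate n attempts and keep the highest-scoring one.
    attempts = [generate_svg(prompt, seed) for seed in range(n)]
    return max(attempts, key=score)
```

The selection step is the interesting part: letting the model (or a judge) pick its own best attempt tests self-evaluation as well as generation.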
Try llm-consortium with --judging-method rank
I think it would make the results much better and more representative of the models' abilities.
It would... but the test is inherently silly, so I'm still not sure if it's worth me investing that extra effort in it.