What is your setup for drawing the pelican? Do you ask the model to check the generated image, find issues, and iterate on it, which would demonstrate the model's real abilities?
It's generally one-shot-only - whatever comes out the first time is what I go with.
I've been contemplating a fairer version where each model gets 3-5 attempts and can then select which rendered image is "best".
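That best-of-N idea could be sketched roughly like this. Everything here is hypothetical: `generate_svg` stands in for a real model call that produces an SVG drawing, and `score` stands in for a judge (e.g. rendering the image and asking a vision model to rate it); neither reflects any actual benchmark implementation.

```python
import random

def generate_svg(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a model call; a real setup would
    # invoke an LLM here to produce an SVG drawing.
    random.seed(seed)
    return f"<svg><!-- attempt {seed}, quality {random.random():.2f} --></svg>"

def score(svg: str) -> float:
    # Hypothetical judge; a real setup might render the SVG and
    # have a vision model rate the result.
    return float(svg.split("quality ")[1].split(" ")[0])

def best_of_n(prompt: str, n: int = 3) -> str:
    # Generate n attempts and keep the highest-scoring one.
    attempts = [generate_svg(prompt, seed) for seed in range(n)]
    return max(attempts, key=score)
```

The selection step is the interesting part: letting the model (or a judge) pick its own best attempt tests self-evaluation as well as generation.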
Try llm-consortium with --judging-method rank
I think it would make the results much better and more representative of the models' abilities.
It would... but the test is inherently silly, so I'm still not sure if it's worth me investing that extra effort in it.