Hacker News

-_- 3 days ago

There needs to be some way of automatically assessing performance on the task, though this could be with a Python function or another LLM as a judge (or a combination!)
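
A rough sketch of the "combination" case, under some assumptions: the task has a known expected answer, the Python check is a simple normalized exact match, and the judge is any callable that returns a score between 0 and 1. All names here (`exact_match`, `combined_score`, `stub_judge`) and the 50/50 weighting are made up for illustration, and `stub_judge` is only a stand-in for a real LLM call.

```python
from typing import Callable

# Hypothetical judge signature: (model_output, expected) -> score in [0, 1].
Judge = Callable[[str, str], float]


def exact_match(output: str, expected: str) -> float:
    """Programmatic check: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def combined_score(output: str, expected: str, judge: Judge, weight: float = 0.5) -> float:
    """Blend the programmatic check with the judge's score.

    `weight` is the share given to the programmatic check (an arbitrary choice).
    """
    rule = exact_match(output, expected)
    # Clamp the judge's score so a misbehaving judge cannot leave [0, 1].
    judged = min(1.0, max(0.0, judge(output, expected)))
    return weight * rule + (1.0 - weight) * judged


def stub_judge(output: str, expected: str) -> float:
    """Placeholder for an LLM judge: here it just checks for a substring."""
    return 1.0 if expected.strip().lower() in output.lower() else 0.0


if __name__ == "__main__":
    for answer in ["Paris", "It is Paris.", "London"]:
        print(answer, combined_score(answer, "Paris", stub_judge))
```

In practice you would replace `stub_judge` with a function that prompts another model and parses a score out of its reply; the point is only that both signals reduce to a number you can compute without a human in the loop.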