Hacker News

djfergus 10 hours ago

We need a benchmark that tests a model's ability to do LLM research.