Hacker News
djfergus
10 hours ago
We need a benchmark that tests a model's ability to do LLM research.