Shameless plug: https://huggingface.co/spaces/smolagents/ml-intern

It’s a simple harness around Opus, but with tight integration with Hugging Face infra, so the agent can read papers, test code, and launch experiments.
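For anyone curious what "simple harness" looks like in smolagents terms, here's a minimal sketch: a `CodeAgent` wired to Opus via LiteLLM, with one custom tool. This is not the Space's actual code; the tool body and the model id are illustrative assumptions.

```python
import re
import urllib.request

from smolagents import CodeAgent, LiteLLMModel, tool


@tool
def read_paper(arxiv_id: str) -> str:
    """Fetch the abstract of an arXiv paper.

    Args:
        arxiv_id: arXiv identifier, e.g. "1706.03762".
    """
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        feed = resp.read().decode()
    # Crude extraction of the Atom feed's <summary> (the abstract).
    match = re.search(r"<summary>(.*?)</summary>", feed, re.DOTALL)
    return match.group(1).strip() if match else "No abstract found."


# Placeholder model id -- substitute whatever Opus version you have access to.
model = LiteLLMModel(model_id="anthropic/claude-opus-4")
agent = CodeAgent(tools=[read_paper], model=model)

print(agent.run("What architecture does arXiv:1706.03762 propose? Check the abstract."))
```

The real Space presumably adds more tools on top (code execution, launching jobs on HF infra), but the control loop is the same idea.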

What are the benchmarks for this, in terms of compute cost and error rate? And what does it cost to converge on a result?

Re: hyperparameter tuning and autoresearch: https://news.ycombinator.com/item?id=47444581

Parameter-free LLMs would be cool