Shameless plug: https://huggingface.co/spaces/smolagents/ml-intern

It’s a simple harness around Opus, but with tight integration with Hugging Face infra, so the agent can read papers, test code, and launch experiments.
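For anyone curious what "simple harness" looks like in smolagents terms, here's a minimal sketch: a `CodeAgent` wired to Opus via LiteLLM, with one custom tool. This is not the Space's actual code; the tool body and the model id are illustrative assumptions.

```python
import re
import urllib.request

from smolagents import CodeAgent, LiteLLMModel, tool


@tool
def read_paper(arxiv_id: str) -> str:
    """Fetch the abstract of an arXiv paper.

    Args:
        arxiv_id: arXiv identifier, e.g. "1706.03762".
    """
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        feed = resp.read().decode()
    # Crude extraction of the Atom feed's <summary> (the abstract).
    match = re.search(r"<summary>(.*?)</summary>", feed, re.DOTALL)
    return match.group(1).strip() if match else "No abstract found."


# Placeholder model id -- substitute whatever Opus version you have access to.
model = LiteLLMModel(model_id="anthropic/claude-opus-4")
agent = CodeAgent(tools=[read_paper], model=model)

print(agent.run("What architecture does arXiv:1706.03762 propose? Check the abstract."))
```

The real Space presumably adds more tools on top (code execution, launching jobs on HF infra), but the control loop is the same idea.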

What are the benchmarks for this, in terms of compute cost and error rate? And what does it cost to converge on a result?

Re: hyperparameter tuning and autoresearch: https://news.ycombinator.com/item?id=47444581

Parameter-free LLMs would be cool