Hacker News

imtringued a year ago

Isn't this a good reward function for RL? Take a codebase with a test suite, rip out a function, let the LLM rewrite it, benchmark the rewrite, and then use the benchmark results as the RL reward.
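
To make the idea concrete, here is a rough sketch of what such a reward could look like. Everything in it is an assumption for illustration: the names (`load_candidate`, `passes_tests`, `benchmark`, `reward`), the test-case format, and the choice of "speedup over a baseline timing, zero if any test fails" as the reward. A real setup would also need to sandbox the generated code rather than `exec` it directly.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical test-case format: (positional args, expected return value).
TestCase = Tuple[tuple, object]


def load_candidate(source: str, func_name: str) -> Callable:
    """Execute the LLM-written source and pull out the rewritten function."""
    namespace: dict = {}
    # Assumption: source is trusted enough to exec. Real use needs a sandbox.
    exec(source, namespace)
    return namespace[func_name]


def passes_tests(func: Callable, tests: Sequence[TestCase]) -> bool:
    """Return True only if every test case produces the expected value."""
    for args, expected in tests:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False
    return True


def benchmark(func: Callable, tests: Sequence[TestCase], repeats: int = 50) -> float:
    """Wall-clock seconds to run all test inputs `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        for args, _expected in tests:
            func(*args)
    return time.perf_counter() - start


def reward(source: str, func_name: str, tests: Sequence[TestCase],
           baseline_seconds: float) -> float:
    """Zero if the rewrite fails to load or fails a test, else its speedup."""
    try:
        func = load_candidate(source, func_name)
    except Exception:
        return 0.0  # syntax error, missing function, import failure, ...
    if not passes_tests(func, tests):
        return 0.0  # correctness gates the reward; speed alone earns nothing
    elapsed = benchmark(func, tests)
    # Speedup relative to the original function's timing on the same inputs.
    return baseline_seconds / max(elapsed, 1e-9)
```

You would measure `baseline_seconds` by running `benchmark` on the original function, then call `reward(candidate_source, "add", tests, baseline_seconds)` for each sampled rewrite. Gating on the tests first matters: otherwise the cheapest way to score well on the benchmark is to return garbage quickly.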