Perhaps not entirely open domain, but I have high hopes for “real RL” in coding, where you can get a reward signal from compile/runtime errors and tests.
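Roughly what I have in mind for the reward signal, as a toy sketch (assumes a single-file candidate, pytest as the test harness, and no sandboxing, all of which a real training setup would handle much more carefully):

    import os
    import subprocess
    import tempfile

    def code_reward(candidate_code: str, test_code: str) -> float:
        """Toy reward: 0.0 on a compile error, 0.2 if it compiles but tests fail,
        1.0 if the whole test suite passes."""
        with tempfile.TemporaryDirectory() as tmp:
            with open(os.path.join(tmp, "solution.py"), "w") as f:
                f.write(candidate_code)
            with open(os.path.join(tmp, "test_solution.py"), "w") as f:
                f.write(test_code)
            # Compile/syntax check on the model's candidate.
            compiled = subprocess.run(
                ["python", "-m", "py_compile", "solution.py"],
                cwd=tmp, capture_output=True, timeout=10,
            )
            if compiled.returncode != 0:
                return 0.0
            # Run the tests; runtime errors and assertion failures both surface here.
            tests = subprocess.run(
                ["python", "-m", "pytest", "-q", "test_solution.py"],
                cwd=tmp, capture_output=True, timeout=60,
            )
            return 1.0 if tests.returncode == 0 else 0.2

The point is just that the environment gives you a cheap, automatic scalar to optimize against, unlike most open-domain tasks.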
Interesting, has anyone been doing this? I.e., training/fine-tuning an LLM against an actual coding environment, as opposed to tacking that on later as a separate "agentic" construct?
I suspect that the big vendors are already doing it, but I haven’t seen a paper on it.