I think that when somebody trains LLMs to code golf with reinforcement learning, the models will inadvertently end up smarter.