The implications for RLMs are really interesting. RLMs are expensive because of token economics. But when tokens are this cheap and fast to generate, the model's context size matters a lot less

There are also interesting implications for optimization-driven frameworks like DSPy. If you have an eval loop and a useful reward function, you can iterate toward the best possible response every time and effectively ignore the cost of each attempt
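
As a rough illustration, here's a minimal best-of-N loop in plain Python. `generate` and `reward` are hypothetical toy stand-ins, not DSPy's actual API; the point is just that when attempts are nearly free, you can sample widely and keep whatever the reward function scores highest:

```python
import random

# Sketch of "iterate when attempts are cheap": sample many candidates,
# keep the best one. Swap the toy stubs below for a real model call
# and a real eval metric.

def generate(prompt: str) -> str:
    """Toy model call: returns one of several canned drafts."""
    drafts = [f"{prompt} (draft {i})" for i in range(10)]
    return random.choice(drafts)

def reward(prompt: str, response: str) -> float:
    """Toy reward: pretends longer drafts score higher."""
    return float(len(response))

def best_of_n(prompt: str, n: int = 32) -> str:
    # When each attempt costs ~nothing, a large n is affordable, and
    # the loop converges toward the best response the reward function
    # can distinguish.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda r: reward(prompt, r))

if __name__ == "__main__":
    print(best_of_n("Explain token economics"))
```

The quality ceiling here is the reward function, not the model: cheap tokens only buy you as much as your eval can discriminate.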