I think it's different from improving the model weights themselves, like the distillation examples you mention. The point is that changes to the "harness", the code running around the LLM calls (which is what's being edited here), persist and generalize when you wrap a more powerful LLM. That means those changes aren't all wasted when a more powerful model comes along that the harness wasn't tuned for.