figures 10 and 11 in the paper are interesting.

i suppose at a high level this works because of an asymmetry: it is much cheaper for the evaluator to generate tests by fuzzing than it is for the model to anticipate or probe those tests.
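a minimal sketch of that asymmetry, with hypothetical `reference` and `candidate` functions standing in for a ground-truth spec and a model-generated program (none of these names come from the paper):

```python
import random

def reference(xs):
    # hypothetical ground-truth implementation: sum of squares
    return sum(x * x for x in xs)

def candidate(xs):
    # hypothetical model-generated implementation under evaluation
    total = 0
    for x in xs:
        total += x * x
    return total

def fuzz_check(f, g, trials=1000, seed=0):
    # the evaluator's side of the asymmetry: comparing f and g on
    # random inputs is trivially cheap to scale up, while the model
    # would have to anticipate essentially all of these inputs.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        if f(xs) != g(xs):
            return False, xs  # counterexample found
    return True, None

ok, counterexample = fuzz_check(reference, candidate)
```

here the two implementations agree, so the check passes; a single behavioral divergence would surface as a concrete counterexample input.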

this method clarifies, in a way, the sense in which code generation is curve fitting, where the output curve is some linear transformation of the inputs.

kind of satisfying that when all is said and done, and we have a machine that can fit curve descriptions as well as or better than humans, we won't be any closer to explaining how anything really works.