I independently invented a very similar method and then abandoned it because it relies on abstraction.

Instead I now use Damerau-Levenshtein distance to check the text the model says it's replacing against what's actually in the file, and if the similarity is over some threshold the edit goes through.

Really works well because it's explicit. Forcing the model to emit the source tokens to be replaced seems to improve things.

https://github.com/day50-dev/sidechat/blob/db9c8f9d834967442...
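Roughly, the gate looks like this. A minimal sketch, not the sidechat code; the threshold, helper names, and line-window search are illustrative:

```python
# Sketch: accept an edit only if the model's quoted "old" text is close
# enough (by Damerau-Levenshtein similarity) to something in the file.
# Threshold and helper names are illustrative, not from the repo.

def dl_distance(a: str, b: str) -> int:
    """Optimal-string-alignment Damerau-Levenshtein distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - dl_distance(a, b) / max(len(a), len(b))

THRESHOLD = 0.9  # pick whatever tolerance you trust

def find_best_match(file_lines: list[str], old_lines: list[str]) -> tuple[int, float]:
    """Slide a window of len(old_lines) over the file and score each span
    against the text the model claims it is replacing."""
    old_text = "\n".join(old_lines)
    best_idx, best_score = -1, 0.0
    for i in range(len(file_lines) - len(old_lines) + 1):
        score = similarity(old_text, "\n".join(file_lines[i:i + len(old_lines)]))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score
```

If the best score clears the threshold the window gets swapped for the new text; if not, the edit is rejected with the remediation message described further down.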

The model will often chomp whitespace differently, but the main problems are:

1. Tracking alignment with the lines being tracked (hashing fixes that; see the sketch below)

2. Content alignment with the model not losing focus (Hamming/Levenshtein or other similarity scores fix that)

If we demand exact matches we're simply not going to get them.
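For (1), the hashing can be as simple as a short per-line fingerprint, so an edit can be re-anchored when earlier edits shift the numbering. A hypothetical sketch, not what the repo actually does:

```python
# Hypothetical sketch of per-line hashing for alignment tracking: tag each
# line with a short content hash when showing it to the model, then trust
# the hash rather than the raw line number when re-anchoring an edit.

import hashlib

def line_tag(line: str) -> str:
    # short, stable fingerprint of a single line
    return hashlib.sha1(line.encode()).hexdigest()[:8]

def render_for_model(lines: list[str]) -> str:
    # what the model sees: "<lineno>:<hash>| <content>"
    return "\n".join(f"{i}:{line_tag(l)}| {l}" for i, l in enumerate(lines, 1))

def relocate(lines: list[str], claimed_lineno: int, claimed_hash: str) -> int | None:
    """Find the line the model actually meant, even if its number has moved."""
    idx = claimed_lineno - 1
    if 0 <= idx < len(lines) and line_tag(lines[idx]) == claimed_hash:
        return idx                       # anchor still valid
    for i, l in enumerate(lines):        # otherwise hunt for the hash
        if line_tag(l) == claimed_hash:
            return i
    return None                          # stale anchor: tell the model to reread
```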

(Combining both methods might be good, I hadn't thought of that)

Another crucial point: the error message "Content mismatch. Reread the file" matters. Errors should give descriptive remediation steps.

That way even crappy models recover automatically and will tool-loop accordingly.
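In sketch form (the error wording is the part that matters; this reuses find_best_match and THRESHOLD from the sketch above, and the function shape is illustrative):

```python
# Sketch of the failure path: the tool result itself says what to do next,
# so even a weak model knows to reread and retry instead of flailing.
# Reuses find_best_match/THRESHOLD from the earlier sketch.

def edit_tool(path: str, old: str, new: str, file_lines: list[str]) -> dict:
    old_lines = old.splitlines()
    idx, score = find_best_match(file_lines, old_lines)
    if score < THRESHOLD:
        return {
            "ok": False,
            # the remediation instruction, not the error code, drives the loop
            "error": "Content mismatch. Reread the file",
        }
    file_lines[idx:idx + len(old_lines)] = new.splitlines()
    return {"ok": True, "message": f"Replaced {len(old_lines)} line(s) in {path}"}
```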

Asking it to do smaller edits is no good. Many smaller models will go down to single-line edits, hunt around for blank lines, and just inject garbage. So don't suggest it.

Larger models, which succeed at this, already know to make smaller edits on their own. Smaller models, which don't, won't attempt them if you don't suggest it.

Seriously this thing works with 4B models

I also combine it with a tool-call hack for models that don't support tool calling:

https://github.com/day50-dev/sidechat/blob/db9c8f9d834967442...

It injects the tool descriptions into the system prompt after probing the model's capabilities, and then runs a simple response router over the output.
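Something roughly this shape (a guess at the structure, not the actual sidechat code; tool names and the JSON convention are made up for illustration):

```python
# Hypothetical sketch of the shim: the tools are described in plain text in
# the system prompt, and a simple router decides whether a response is a
# tool call or an ordinary message.

import json
import re

TOOL_PROMPT = """You can call a tool by replying with a single JSON object:
{"tool": "<name>", "args": {...}}
Available tools:
- read_file(path): return the file contents
- edit_file(path, old, new): replace `old` with `new`
Otherwise, reply normally."""

TOOLS = {}  # name -> python callable, filled in by the host program

def route(response_text: str):
    """Treat a JSON object naming a known tool as a tool call; pass
    everything else through as a normal assistant message."""
    m = re.search(r"\{.*\}", response_text, re.DOTALL)
    if m:
        try:
            call = json.loads(m.group(0))
            if isinstance(call, dict) and call.get("tool") in TOOLS:
                result = TOOLS[call["tool"]](**call.get("args", {}))
                return ("tool_result", result)
        except json.JSONDecodeError:
            pass
    return ("message", response_text)
```

The probing can be as simple as trying a native tool-call request first and falling back to the prompt-injected description if the endpoint rejects it.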

I haven't found a model within reason that this doesn't work with. (I'm sure it'll break if you intentionally throw some botched fine-tune at it that's emitting garbage - that's not the claim.)

YMMV, works for me™