> They're probably going to cross that boundary soon

How? There’s no understanding, just output of highly probable text suggestions which sometimes coincides with correct text suggestions.

Correctness exists only in the understanding of humans.

In the case of writing to tests there are infinite ways to have green tests and break things anyway.

> How?

The typical approach is to prompt an LLM model with an outline of what the problem is and let it write the code itself by giving it file (+ maybe some other things) access. You could look into software packages like Windsurf (as original post) or the Cline extension for VS Code which are both pretty good at this sort of thing.

They perform at what I'd estimate is a mid programmer's level right now and are rapidly improving in quality.