This just sounds 1:1 equivalent to "there are things LLMs are good for and things LLMs are bad for."

I'll bite.

What are those things that they are good for? And consistently so?

As someone who leans more towards the side of LLM-scepticism, I find Sonnet 4 quite useful for generating tests, provided I describe in enough detail how I want the tests to be structured and which cases should be tested. There's a lot of boilerplate code in tests, and IMO because of that, many developers make the mistake of DRYing out their test code so much that you can barely understand what is being tested anymore. With LLM test generation, I feel that this kind of DRYing is no longer necessary.

Aren’t tests supposed to be premises (ensure the initial state is correct), compute (run the code), and assertions (verify the resulting state and output)? If your test code is complex, most of it should be moved into a harness and helper functions. Writing more complex code isn’t particularly useful.
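
A minimal sketch of that premises / compute / assertions shape, assuming a hypothetical `Cart` class invented purely for illustration (it is not from any real library):

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical class under test, invented for this illustration."""
    items: dict[str, int] = field(default_factory=dict)

    def add(self, sku: str, qty: int = 1) -> None:
        # Accumulate quantity per SKU.
        self.items[sku] = self.items.get(sku, 0) + qty

    def total_quantity(self) -> int:
        return sum(self.items.values())


def test_add_accumulates_quantity() -> None:
    # Premises: start from a known state and check that it is what we expect.
    cart = Cart()
    assert cart.total_quantity() == 0

    # Compute: run the code under test.
    cart.add("apple", 2)
    cart.add("apple", 3)

    # Assertions: verify the resulting state and output.
    assert cart.items == {"apple": 5}
    assert cart.total_quantity() == 5
```

With pytest, a function named like this would be collected and run automatically; it can also be called directly.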

I didn't say complex, I said long.

If you have complex objects and you're doing complex operations on them, then setup code can get rather long.
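
To illustrate long-but-not-complex setup, here is a rough sketch. The `Customer`, `Line`, and `Order` types and the discount rule are all hypothetical, made up for this example. The test spells out every value inline instead of hiding it behind a shared fixture, so it is longer, but nothing in it is hard to follow:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    country: str
    vip: bool


@dataclass
class Line:
    sku: str
    unit_price_cents: int
    quantity: int


@dataclass
class Order:
    """Hypothetical domain object; the pricing rule is an assumption."""
    customer: Customer
    lines: list[Line]
    coupon_percent: int

    def total_cents(self) -> int:
        subtotal = sum(l.unit_price_cents * l.quantity for l in self.lines)
        # Assumed rule: VIP customers get 5 extra percentage points off.
        percent_off = self.coupon_percent + (5 if self.customer.vip else 0)
        return subtotal * (100 - percent_off) // 100


def test_vip_discount_stacks_with_coupon() -> None:
    # Long setup, but every value the assertion depends on is visible here.
    customer = Customer(name="Ada", country="DE", vip=True)
    lines = [
        Line(sku="book", unit_price_cents=2000, quantity=1),
        Line(sku="pen", unit_price_cents=500, quantity=4),
    ]
    order = Order(customer=customer, lines=lines, coupon_percent=10)

    # subtotal = 2000 + 4 * 500 = 4000; 15% off leaves 3400.
    assert order.total_cents() == 3400
```

The trade-off is repetition across tests in exchange for being able to read each test on its own.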