Not really. I have one I made for fun where LLMs can only act by controlling a text editor called Kakoune, with no other way to do things, to see how they deal with it. But that's not really a scenario I expect them to do well at.
So far most of them have done very poorly on that one, because they are all overtrained on just executing shell commands.
A former colleague of mine and I made a simple test as a baseline, along the lines of "everything worth using should be able to do this pretty easily and swiftly", but that's just some very minor code generation with a very straightforward, boilerplate-type pattern.