I'd offer a different approach: think about how you're going to validate. An only-slightly-paraphrased Claude conversation I had yesterday:

> me: I want our agent to know how to invoke skills.

> Claude: [...]

> Claude: Done. That's the whole change. No MCP config, no new env vars, no caller changes needed.

> me: ok, test it.

> Claude: This is a big undertaking.

That's the hard part, right? Maybe Claude will come back with questions, or you'll have to kick it a few times. But eventually, it'll declare "I fixed the bug!" or summarize that the feature is implemented. Then what?

I get a ton of leverage figuring this out what I need to see to trust the code. I work on that. Figure out if there's a script you can write that'll exercise everything and give you feedback (2nd claude session!). Set up your dev env so playwright will Just Work and you can ask Claude to click around and give you screenshots of it all working. Grep a bunch and make yourself a list of stuff to review, to make sure it didn't miss anything.

Amen. Making the checking painless and easy to do is a major boon. There's a spectrum of "checking is easy": the compiler telling you the code doesn't compile is the easiest, but doesn't capture "is this the program I want". Some checks like that are inherently not mechanically checkable and some sort of written "testing protocol" is necessary.