Yes, I wrote a forge skill to do this via A/B testing, with a third agent to judge the result.

https://github.com/bjcoombs/ai-native-toolkit/blob/main/skil...

It hardens a skill through judge-panel refinement rounds. It’s a quality gate that runs after authoring, not an authoring tool.
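As a rough illustration of the idea (not the actual forge skill from the repo), a judge-panel refinement round could be sketched like this in Python. Every name here (`harden`, `revise`, `run`, the judge callables) is hypothetical, and the agents are replaced by trivial stand-in functions so the loop can run on its own:

```python
from typing import Callable, Sequence

# A judge sees the task plus both outputs and votes "A" (current) or "B" (candidate).
Judge = Callable[[str, str, str], str]


def harden(
    skill: str,
    revise: Callable[[str], str],      # hypothetical: proposes a revised skill
    run: Callable[[str, str], str],    # hypothetical: runs a skill on a task
    judges: Sequence[Judge],
    tasks: Sequence[str],
    rounds: int = 3,
) -> str:
    """Keep a revised skill only when a strict majority of judge votes prefer it."""
    for _ in range(rounds):
        candidate = revise(skill)
        votes_for_candidate = 0
        total_votes = 0
        for task in tasks:
            out_a = run(skill, task)       # A: current skill
            out_b = run(candidate, task)   # B: candidate skill
            for judge in judges:
                total_votes += 1
                if judge(task, out_a, out_b) == "B":
                    votes_for_candidate += 1
        if votes_for_candidate * 2 > total_votes:
            skill = candidate              # candidate won this round
    return skill


# Stand-ins for real agents (assumptions for the demo only).
def revise(skill: str) -> str:
    return skill + "!"


def run(skill: str, task: str) -> str:
    return f"{task}:{skill}"


def judge(task: str, out_a: str, out_b: str) -> str:
    # Prefers the candidate while it adds emphasis, but rejects more than two "!".
    better = out_a.count("!") < out_b.count("!") <= 2
    return "B" if better else "A"


result = harden("draft", revise, run, [judge, judge, judge], ["t1", "t2"], rounds=5)
```

With these stand-ins the first two revisions are accepted and later ones are rejected, which is the quality-gate behaviour: a change only lands if the panel prefers it.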

This is pretty neat. I suspect that eventually every skill will have some sort of validation/verification loop like this.