So I understand how these prompts work for tooling and so on, but they tend to be tuned to particular models. Could you instead supply, say, 10 candidate prompts for the same tool and determine which one gets the correct output? It wouldn't be much harder than keeping a few test cases and running each prompt through the user-selected model to see which one works.

Otherwise you're at the mercy of whatever model the user has selected or downloaded, and again every time you need to tweak the prompt to improve something.

This would be akin to how we used to calibrate styluses or touch screens.
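
To make the idea concrete, here is a rough sketch of what that calibration pass might look like. Everything in it is hypothetical: `calibrate`, `score_prompt`, the `{input}` placeholder, and the `fake_model` stand-in are names I made up for illustration, and a real version would call whatever model the user actually selected and would probably need a looser check than exact string match.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "model" is any callable that takes a prompt and returns text.
Model = Callable[[str], str]
# Assumption: a test case is (user input, exact expected output).
TestCase = Tuple[str, str]


def score_prompt(model: Model, template: str, cases: Sequence[TestCase]) -> int:
    """Count how many test cases a single prompt template gets right."""
    passed = 0
    for user_input, expected in cases:
        # Plain replace instead of str.format so templates may contain braces.
        prompt = template.replace("{input}", user_input)
        if model(prompt).strip() == expected:
            passed += 1
    return passed


def calibrate(
    model: Model, templates: Sequence[str], cases: Sequence[TestCase]
) -> Tuple[str, int]:
    """Run every candidate template and return the best one with its score."""
    scores: List[int] = [score_prompt(model, t, cases) for t in templates]
    best = max(range(len(templates)), key=scores.__getitem__)  # first wins ties
    return templates[best], scores[best]


def fake_model(prompt: str) -> str:
    """Stand-in for the user's model: it only 'obeys' one phrasing."""
    question = prompt.rsplit("\n", 1)[-1]
    if "Respond with only the tool name" in prompt:
        return "get_weather" if "weather" in question else "get_time"
    return "Sure! I think you want a tool."


CANDIDATES = [
    "Pick a tool for this request.\n{input}",
    "Respond with only the tool name.\n{input}",
]
CASES = [
    ("what's the weather in Oslo?", "get_weather"),
    ("what time is it?", "get_time"),
]

best_template, best_score = calibrate(fake_model, CANDIDATES, CASES)
```

You would run this once when the user picks or downloads a model, cache the winning template, and rerun it whenever the candidate prompts or the test cases change.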