You could also parse prompts into an AST, run inference on each variant, score the outputs with evals, then optimise the prompts with something like a genetic algorithm.
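To make that loop concrete, here is a rough sketch under some loud assumptions. The "AST" is just a flat tuple of `Section` nodes parsed from `role: text` lines. `run_inference` and `run_eval` are hypothetical placeholders (no real model is called, and the eval simply rewards two made-up target phrases). `VARIANTS` is an invented pool of alternative phrasings. You would swap in your own parser, model call, and eval suite.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Section:
    """One node of a toy prompt tree: a role label plus its text."""
    role: str
    text: str


Prompt = Tuple[Section, ...]

# Hypothetical pool of alternative phrasings per role (illustration only).
VARIANTS = {
    "system": ["You are a helpful assistant.", "You are a careful expert."],
    "style": ["Answer briefly.", "Think step by step."],
}


def parse_prompt(raw: str) -> Prompt:
    """Parse 'role: text' lines into a flat tuple of Section nodes."""
    nodes = []
    for line in raw.strip().splitlines():
        role, _, text = line.partition(":")
        nodes.append(Section(role.strip(), text.strip()))
    return tuple(nodes)


def render(prompt: Prompt) -> str:
    """Turn the node tuple back into prompt text."""
    return "\n".join(f"{s.role}: {s.text}" for s in prompt)


def run_inference(prompt_text: str) -> str:
    """Placeholder for a model call; this stub just echoes the prompt."""
    return prompt_text


def run_eval(output: str) -> float:
    """Placeholder eval: counts hypothetical target phrases in the output."""
    targets = ("careful expert", "step by step")
    return float(sum(t in output for t in targets))


def fitness(prompt: Prompt) -> float:
    return run_eval(run_inference(render(prompt)))


def mutate(prompt: Prompt, rng: random.Random) -> Prompt:
    """Replace the text of one randomly chosen node with a variant."""
    i = rng.randrange(len(prompt))
    node = prompt[i]
    options = VARIANTS.get(node.role, [node.text])
    return prompt[:i] + (Section(node.role, rng.choice(options)),) + prompt[i + 1:]


def crossover(a: Prompt, b: Prompt, rng: random.Random) -> Prompt:
    """Uniform crossover; assumes both parents have the same node layout."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def optimise(seed: Prompt, generations: int = 20, pop_size: int = 8,
             rng_seed: int = 0) -> Prompt:
    rng = random.Random(rng_seed)
    population = [seed] + [mutate(seed, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Keep the top half unchanged (elitism), refill with mutated children.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    seed = parse_prompt("system: You are a helpful assistant.\nstyle: Answer briefly.")
    best = optimise(seed)
    print(render(best), fitness(best))
```

Because the top half of each generation is carried over unchanged, the best score never drops below the seed's. In a real setup `fitness` would be the expensive part, so you would probably want to cache scores per rendered prompt.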