> Please consult me when you encounter any ambiguous edge cases

Why not check the logprobs of the output and take action when the prob of the first and second most likely token is too similar? (or below a certain threshold?

because this is manual? are you an llm?