This doesn't work. The model outputs the most probable tokens. Running it again and asking for less probable tokens just results in the same but with more errors.

Do you not have experience with agents solving problems? They already successfully do this. They try different things until they get a solution.