30-120 seconds sounds surprisingly long for ~368 attempts. Do you know which part(s) the slowness comes from?

From doing MR rounds in pure Python: https://github.com/textonly/git-prime/blob/main/git-prime-co....

Should be under 5 seconds in C or C++ using GMP.

No, MR in pure Python is ~instantaneous for numbers of this magnitude.
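
A quick way to check this claim is a minimal sketch like the one below. It is not the linked project's code; `is_probable_prime`, the 40-round default, and the choice of random 160-bit odd candidates (assuming the numbers are about the size of a SHA-1 hash) are all assumptions made for illustration. The count of 368 is taken from the figure mentioned upthread.

```python
import random
import time


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test (hypothetical helper)."""
    if n < 2:
        return False
    # Cheap trial division first; this also handles small n exactly.
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # built-in modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


if __name__ == "__main__":
    # Assumption: candidates are random 160-bit odd integers.
    rng = random.Random(0)
    candidates = [rng.getrandbits(160) | 1 for _ in range(368)]
    start = time.perf_counter()
    hits = sum(is_probable_prime(c) for c in candidates)
    elapsed = time.perf_counter() - start
    print(f"{len(candidates)} candidates, {hits} probable primes, {elapsed:.3f}s")
```

Most composites are rejected by the trial division or the first round, so only the rare probable primes pay for all 40 rounds. Running it on your own machine shows how much of the total the primality testing accounts for.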

From looking at the code, the overhead will be from repeatedly invoking git as a subprocess.
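
To get a rough feel for the per-call cost of spawning a process, something like the sketch below could be used. It is only an illustration: `seconds_per_call` is a hypothetical helper, and the command it times is a stand-in (the Python interpreter running `pass`) so that it runs anywhere. To measure the real thing you would substitute whatever git command the script actually invokes, which I have not reproduced here.

```python
import subprocess
import sys
import time
from typing import Sequence


def seconds_per_call(cmd: Sequence[str], runs: int = 20) -> float:
    """Average wall-clock seconds to spawn `cmd` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in command (assumption); e.g. swap in ["git", "--version"]
    # on a machine where git is installed.
    cmd = [sys.executable, "-c", "pass"]
    per_call = seconds_per_call(cmd)
    print(f"{per_call * 1000:.1f} ms per subprocess call")
```

Multiplying the per-call figure by the number of attempts and by the number of subprocess calls per attempt gives an estimate to compare against the observed total.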

Have not flame graphed it or even really considered optimization.