This is not really true. It can become very difficult for mathematicians to review the work of their peers. Mistakes often evade even scrupulous verification, and the test of time, and of many readers, is the best test.

Why would you not have the bot write in a formal language (e.g. Lean) and then just typecheck it? Then you only need to decide whether your definitions were interesting. LLMs are now extremely good at programming when given a compiler/typechecker, so I'd expect them to be good at formal math as well. It's nearly (if not precisely) the same activity.

That’s what they usually do, as I understand it: the LLM generates a proof in Lean, and the proof checker verifies it.
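To make the workflow concrete, here is a toy sketch of what an LLM might emit in Lean 4: a theorem statement plus a proof term, which Lean's kernel then either accepts or rejects mechanically (the theorem name here is just an illustrative placeholder).

```lean
-- A trivial statement an LLM could be asked to formalize.
-- If the proof term is wrong, `lean` fails to typecheck it;
-- if it compiles, the kernel has verified the proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point of the pipeline is exactly this division of labor: the LLM only has to produce a candidate term, and trust rests entirely on the small, well-audited kernel that checks it, not on the model.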