> then just randomly mock a function like Gemini used to

Claude Code does that on longer tasks.

Time to give Codex a try I guess.