These are all amazing ideas. We actually already see a lot of solo devs using mrge precisely because they want something to catch bugs before code goes live—they simply don't have another pair of eyes.
And I absolutely love your idea of having multiple AI models review PRs simultaneously. Benchmarking LLMs is notoriously tricky, so a "wisdom of the crowds" approach across a large user base could genuinely help identify which models perform best for specific codebases or even languages. We could even imagine certain models emerging as specialists for particular types of issues.
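Just to sketch what that could look like in practice (purely illustrative: `Finding`, `cross_model_review`, and the per-model callables are made-up names, not anything in mrge today), the core of it is fanning one diff out to several reviewers and ranking findings by how many of them agree:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """One review comment, normalized so findings from different models can be compared."""
    file: str
    line: int
    issue: str  # short normalized description; real systems would need fuzzy/semantic matching


def cross_model_review(diff: str, reviewers: dict) -> list[tuple[Finding, list[str]]]:
    """Fan a single diff out to several models and rank findings by agreement.

    `reviewers` maps a model name to a callable that takes the diff and returns
    a list of Finding objects; the callables (the actual LLM calls) are left as
    placeholders here.
    """
    votes: dict[Finding, list[str]] = {}
    for model_name, review_fn in reviewers.items():
        for finding in review_fn(diff):
            votes.setdefault(finding, []).append(model_name)
    # Findings flagged by more models float to the top ("wisdom of the crowds");
    # singletons are more likely to be one model's noise or hallucination.
    return sorted(votes.items(), key=lambda kv: len(kv[1]), reverse=True)


# Toy usage with stub reviewers standing in for real model calls:
reviewers = {
    "model-a": lambda d: [Finding("api.py", 42, "unvalidated user input")],
    "model-b": lambda d: [Finding("api.py", 42, "unvalidated user input"),
                          Finding("db.py", 7, "query missing an index")],
}
for finding, models in cross_model_review("<diff>", reviewers):
    print(len(models), "model(s) flagged:", finding)
```

Tracking which models keep landing in the majority for a given repo or language would then be one simple way to surface those "specialists" over time.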
Really appreciate these suggestions!