Using different models for the architect and developer roles isn't necessarily better. Each model has its own internal representation of the problem (loosely speaking, its solution lives in a different vector space), so when the architect (model A) produces a design, the developer (model B) implements it from a different "mental map". You can end up with a gap between what was designed and what actually gets built.

I normally use the same model (e.g. Codex 5.3) for planning, implementation, and testing, and then have a different model (e.g. Opus 4.6) review the result to catch issues and edge cases the developer didn't foresee and the tester didn't spot. I then pass the review back to the developer model so it can fix the issues.
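
Roughly, that workflow can be sketched like this in Python. It's a minimal sketch, not a real integration: `Model`, `WorkflowResult`, and `run_workflow` are made-up names, the prompts are placeholders, and each model is assumed to be a plain function that takes a prompt and returns text, so you'd wrap whatever client you actually use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical abstraction: a "model" is any callable mapping a prompt to text.
Model = Callable[[str], str]


@dataclass
class WorkflowResult:
    plan: str
    implementation: str
    tests: str
    review: str
    revised: str


def run_workflow(task: str, developer: Model, reviewer: Model) -> WorkflowResult:
    """One model plans, implements, and tests; a second model reviews once."""
    # Planning, implementation, and testing all go through the same model.
    plan = developer(f"Plan a solution for this task:\n{task}")
    implementation = developer(f"Implement this plan:\n{plan}")
    tests = developer(f"Write tests for this implementation:\n{implementation}")

    # A different model reviews the result for missed issues and edge cases.
    review = reviewer(
        "Review the following for issues and unhandled edge cases.\n"
        f"Implementation:\n{implementation}\n\nTests:\n{tests}"
    )

    # The review goes back to the developer model for fixes.
    revised = developer(
        f"Fix the issues raised in this review:\n{review}\n\n"
        f"Implementation:\n{implementation}"
    )
    return WorkflowResult(plan, implementation, tests, review, revised)
```

The point of the shape is that only the review step crosses models: the plan never has to be reinterpreted by a second model, and the reviewer only has to judge finished output.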