This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.