I stopped manually writing code 6-9 months ago, and am generating high-quality code on the dimensions we care about: GPU perf benchmarks, internal & industry conformance test suites, eval benchmarks, lint/type checkers, etc. It's not perfect code - there are clear AI slop telltales that review cycles still let linger - but it's more ambitious than what we'd do ourselves on most dimensions, like capability, quality, and volume. We're solving years-old GPU bugs that we had given up on as mere mortals.
And yes, we build our own orchestrator tech, both as our product (not vibe coding but vibe investigating) and, more relevant here, as our internal tooling. For example, otel & evals increasingly drive our AI coding loops rather than people. Codex and Claude Code are great agentic coding harnesses, so our 'custom orchestration' work is more about using them intelligently in richer pipelines, like the eval-driven loop above. They've been pretty steadily adding features like parallel subagents that work in teams, and they're hookable enough to do most tricks, so I don't feel the need to use others. We're busy enough adapting on our own!
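
To make "evals drive the loop" concrete, here's roughly the shape of it - a minimal sketch, not our actual pipeline. The agent invocation (`claude -p`), the `run_evals.py` script, the JSON report format, and the thresholds are all placeholders/assumptions; the real thing also feeds otel traces back in:

    #!/usr/bin/env python3
    # Sketch of an eval-driven coding loop: the agent attempts the task,
    # an eval suite scores the result, and failures (not a human) steer
    # the next attempt until the bar is cleared or the budget runs out.
    import json
    import subprocess

    MAX_ITERS = 5          # fixed iteration budget before escalating to a human
    TARGET_SCORE = 0.95    # stop once evals clear this bar

    TASK = "Fix the regression flagged by the perf eval suite."

    def run_agent(prompt: str) -> None:
        # One-shot, non-interactive agent run; swap in whatever harness you
        # use (Codex CLI, Claude Code, ...). Exact flags are an assumption.
        subprocess.run(["claude", "-p", prompt], check=True)

    def run_evals() -> dict:
        # Hypothetical eval runner: perf benchmarks, conformance suites,
        # lint/type checks, all folded into one JSON report like
        # {"score": 0.87, "failures": [...]}.
        subprocess.run(["python", "run_evals.py", "--out", "eval_report.json"], check=True)
        with open("eval_report.json") as f:
            return json.load(f)

    def main() -> None:
        prompt = TASK
        for i in range(MAX_ITERS):
            run_agent(prompt)
            report = run_evals()
            score = report.get("score", 0.0)
            print(f"iter {i}: eval score {score:.2f}")
            if score >= TARGET_SCORE:
                print("evals pass; handing off to human review")
                return
            # Feed the failures back in so the evals steer the next attempt.
            failures = json.dumps(report.get("failures", []), indent=2)
            prompt = f"{TASK}\n\nPrevious attempt failed these evals:\n{failures}\nFix them."
        print("budget exhausted; escalate to a human")

    if __name__ == "__main__":
        main()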