Claude Code has a big system prompt, most of which isn't necessary for the more recent models. (Codex too.)
I've been running Claude and GPT in my own agent harness. The main difference I notice is that tasks take about 7x longer to complete if they're run in the official Claude or Codex harness (and cost me 7x more).
You would think this would lead to increased correctness, but that doesn't seem to be the case. Today I tested both side by side. They both resulted in data loss. (I had a backup obviously.)
GPT running in the official harness did a bunch of extra tests and double checking, and ended up with the same result regardless (it permanently deleted a bunch of documentation).
All else being equal, I like getting my data loss 7x faster and cheaper ;)
So you are using the API directly in your own harness without the subscription?
Yeah, I made a simple agent based on this tutorial:
https://minimal-agent.com/
It's slightly bigger now, but here's a ~50 line version for reference. I added the missing outer while-loop, so it takes user input etc.
https://gist.github.com/a-n-d-a-i/bd50aaa4bdb15f9a4cc8176ee3...
I mostly use it with GLM via their coding plan, I got a year for like $20 when it was on sale. But I also hooked it up to Sonnet, Opus, GPT, etc.