> What provider do you use?
1. My own harness + Local (which usually means Qwen3.6-35B-A3B), I use this fairly often for research gathering on topics, info gathering on code bases, etc.
2. My own harness + DeepSeek v4 Flash served by DeepSeek, I added $20 quite some time ago and somehow still have $18.77 in there after I don't know how many prompts. I use this pretty often, slightly less than my local setup, it's great and what I'm planning on running locally (eventually).
3. My own harness + OpenRouter with whichever model I want to try out. I use this very rarely.
4. Pi + OpenAI Codex $20 subscription. I don't use this almost at all anymore, but I keep the Codex subscription for testing things out to see how GPT-5.5 will handle a problem the other setups have issues with.
> Why do you trust it with serving full quality?
The only thing I've noticed seems unbearably useless sometimes versus what I noticed before was GPT-5.5 which has had some of the weirdest degradations I've seen. It's not to Anthropic levels but it definitely had some service issues a few times where I was wondering if they had accidentally (or purposefully) lobotomized it.
Everything else has mostly just been the same, except DeepSeek I noticed had some speed issues a few days ago.
> What harness do you use? Why do you trust it not to have malware (most harnessed are TS apps)?
I pretty much only use my own, agents are trivial to make and it's definitely not hard to make one that's better than Claude Code or Codex for whatever you're doing.
I want to say that I agree with you on the value of writing your own coding harness. I wrote something simple in Emacs Lisp and it makes me happy occasionally using it. I am trying to learn Rust and I am working on my own Rust core orchestration layer and I plan on both a Rust command line client and I already have a Python library wrapper for the Rust code that I have written so far. I write a lot of ‘little books’ and I am almost sure to write yet another one on my current hacking project.
Are my little hacks as effective as OpenCode or Claude Code? No way, but I am learning a lot and having fun.
Do you write /maintain evals? This is something I want to get into more. Otherwise I feel really blind and feel compelled to just drop money on frontier.
Not really. I have a benchmark I made for fun where I let LLMs control a text editor called Kakoune, and then give them no other way to do things, to see how they deal with it, but that's not really a scenario I expect them to do well at.
So far most of them have done very poorly on that one, because they are all overtrained on just executing shell commands.
A former colleague of mine and I made a simple test for some baseline "Everything worth using should be able to do this pretty easily and swiftly" but that's some very minor code generation with a very straight forward, boilerplate-type pattern.