Personally, I find it fascinating to watch how, whenever a new technology appears, people start competing to define and own its standards.

Manus rebuilt its harness five times in six months. The model stayed the same, but the architecture changed five times.

LangChain re-architected Deep Research four times in one year.

Anthropic also ripped out Claude Code’s agent harness whenever the model improved.

Ever since Mitchell Hashimoto mentioned the harness in February, people have been trying to claim that concept. Eventually, someone will probably sell a book called Harness Engineering. I will buy it, of course. Then I will write a blog post about it that nobody reads, with a link that will be buried under ShowDead as soon as I submit it to HN.

And by that point, IT companies will start asking:

“You’re a new grad, right? You know harness engineering, don’t you?”

Author here.

In my opinion, the main driver here is how fast models have evolved in the past 12 months. It makes the architecture of everything around them obsolete, very fast.

We went from using models as a building block wrapped in heavy workflow code to models being smart enough to drive their own workflows and planning.

Really enjoyed your post, by the way. The idea of putting skills and memories in a database while keeping the file-shaped interface for the agent is clean. One read/write surface, two backends, invisible to the model: that's a nice piece of design, and the candor in the "what's still hard" section made me trust the rest of the post. My comment above was meant as a joke, not as a criticism of your architecture. If this pattern becomes the standard, I'll happily migrate my workflow again.

One thing I wonder about is whether path routing alone is enough.

If `/workspace` goes to the sandbox and `/memory` or `/skills` goes to the database, the path tells you where to send the request. But it does not tell you whether this user, session, or agent is allowed to access it.

When I built something similar with an MCP filesystem, I found that I needed a scope check before actually running the operation. In my case, I was using GPT dev mode through a Cloudflare tunnel to control my local environment/model, so this kind of boundary became important.

So I like the path-routing idea, but I wonder if it eventually needs a scope or permission layer as well.
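
To make the distinction concrete, here is a minimal Python sketch of routing plus a separate scope check. It is not the post author's implementation: the `ROUTES` table, the `Principal` type, and the `(operation, prefix)` grant format are all assumptions made up for illustration. The point is only that `route` answers "where does this go?" while `is_allowed` answers "may this caller do it?", and that the path is normalized before either question is asked.

```python
import posixpath
from dataclasses import dataclass

# Hypothetical routing table: path prefix -> backend name.
ROUTES = {"/workspace": "sandbox", "/memory": "database", "/skills": "database"}


@dataclass(frozen=True)
class Principal:
    user_id: str
    # Hypothetical grant format: a set of (operation, path prefix) pairs.
    grants: frozenset


def _under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies below it in the tree."""
    return path == prefix or path.startswith(prefix + "/")


def route(path: str) -> tuple[str, str]:
    """Pick a backend from the path alone. No permission logic here."""
    clean = posixpath.normpath(path)  # collapse "..", "." and repeated slashes
    for prefix, backend in ROUTES.items():
        if _under(clean, prefix):
            return backend, clean
    raise LookupError(f"no backend for {clean}")


def is_allowed(principal: Principal, op: str, clean_path: str) -> bool:
    """Scope check: does any grant cover this operation on this path?"""
    return any(
        op == granted_op and _under(clean_path, granted_prefix)
        for granted_op, granted_prefix in principal.grants
    )


def dispatch(principal: Principal, op: str, path: str) -> str:
    """Route first, then refuse before touching the backend if out of scope."""
    backend, clean = route(path)
    if not is_allowed(principal, op, clean):
        raise PermissionError(f"{principal.user_id} may not {op} {clean}")
    # A real system would call the backend here; the sketch just reports it.
    return f"{op} {clean} via {backend}"
```

With a grant of `("read", "/memory/alice")`, a read of `/memory/alice/notes.md` is dispatched to the database backend, while `/memory/bob/x` and `/memory/alice/../bob/x` both route correctly but are rejected by the scope check.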

> putting skills and memories in a database

I assume by database he meant a relational database. But I don't see the advantage of that over just keeping skills and memory in our source control repository. Am I missing something?

Thank you, appreciate it!

Regarding scoping: in our case, the agent loop runs the same way our API server does (that is, it’s a multi-tenant service running in a container somewhere), and we solve scoping the same way.

To put it in other words, whether it’s the API receiving “GET /memories/id” or the LLM requesting “Read(/memories/id)” we do pretty much the same thing (check authN/authZ, scope the db request, etc).

Basically the LLM is just another API client using a slightly different format for inputs and outputs, but sharing the same permission layer.
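
A rough Python sketch of what "sharing the same permission layer" could look like. Everything here is hypothetical (the `Caller` type, the in-memory `MEMORIES` store, and both handler functions); it only illustrates the shape described above, where the HTTP route and the LLM tool call parse different input formats but end up in one tenant-scoped function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    tenant_id: str
    authenticated: bool


# Hypothetical store keyed by (tenant_id, memory_id), standing in for a
# database table whose queries are always filtered by tenant.
MEMORIES = {("t1", "m1"): "prefers tabs", ("t2", "m9"): "prefers spaces"}


def read_memory(caller: Caller, memory_id: str) -> str:
    """The single permission layer: authN check, then a tenant-scoped lookup."""
    if not caller.authenticated:
        raise PermissionError("unauthenticated")
    key = (caller.tenant_id, memory_id)  # scope the lookup to the caller's tenant
    if key not in MEMORIES:
        raise KeyError(memory_id)  # other tenants' rows look like "not found"
    return MEMORIES[key]


def handle_http(caller: Caller, method: str, url_path: str) -> dict:
    """API client format, e.g. GET /memories/m1."""
    prefix = "/memories/"
    if method == "GET" and url_path.startswith(prefix):
        return {"status": 200, "body": read_memory(caller, url_path[len(prefix):])}
    return {"status": 404, "body": ""}


def handle_tool_call(caller: Caller, call: str) -> str:
    """LLM client format, e.g. Read(/memories/m1)."""
    prefix = "Read(/memories/"
    if call.startswith(prefix) and call.endswith(")"):
        return read_memory(caller, call[len(prefix):-1])
    raise ValueError(f"unsupported tool call: {call}")
```

Both entry points are thin parsers. Neither contains its own authorization logic, so a tenant that cannot fetch a memory over HTTP cannot read it through the agent either.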

Just wait 6 months for something new to come up and everyone will forget about harnesses.

> Ever since Mitchell Hashimoto mentioned the harness in February

What. The idea is as old as anyone can remember, and wrt. LLMs, it was known to be important since at least as early as ChatGPT being first released.

Yes, the concept itself is not new. Around 2022, people would usually have called it the orchestration layer.

But I think the term started being used closer to its current meaning around this point:

https://www.softwareimprovementgroup.com/blog/what-is-harnes...

In a way, the sequence was something like:

prompt engineering (23~4) -> context engineering (25) -> harness engineering (26)

At first, it was mostly understood as a correction or extension of prompt engineering. But the idea of “harness” as the layer that corrects, constrains, and operationalizes agents seems to have emerged much more clearly around 2026.

So yes, there is definitely some terminological confusion in the early phase. That is normal. New technical fields often begin with several competing names for almost the same layer, and only later does one term become stable.

My 2c:

The word harness brings the truth of LLMs back down to Earth.

Between 2018 and 2022ish, it really felt like LLMs had this magical aura, like the orchestration layer was intelligent, maybe even recursive, beyond what simple functions could do. It was assumed that this was a solved problem. The word "orchestration" denoted it; the words we used were full of optimism. When you lift the veil, it really is just regex and some cool tricks, sure, but it's a harness, it's a utility. There's no magic here, there's realism.

Maybe the labs even had a part to play in this as well; attempting to make themselves look magical. I mean just look at the choice of name for "Mythos", it's about bringing back that feeling of myth and magic after we saw under the veil.

The reality is that the labs have produced magical models, yes, but they are locking them into ecosystems that leave a lot to be desired, are easily reproducible, and are essentially cron jobs and regex: things we've seen in traditional cloud for decades. It feels like an attempt to create a moat where there is none.

Maybe I'm wrong, but this has been my impression.

There were no LLMs between 2018 and 2022, at least not in the sense resembling today. The whole LLM frenzy started in late 2022.

Harness itself was a widely used term by at least the "[LLM] plays pokemon" trend, which was a year ago[1]. That was basically the term of art to use when arguing about just how much special treatment LLMs should get.

"harness engineering" is the term claimed by that article to have originated in February. It does seem obvious in retrospect and I don't remember an origination point, but there's at least one hn comment predating that in December[2] and it doesn't treat it as novel.

I will admit that my bias is against any self-congratulatory buzzword fads (I'm still not over "MCP is the USB of LLMs" or whatever, and that's been a year now too). "Who coined the term harness engineering?" -> who cares? It was already widely being done.

[1] https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-...

[2] https://news.ycombinator.com/item?id=46331242

I read your comment. I think we may be talking about slightly different contexts.

The Pokémon article you linked is basically about benchmarking. In that context, the harness functions as part of the benchmark setup: the controlled environment around the model, the available inputs, tools, and assistance.

The current usage of “harness,” at least in the agent engineering discussion, seems closer to a lower-level runtime layer, almost like an OS around the agent.

So I see this as a transition: from “harness” as a narrower benchmark/control-variable layer to “harness” as the broader operating environment of the agent.

That does not mean I think your point is wrong. With topics like this, the interpretation depends on which part of the lineage one emphasizes. The first appearance of the idea may go back to 2022 or earlier, while the usage that looks closer to the current meaning may have emerged at a different point.

I am probably giving more weight to the SIG article, while you are giving more weight to a different point in the lineage. Both seem reasonable to me.