The test harness point is the one that really sticks for me too. We've been using agentic loops for browser automation work, and the domain has a natural validation signal: either the browser session behaves the way a real user would, or it doesn't. That binary feedback closes the loop really cleanly.

The tricky part in our case is that "behaves correctly" has two layers - functional (did it navigate correctly?) and behavioral (does it look human to detection systems?). Agents handle the first layer fine but have no intuition for the second. Injecting behavioral validation into the loop is what actually made it useful.
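Concretely, the shape of that two-layer check looks something like this (a minimal sketch - all names and heuristics here are made up for illustration, not our actual harness):

```python
# Sketch of a two-layer validation step in an agent loop.
# check_functional / check_behavioral are hypothetical stand-ins for
# real checks (DOM assertions vs. timing/behavior heuristics).

from dataclasses import dataclass

@dataclass
class StepResult:
    functional_ok: bool   # did the navigation/click succeed?
    behavioral_ok: bool   # does the session still look human?

def check_functional(action: dict) -> bool:
    # placeholder: in practice, assert on page state / selectors / status
    return action.get("landed_url") == action.get("expected_url")

def check_behavioral(action: dict) -> bool:
    # placeholder: e.g. inter-action delay; instant clicks look robotic
    return action.get("delay_ms", 0) >= 80

def validate(action: dict) -> StepResult:
    return StepResult(check_functional(action), check_behavioral(action))

result = validate({
    "landed_url": "https://example.com/checkout",
    "expected_url": "https://example.com/checkout",
    "delay_ms": 12,  # suspiciously fast
})
# functional passes, behavioral fails -> the agent sees both signals,
# not just "task succeeded"
```

The point of returning both flags separately is that a plain pass/fail collapses exactly the distinction the agent needs to learn.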

The .md scratch pad between sessions is underrated. We ended up formalizing it into a short decisions log - not a summary of what happened, just the non-obvious choices and why. The difference between "we tried X" and "we tried X, it failed because Y, so we use Z instead" is huge for the next session.
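To make that concrete, a typical entry looks something like this (contents invented for illustration):

```markdown
## retry strategy on flagged sessions
- Tried exponential backoff and retrying the same session. Failed:
  the detection score is sticky per-fingerprint, so retries just burn
  the session faster. We rotate to a fresh profile instead.
- Do NOT remove the minimum inter-click delay. It looks like an easy
  perf win, but stripping it is what got an earlier session flagged.
```

Each bullet is a decision plus the failure that motivated it, which is exactly what a fresh session can't reconstruct from a plain summary.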

What are you developing that technology for?

browser automation at scale - specifically the problem of running many isolated browser sessions that each look like distinct, real users to detection systems. the behavioral validation layer I mentioned is the part that makes agentic loops actually useful for this: the agent needs to know not just "did the task succeed" but "did it succeed without triggering signals that would get the session flagged".

the interesting engineering problem is that the two feedback loops run on different timescales - functional feedback is immediate (did the click work?) but behavioral feedback is lagged and probabilistic (the session might get flagged 10 requests from now based on something that happened 5 requests ago). teaching an agent to reason about that second loop is the unsolved part.
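a toy sketch of one way to even represent that second loop - keep a bounded history of recent actions and, when a flag lands, spread blame over the lookback window (all names and the recency weighting here are assumptions for illustration, not our actual system):

```python
# Lagged-feedback attribution sketch: a flag arriving now may be caused
# by an action several steps back, so blame is spread over a window
# rather than assigned to the most recent action.

from collections import deque, defaultdict

class LaggedCredit:
    def __init__(self, lookback: int = 10):
        self.history = deque(maxlen=lookback)  # recent (step, action_type)
        self.blame = defaultdict(float)        # action_type -> blame score

    def record(self, step: int, action_type: str) -> None:
        self.history.append((step, action_type))

    def on_flag(self, step: int) -> None:
        # weight more recent actions slightly higher - an assumption,
        # not a learned model of the detector
        for past_step, action_type in self.history:
            self.blame[action_type] += 1.0 / (1 + step - past_step)

credit = LaggedCredit(lookback=5)
for i, action in enumerate(["click", "click", "scroll", "click", "paste"]):
    credit.record(i, action)
credit.on_flag(step=5)
# "paste" (most recent) gets the largest single-action share, while
# "click" accumulates blame across its three occurrences
```

the real problem is of course harder - the flag is probabilistic and the window length is unknown - but even this shape forces the agent to reason about "what did I do in the last N steps" instead of only the last action.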

so spam?

fair question. i shared a technical experience because it was directly relevant to the test harness discussion - the behavioral vs functional validation layers, the lagged feedback problem. if that reads as promotion, i get it, but it wasn't the intent. the engineering problem is real regardless of who's solving it.

They weren't saying your _post_ was spam. They're saying you build tools for spammers.

Because that's what they'll be used for.

that's a fair concern to raise. any tool that helps browsers look more human can be misused.

the actual use cases we see are mostly legitimate automation - QA teams testing geo-specific flows, price monitoring, research pipelines that need to run at scale without getting rate-limited on the first request. the same problem space as curl-impersonate or playwright-extra, just at the session management layer.

could someone use it for spam? technically yes, same as they could with any headless browser setup. but spam operations generally don't need sophisticated fingerprinting - they're volume plays that work fine with basic tools. the people who need real browser isolation are usually the ones doing something that has a legitimate reason to look human.