I really like the Unix approach Claude Code takes, because it makes it easy to create other Unix-like tools and have Claude use them with basically no integration overhead. Just give it the man page for your tool and it'll use it adeptly, with no MCP or custom tool-definition nonsense. I built a tool that lets Claude use the browser, and Claude never has an issue using it.

The light-switch moment for me was when I realized I can tell Claude to use linters instead of telling it to look for problems itself. The latter generally works, but having it call tools is way more efficient. I didn't even tell it which linters to use: I asked it for suggestions, it gave me about a dozen, I installed them, and it started using them without further instruction.

I had tried coding with ChatGPT a year or so ago, and the effort needed to get anything useful out of it greatly exceeded any benefit, so I went into CC with low expectations, but I have been blown away.

As an extension of this idea: for some tasks, rather than asking Claude Code to do a thing, you can often get better results from asking Claude Code to write and run a script to do the thing.

Example: read this log file and extract XYZ from it and show me a table of the results. Instead of having the agent read in the whole log file into the context and try to process it with raw LLM attention, you can get it to read in a sample and then write a script to process the whole thing. This works particularly well when you want to do something with math, like compute a mean or a median. LLMs are bad at doing math on their own, and good at writing scripts to do math for them.
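
For a concrete feel, here's roughly the kind of throwaway script the agent ends up writing for that log case (the log format and field name are made up for illustration):

```python
# Hypothetical one-off script: pull response times out of a log and report
# count/mean/median, instead of the model "reading" the whole file in context.
import re
import statistics
import sys

times_ms = []
with open(sys.argv[1]) as f:
    for line in f:
        m = re.search(r"response_time=(\d+)ms", line)  # made-up log format
        if m:
            times_ms.append(int(m.group(1)))

print(f"count:  {len(times_ms)}")
print(f"mean:   {statistics.mean(times_ms):.1f} ms")
print(f"median: {statistics.median(times_ms):.1f} ms")
```

The agent only ever sees the script and its few lines of output, not the megabytes of log it was run over.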

A lot of interesting techniques become possible when you have an agent that can write quick scripts or CLI tools for you, on the fly, and run them as well.

It's a bit annoying that you have to tell it to do it, though. Humans (or at least programmers) "build the tools to solve the problem" so intuitively and automatically when the problem starts to "feel hard", that it doesn't often occur to the average programmer that LLMs don't think like this.

When you tell an LLM to check the code for errors, the LLM could simply "realize" that the problem is complex enough to warrant building [or finding+configuring] an appropriate tool to solve the problem, and so start doing that... but instead, even for the hardest problems, the LLM will try to brute-force a solution just by "staring at the code really hard."

(To quote a certain cartoon squirrel, "that trick never works!" And to paraphrase the LLM's predictable response, "this time for sure!")

As the other commenter said, these days Claude Code often does actually reach for a script on its own, or for simpler tasks it will do a bash incantation with grep and sed.

That is for tasks where a programmatic script solution is a good idea though. I don't think your example of "check the code for errors" really falls in that category - how would you write a script to do that? "Staring at the code really hard" to catch errors that could never have been caught with any static analysis tool is actually where an LLM really shines! Unless by "check for errors" you just meant "run a static analysis tool", in which case sure, it should run the linter or typechecker or whatever.

Running “the” existing configured linter (or what-have-you) is the easy problem. The interesting question is whether the LLM would decide of its own volition to add a linter to a project that doesn’t have one; and where the invoking user potentially doesn’t even know that linting is a thing, and certainly didn’t ask the LLM to do anything to the project workflow, only to solve the immediate problem of proving that a certain code file is syntactically valid / “not broken” / etc.

After all, when an immediate problem seems like it could come up again, an experienced human engineer would likely “take the opportunity” to introduce workflow automation so it stays solved from then on (if they aren’t pressed for time).

I've had multiple cases where it would rather write a script to test a thing than actually add a damn unit test for it :)

> Humans (or at least programmers) "build the tools to solve the problem" so intuitively and automatically when the problem starts to "feel hard", that it doesn't often occur to the average programmer that LLMs don't think like this.

Hmm. My experience of "the average programmer" doesn't look like yours and looks more like the LLM :/

I'm constantly flabbergasted as to how way too many devs fumble through digging into logs or extracting information or what have you because it simply doesn't occur to them that tools can be composed together.

> Humans (or at least programmers) "build the tools to solve the problem" so intuitively and automatically

From my experience, only a few rare devs do this. Most will stick with (broken/wrong) GUI tools they know made by others, by convenience.

I have the opposite experience.

I used Claude to translate my application and asked it to translate every piece of text in the application to the best of its ability.

That worked great for one view, but when I asked it to translate the rest of the application in the same fashion, it got lazy and started writing a script to substitute some words instead of actually translating the sentences.

Cursor likes to create one-off scripts; yesterday it filled a folder with 10 of them until it figured out a bug. All the while I was thinking: will it remember to delete the scripts, or is it going to keep spamming me like that?

>It's a bit annoying that you have to tell it to do it, though.

https://www.youtube.com/watch?v=kBLkX2VaQs4

Cursor does this for me already all the time, though; give that another shot, maybe. For refactoring tasks in particular: it uses regex to find interesting locations, and the other day, after maybe ten rounds of slow "ok now let me update this file... ok now let me update this file...", it suddenly paused, looked at the pattern so far, and then decided to write a Python script to do the refactoring and executed it. For some reason it considered its work done even though the files didn't even pass the linters, but that's polish.

+1, Cursor and Claude Code do this automatically for me. Give them a big analysis task and they’ll write Python scripts to find the needles in the haystacks I’m looking through.

Yeah, I had Cursor refactor a large TypeScript file today and it used a script to do it. I was impressed.

[deleted]

Codex is a lot better at this. It will even try this on its own sometimes. It also has much better sandboxing (which means it needs approvals far less often), which makes this much faster.

Same here. I have a SQLite db that I let it look over and extract data from. I have it build the scripts, then I run them myself, since they would time out otherwise and I don't want Claude sitting and waiting for 30 minutes. So I do all the data investigations with Claude as an expert who can traverse the data much faster than me.
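
The investigation scripts themselves are nothing fancy, roughly this kind of thing (the db path, table, and columns here are placeholders, not my real schema):

```python
# Placeholder sketch of a read-only investigation script Claude might write,
# which I then run separately so nothing sits around waiting on a long query.
import sqlite3

conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # open read-only
cur = conn.execute(
    """
    SELECT status, COUNT(*) AS n, AVG(duration_ms) AS avg_ms
    FROM jobs                         -- made-up table and columns
    GROUP BY status
    ORDER BY n DESC
    """
)
for status, n, avg_ms in cur:
    print(f"{status:<12} {n:>8} {avg_ms:10.1f} ms")
conn.close()
```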

I've noticed Claude doing this for most tasks without even asking it to. Maybe a recent thing?

Yes. But not always. It's better if you add a line somewhere reminding it.

The lightbulb moment for me was to have it make me a smoke test, tell it to run the test and fix issues (with the code it generated) until it passes, then iterate over all the features in the Todo.md (that I asked it to make). Claude Code will go off and do stuff for, I dunno, hours? while I work on something else.

Hours? Not in my experience. It will do a handful of tasks, then say “Great! I’ve finished a block of tasks” and stop. And honestly, you’re going to want to check its work periodically. You can’t even trust it to run linters and unit tests reliably. I’ve lost count of how many times it’s skipped pre-commit checks or committed code with failing tests because it just gives up.

I once had the Gemini CLI get into a loop of failures followed by self-flagellation where it ended saying something like "I'm sorry I have failed you, you should go and find someone capable of helping you."

I saw on X someone posted a screenshot where Gemini got depressed after repeated failure, apologized and actually uninstalled itself. Honorable seppuku.

Genius, I gotta try this.

I have a Just task that runs the linters (ruff and pyright, in my case), the formatter, the tests, and the pre-commit hooks, and I have Claude run it every time it thinks it's done with a change. It's good enough that when the checks pass, the change is usually complete.

(I code mostly in Go)

I have a `task build` command that runs linters, tests and builds the project. All the commands have verbosity tuned down to minimum to not waste context on useless crap.

Claude remembers to do it pretty well. I have it in my global CLAUDE.md, so I guess it has more weight? Dunno.

A tip for everyone doing this: pipe the linters' stdout to /dev/null to save on tokens.

Why? The agent needs the error messages from the linters to know what to do.

If you're running linters for formatting etc, just get the agent to run them on autocorrect and it doesn't need to know the status as urgently.

That's just one part of it. I want the LLM to see type checking errors, failing test outputs, etc.

Errors shouldn’t be on stdout ;)

“Errors” printed by your linter aren’t errors, they’re reports

This is the best way to approach it, but if I had a dollar for each time Claude ran `--no-verify` on the git commits it was doing, I’d have tens of dollars.

Doesn’t matter if you tell it multiple times in CLAUDE.md not to skip checks; it will eventually just skip them so it can commit. It’s infuriating.

I hope that as CC evolves there is a better way to tell/force the model to do things like that (linters, formatters, unit/e2e tests, etc).

We should have a finish hook: when the AI decides it's done, run the hook, feed its output back to the LLM, and let it decide whether the problem is still there.

Students don't get to choose whether to take the test, so why do we give AI the choice?

I’ve found the same issue, and with Rust it also sometimes skips tests if it thinks they’re taking too long to compile, saying it’s unnecessary because it knows they’ll pass.

Even AI understands it's Friday. Just push to production and go home for the weekend.

a wrapper script?

How is this better than calling `cargo clippy` or similar commands yourself?

Claude can then proceed to fix the issues for you

Presumably `cargo clippy --fix` was the intention. Not everything is auto-fixable, though, and that's what LLMs are reasonable for: the squishy, hard-to-autofix things.

I recently updated this thing that searches manpages better, for the LLM era:

https://github.com/day50-dev/Mansnip

Wrapping this in a stdio MCP is probably a smart move.
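
Something like this is probably all it takes, assuming the official Python MCP SDK's FastMCP helper; the mansnip invocation below is a guess at the CLI, not its real flags:

```python
# Rough sketch: expose the mansnip CLI as a stdio MCP tool.
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("mansnip")

@mcp.tool()
def mansnip_search(page: str, query: str) -> str:
    """Search a man page for a term and return the matching snippets."""
    result = subprocess.run(
        ["mansnip", page, query],  # placeholder invocation, check the real CLI
        capture_output=True, text=True, check=False,
    )
    return result.stdout or result.stderr

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```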

I should just api-ify the code and include the server in the pip. How hard could this possibly be...

Definitely searched apt on Debian before I installed the pip pkg. On a somewhat related note, I also thought something broke when `uv tool install mansnip` didn't work.

Thanks, I'll get on both of those. It's a minor project, but I should make it work.

You know, I have heard some countries are making mansnip illegal these days

How does Claude Code use the browser in your script/tool? I've always wanted to control my existing Safari session windows rather than a Chrome or a separate/new Chrome instance.

Most browsers these days expose a control API (like the Chrome DevTools Protocol MCP [1]) that opens a socket and takes JSON instructions for bidirectional communication. Chrome is the gold standard here, but both Safari and Firefox have their own drivers.

For your existing browser session you'd have to start it with that socket already enabled, as it's not on by default; but once you do, the server should be able to find the open local socket, connect to it, and execute controls.
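
For a concrete feel, driving an already-running Chrome over that socket looks roughly like this from Python (assuming Chrome was started with --remote-debugging-port=9222, and that the requests and websocket-client packages are installed; this is a sketch, not OP's tool):

```python
# Minimal CDP round-trip: find a page target, attach to its websocket,
# and send one JSON command (Page.navigate).
import json

import requests
from websocket import create_connection  # pip install websocket-client

# The HTTP endpoint lists open targets (tabs, workers, ...)
targets = requests.get("http://localhost:9222/json").json()
page = next(t for t in targets if t["type"] == "page")

ws = create_connection(page["webSocketDebuggerUrl"])
ws.send(json.dumps({
    "id": 1,
    "method": "Page.navigate",
    "params": {"url": "https://example.com"},
}))
print(ws.recv())  # response for command id 1
ws.close()
```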

Worth noting that this "control the browser" hype is quite deceiving, and it doesn't really work well IMO, because LLMs still suck at understanding the DOM, so you need various tricks to optimize for that. I would take OP's claims with a giant bag of salt.

Also these automations are really easy to identify and block as they are not organic inputs so the actual use is very limited.

- https://github.com/ChromeDevTools/chrome-devtools-mcp/

It's extremely handy too! If you try to use web automation tools like Selenium or Playwright on a website that blocks them, starting the Chrome browser with the debug port is a great way to get past Cloudflare's "human detector" before kicking off your automation. It's still a pain in the ass, but at least it works, and it's only once per session.
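
For reference, the rough shape of that setup with Playwright's Python API, assuming you launched Chrome yourself with --remote-debugging-port=9222 and dealt with the check manually first:

```python
# Attach Playwright to an existing, manually-started Chrome instead of
# launching its own bundled browser.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    context = browser.contexts[0]  # reuse the existing profile/session
    page = context.pages[0] if context.pages else context.new_page()
    page.goto("https://example.com/protected-page")  # placeholder URL
    print(page.title())
```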

Note that while --remote-debugging-port itself cannot be discovered by Cloudflare, once you attach a client to it, that can be detected, as Chrome changes its runtime to accommodate the connection even if you don't issue any automation commands. You need to patch the entire browser to avoid these detection methods, and that's why there are so many web scraping/automation SaaS products out there with their own browser builds: that's the only way to automate the web these days. You can't just connect to a consumer browser and automate undetected.

True, it fails to get past the Cloudflare check if my Playwright script is connected to the browser. But since these checks only happen on the first visit to the site, I'm OK with that.

Isn't this what SeleniumBase does?

It will navigate and know how to fill out forms?