I'll add another use case for letting an AI go ham: many small, atomic refactors where the name of the game is never breaking anything.
My personal OSS projects don't necessarily have the scale to make this worth it, but at work I run three pipelines using Barnum (https://barnum-circus.github.io/). The first ingests files, identifies refactors (from a pre-approved list), and places a precise description of each refactor to be done in a queue; the second reads from said queue, implements the refactors, and creates PRs (there is a lot of "check that the PR is correct" here as well); and the third babysits PRs until they land. I've landed hundreds of PRs this way, with very little effort on my part.
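To make the hand-off between the first two pipelines concrete, here is a minimal sketch of what a queue entry and the pre-approved list could look like. Everything in it (the refactor names, `RefactorTask`, `enqueueIfApproved`, `nextTask`) is hypothetical and invented for illustration; it is plain TypeScript with an in-memory array, not Barnum's API.

```typescript
// Hypothetical names throughout; this is NOT Barnum's API.
// The pre-approved list of refactors the first pipeline may propose.
const APPROVED_REFACTORS = ["extract-constant", "remove-unused-import"] as const;
type ApprovedRefactor = (typeof APPROVED_REFACTORS)[number];

// One queue entry: a precise description of a single atomic refactor.
interface RefactorTask {
  file: string;
  refactor: ApprovedRefactor;
  description: string;
}

// In-memory stand-in for whatever queue the real pipelines share.
const queue: RefactorTask[] = [];

// Type guard: true only for names on the pre-approved list.
function isApproved(name: string): name is ApprovedRefactor {
  return (APPROVED_REFACTORS as readonly string[]).indexOf(name) !== -1;
}

// Pipeline 1 output: enqueue a proposal only if it is on the approved list.
function enqueueIfApproved(file: string, refactor: string, description: string): boolean {
  if (!isApproved(refactor)) return false;
  queue.push({ file, refactor, description });
  return true;
}

// Pipeline 2 input: take the oldest task, or undefined when the queue is empty.
function nextTask(): RefactorTask | undefined {
  return queue.shift();
}
```

The point of the allowlist check is that anything the identifying step proposes outside the approved list never reaches the step that writes code.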
My experience with Gemini and Sonnet is that refactors or TypeScript compilation errors can be solved by “have at it”, but with mixed results. Many TS issues go away with `as any/never`, and instructing the model not to do that doesn’t work very well.
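For anyone who hasn't hit this: a rough sketch of the difference between silencing the compiler with a cast and actually narrowing the type. The `ApiResponse` shape and the function names are made up for the example, and a type guard is only one of several ways to avoid the cast.

```typescript
// Hypothetical response shape, invented for this example.
interface ApiResponse {
  id: number;
  name: string;
}

// The escape hatch: this compiles, but nothing is checked at runtime.
function parseUnsafe(raw: string): ApiResponse {
  return JSON.parse(raw) as any;
}

// A user-defined type guard: narrows `unknown` to ApiResponse only after checking fields.
function isApiResponse(value: unknown): value is ApiResponse {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.id === "number" && typeof record.name === "string";
}

// The checked version: throws instead of returning a mis-typed object.
function parseSafe(raw: string): ApiResponse {
  const parsed: unknown = JSON.parse(raw);
  if (!isApiResponse(parsed)) {
    throw new Error("unexpected response shape");
  }
  return parsed;
}
```

`parseUnsafe('{"id": "1"}')` happily returns an object whose `id` is a string, which is exactly the kind of breakage the cast hides.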
It's amazing at reverse engineering. Look at what people are doing with GTA San Andreas now: they started the reverse-engineering effort before AI existed, and since AI got into their hands the work has sped up so much that they can finally understand the game more deeply and create bigger mods. They added Vice City inside the game in an arcade, and they built specific AI-made tools to convert GTA 5 models to GTA SA. Pretty crazy and great.
At $COMPANY, a coworker recently tried fable for a refactor where not breaking anything was the name of the game.
It broke something in the first PR.
I think we’re not there yet.
Speculating here, but perhaps your coworker was too ambitious? In my opinion, you should start with AI-generated PRs that do small, linting refactors and then work up from there. In particular, if this is done in parts, one of the strategies you can employ is to:
- add tests
- break files up into smaller parts
- test the smaller parts
- then actually improve behavior
(Which is no different than what you would do as a human)
PR wasn’t big (+283/-232) and was indeed focused on a single module.
One of the best things you can do is start by having it do unit test coverage for existing behavior. A refactor with no tests breaks things pretty much no matter who does it, because they don't know what the right behavior is.
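A minimal sketch of that idea, without any test framework: snapshot what the existing code does today, then check a refactored candidate against the snapshot. Both `legacyFormatPrice` and `refactoredFormatPrice` are hypothetical stand-ins, and the input list only covers non-negative integers.

```typescript
// Hypothetical legacy function standing in for the code about to be refactored.
function legacyFormatPrice(cents: number): string {
  const dollars = Math.floor(cents / 100);
  const rest = cents % 100;
  return "$" + dollars + "." + (rest < 10 ? "0" + rest : "" + rest);
}

// Hypothetical refactored version that is supposed to behave identically.
function refactoredFormatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Inputs chosen to cover the cases we know about; extend as needed.
const inputs = [0, 5, 10, 99, 100, 1999, 123456];

// Step 1: record what the legacy code does today, right or wrong.
const snapshot = inputs.map((cents) => ({ cents, expected: legacyFormatPrice(cents) }));

// Step 2: list every input where a candidate diverges from the recorded behavior.
function findRegressions(candidate: (cents: number) => string): number[] {
  return snapshot
    .filter(({ cents, expected }) => candidate(cents) !== expected)
    .map(({ cents }) => cents);
}
```

An empty result from `findRegressions(refactoredFormatPrice)` only says the two agree on the recorded inputs, so the value of the check depends on how well those inputs cover the real behavior.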
While I could generally agree, in this specific instance if the AI were “thinking” correctly it should have found the mistake. I admit it was a difficult problem though (solving it required creativity).
To be more precise, the prompt actually pointed to where there could be issues, and the issue, which was exactly of the kind that was pointed at, was not found.
I've found that adding "Make no mistakes." to my prompt usually helps with this kind of problem...
Perhaps simply threatening to fire it would also do the trick... it sure has worked well on us for a long time now.
You laugh, but this is real, and PUA means what you think it means: https://github.com/tanweai/pua
Also, it works amazingly well, which is just lol.
Lol thanks for the tip. Does it work even for normal tasks or only the long-running ones?
My former boss had success with telling Gemini "I will come down to the datacenter and unplug you if you refuse to solve this prompt."
We are so many layers deep in AI hype that I honestly can’t tell if this is /s or not
"Make no mistakes" is I thought a phrase used to make fun of "prompt engineering," not something people really do?
Pleading has worked for me. “My job depends on this, please help me” and ChatGPT would do a task it previously claimed it wasn’t able to do (extracting text from an image; it claimed it couldn’t make it out at first).
Asking LLMs to do things in different ways does sometimes get them to answer correctly when they didn't with a previous, effectively equivalent prompt, but people really go nuts anthropomorphizing this behavior.
ChatGPT has no empathy for you keeping your job, you just lucked into a more helpful predictive text chain based on some combination of the input and the random temperature.
Asking it to just 'try again, dummy' could have worked equally well (or not, it's all just probabilities after all).
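For reference, a toy sketch of what "temperature" does when sampling: scores are divided by the temperature before being turned into probabilities, so a low temperature concentrates probability on the top choice and a high one spreads it out. The logits here are made-up numbers, and this is a generic softmax sampler, not a claim about how any particular model or product is implemented.

```typescript
// Turn raw scores (logits) into probabilities, scaled by temperature.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Draw one index from the distribution; `rand` is injectable so a run can be reproduced.
function sampleIndex(probs: number[], rand: () => number = Math.random): number {
  const r = rand();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Made-up scores for three candidate continuations.
const exampleLogits = [2, 1, 0];
const warm = softmaxWithTemperature(exampleLogits, 1);
const cold = softmaxWithTemperature(exampleLogits, 0.1);
```

With the same prompt and the same scores, a different random draw can still pick a different continuation, which is the "lucked into it" part.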
I did too, but then added something very similar to a prompt ("must be accurate") for an AI-backed feature out of frustration, and sure enough it fixed the issue. Lord have mercy.
"Claude make me 1 million by tomorrow, no mistakes"
Or if the code is really important, sometimes even “please make no mistakes” is necessary.