Hacker News

AI is a junior to mid-level engineer. If you treat it as such, you get the best of both vibe coding and rigorous engineering without all this paranoia.

Since the very beginning I've run Claude from an isolated VM on yolo mode. This is just like giving an engineer their own laptop. Claude works on a feature up to a PR-worthy point. I review the diff, just like I would with another engineer, and massage it to get it in the right shape and move on.

Inexperienced engineers make the same mistakes described. I've even seen rm -rf, albeit not from root! I would have lost my mind micromanaging someone with all permissions denied.

jsdalton 16 hours ago

I strongly agree with this take, and that’s partly why the article posted here leaves me scratching my head. PRs are already the gate, right? I don’t care what an agent does or doesn’t do within the confines of its workspace, assuming its contributions are gated via a git repository and it doesn’t require exotic access to a production environment to do its development.

I’m also with you on the junior / mid-level engineer framing (a “brilliant” junior engineer perhaps, one who graduated at the top of their class from the best CS program in the country) with a big caveat: AI is like a junior engineer who doesn’t know how to learn.

It’s like you’re working with the guy from Memento. Every day your LLM reports to work and they’ve learned nothing from your work so far. Every day is the first day!

Now like the Memento guy you can help them to scatter their workspace with sticky notes and reminders everywhere. With some effort you can start to approximate that thing called “learning” which is LITERALLY the most important trait of every single software developer on a team.
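To make the sticky-note idea concrete, here is a minimal sketch, assuming the notes live in a plain Markdown file that gets prepended to every new task. The helper names `add_note` and `build_prompt` and the file layout are hypothetical, not any particular agent's memory feature.

```python
from pathlib import Path


def add_note(notes_file: Path, note: str) -> None:
    """Append one 'sticky note' as a Markdown bullet (hypothetical helper)."""
    with notes_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note.strip()}\n")


def build_prompt(notes_file: Path, task: str) -> str:
    """Prepend all saved notes to the task so a fresh session sees them."""
    notes = notes_file.read_text(encoding="utf-8") if notes_file.exists() else ""
    if not notes.strip():
        return task  # nothing remembered yet
    return f"Notes from earlier sessions:\n{notes}\nTask: {task}"
```

This only approximates memory: someone (or something) still has to decide which notes are worth writing down, which is the hard part.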

But I confess it’s a struggle for me and the available tooling isn’t there yet. The best I’ve done looks closer to the “second brain” people use tools like Obsidian for. Sadly I don’t think a second brain is a substitute for a first brain. And to be 100% honest any engineer who exhibited the same inability to learn and grow as an AI agent would be sacked after their first month on the job at any company I’ve ever worked at.

I’m actually reasonably optimistic that either the main AI providers or someone else will improve on this in the coming years. It certainly feels like a decent memory, paired with a well-architected thinking system that’s better at contextually injecting memories and at capturing real learnings without supervision, shouldn’t be an impossible task requiring novel technical structures. (I find LLMs today don’t know what they don’t know unless you force them to put metaphorical sticky notes all over the place.)

Anyhow I’d love to be wrong about some of the above and I’m always reading articles like this one hoping that someone has solved these problems already and that I’m just slow on the uptake. But as of today, I’m only modestly better at architecting such agents than I was when I started.

iainmerrick 13 hours ago

Yep, this is my experience too. I think of it more as a very, very smart and fast intern -- you can tell it’s going places, and in many ways is already way better than you, but it still needs an experienced hand to steer it.

My rule of thumb is, any special processes you put in place for AIs are either sensible for humans as well, or they’re not worthwhile. Good CLIs, auto-summarization of long command outputs, Markdown docs and workflows -- those are all useful for people too!
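As a rough illustration of auto-summarizing long command outputs, here is a minimal sketch that keeps the first and last lines and replaces the middle with a count. The function name `summarize_output` and its `head`/`tail` parameters are hypothetical, not taken from any particular tool.

```python
def summarize_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines; elide the middle.

    Hypothetical helper for illustration only.
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short enough to show in full
    omitted = len(lines) - head - tail
    kept = lines[:head] + [f"... [{omitted} lines omitted] ..."]
    if tail > 0:  # guard: lines[-0:] would return the whole list
        kept += lines[-tail:]
    return "\n".join(kept)
```

The same truncated view is arguably what a human wants from a noisy build log too, which fits the rule of thumb above.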

To guard against mistakes and abuse, you use sandboxing and scoped permissions, not micromanagement.
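One small piece of scoped permissions can be sketched as a command allowlist, assuming the agent's commands are executed as an argument list without a shell. The names `ALLOWED_COMMANDS` and `is_allowed` are hypothetical, and this check is not a sandbox on its own; it would sit inside an isolated VM or container, not replace one.

```python
import shlex

# Hypothetical allowlist; a real setup would tailor this per project.
ALLOWED_COMMANDS = {"git", "ls", "cat", "pytest"}


def is_allowed(command_line: str, allowed: set[str] = ALLOWED_COMMANDS) -> bool:
    """Return True only if the program being run is on the allowlist."""
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes etc.: refuse rather than guess
    return bool(argv) and argv[0] in allowed
```

For example, `is_allowed("git status")` passes while `is_allowed("rm -rf /")` is refused. The check only looks at the program name, so it is meaningful only when the resulting argument list is run directly and never handed to a shell.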

One thing I’d like to figure out is a good pair-programming workflow for AI agents. You can tell a high-level model to go and do something, and that works; you can use a low-level model as an IDE assistant, and that works; but they’re separate workflows. What would be really useful is a way to kind of hand the keyboard back and forth with the high-end model and build something together. But safely, not in full-on YOLO mode on my own machine. This is one specific area where humans and LLMs differ -- the model is so much faster than me that I can’t just grab the keyboard back from it if it goes off the rails.

andai 11 hours ago

And if you give Claude an actual laptop, he can fix the Linux bluetooth audio issues ;)

nqzero 21 hours ago

What VM/provisioning are you using?

fny 20 hours ago

For work, EC2. For play, the cheapest VM I could find: https://vpshostingservice.co/

They have specials every now and then.

bpodgursky 18 hours ago

> AI is a junior to mid-level engineer

This is not true anymore and you aren't helping yourself by deluding yourself about it.

It's something, nobody quite knows what, but it's NOT a junior or mid-level engineer. It's a nuclear-powered staff engineer living in a cardboard box who lacks domain context and wakes up with no memories every 5 hours.

hansvm 17 hours ago

And who can't code its way out of a wet paper bag on hard problems. It's more productive for the day-to-day BS, which is convenient because it creates more day-to-day BS you need to handle, but that isn't the reason I hire a staff engineer.

bpodgursky 17 hours ago

i'm sorry but you're wrong and the only person you're hurting with your delusions is yourself. it doesn't change reality to pretend the world isn't changing under your feet.

i'm not going to argue about this but for your own career etc i truly hope you evaluate your epistemics.

hansvm 17 hours ago

Sure, it's changing, and I use AI a ton. But the second I ask it (where "it" is a smattering of all the SOTA models and harnesses) to do something as simple as design a server capable of doing <moderately simple task> when any concurrent data structures are involved and the single-server load is in the 100k QPS range, AI just can't keep up by itself yet, even with extremely thorough plans of how concurrency needs to be managed. It doesn't matter how little code is actually needed or how easy it would be for my juniors to bang out the problem, especially with a little AI boost. It can sometimes spit out something close, but only with major correctness issues.

I'm not trying to be argumentative; you posed an idea, and it looked wrong in an important way, so I added my observations. I'd love it if you could share the model/harness/workflow you use that makes you so confident in this tooling, because I don't want to be left behind.