> I’ve often heard, with decent reason, an LLM compared to a junior colleague.

No, they're like an extremely experienced and knowledgeable senior colleague – who drinks heavily on the job. Overconfident, forgetful, sloppy, easily distracted. But you can hire so many of them, so cheaply, and they don't get mad when you fire them!

These metaphors all suck. Well, ok, yours is funny. But anyway, LLMs are just very different from any human.

They are extremely shallow, even compared to a junior developer. But extremely broad, even compared to the most experienced developer. They type real fuckin fast compared to anyone on earth, but they need to be told what to do much more carefully than anyone on earth.

I've gotten Claude Code to make CUDA kernels and all kinds of advanced stuff that there's zero percent chance a junior would pull off.

AI is like a super advanced senior wearing a blindfold. It knows almost everything, it's super fast, and it gets confused pretty quickly about things you've just told it.

it's not a senior though, because of the amount of oversight and guidance required. I trust senior-plus human developers to do the right thing and understand why it's the right thing. For mission critical things I get another human senior to verify. There's no way I'd autonomously trust 2, 10 or any number of LLMs to do the same thing.

You'd be surprised at what juniors can pull off. I have seen fresh-out-of-college new grads write performant GPU kernels that are used in real world library implementations for particular architectures.

Of course they can. It doesn't take that long to learn CUDA/etc and hardware details, read through some manuals and performance guides, do some small projects to solidify that knowledge. What you need is talent, and some months. That's why at university I saw plenty of students pull off amazing projects, and there's nothing eyebrow-raising in the slightest about starting a PhD a year after getting a bachelor's and writing software more sophisticated than most programmers would write in their career.
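For a sense of scale, the canonical starter exercise (a hypothetical element-wise vector add, purely illustrative and nothing like the real-world library kernels mentioned upthread) fits in a handful of lines; the months go into the tuning and hardware intuition around it:

    // Illustrative only: the "hello world" of CUDA, an element-wise vector add.
    // Real library kernels layer tiling, shared memory, and occupancy tuning on top.
    __global__ void vec_add(const float* a, const float* b, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a[i] + b[i];  // guard the final partial block
    }

    // Host-side launch: enough 256-thread blocks to cover n elements, e.g.
    //   vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_out, n);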

I think the programming profession overvalues experience over skill. However, when I was young I had no appreciation for what the benefits of experience are... including not writing terrible code.

Most of the juniors I've worked with would make numerical errors and give up/declare premature victory before getting the implementation to a robust state. I'm sure there are exceptional young folks out there though.

  > Most of the juniors
Most senior programmers can't write CUDA kernels either. Even fewer can write ones that are any good.

What's your sample size?

And today I asked Claude Code to carefully look at and follow the structure of my tests and write some more. After 5-10 minutes it had completely ignored that, written one class with all the helper methods, and produced a bunch of compilation errors from fields that aren't even in my model.

I figured there were probably a ton more logic issues, and deleted it immediately.

have you ever asked a junior developer to write a cuda kernel?

I've asked juniors to write easier things without success, and I applied the transitive property.

[deleted]

> but they need to be told what to do much more carefully than anyone on earth.

have you ever managed an offshore team. holy cow

"Offshore team." "Holy cow." I see what you did there.

that was accidental

I just spent a good two hours trying to debug an SM6 Vulkan issue with Unreal Engine using an LLM. It had got me to a good state, but UE kept failing to load the project. It transpired that searching the specific error message turned up a fix as the top Google result, which I found when I eventually decided to look for myself.

The LLM did help a lot in getting some busy work out of the way, but it's difficult to know when you need to jump out of the LLM loop and go old skool.

FWIW, I think the ratio of times I needed to go to Google for a solution instead of an LLM is like 20:1 for me, so your mileage may vary. Depends a lot on the specific niche you're working in.

Unrelated to software, but recently I wanted to revive an old dumbphone I hadn't used since 2014; apparently I had it password-protected and forgot the password, so I wanted to factory reset it. I looked up the exact model of the phone and Google had only content-farm articles that didn't help me at all, but Gemini gave me the perfect solution on the first try. I went to Google first because I had no faith in Gemini; it seemed like a pretty obscure question to me, but I guess I was wrong.

In the interest of full disclosure, my setup is quite esoteric for Unreal dev: Linux, and NixOS no less. To be honest, I'd probably have given up on NixOS long ago without LLM support. It's actually really quite handy to be able to share a declarative specification of my environment.

Google search has been enshittified. Kagi is where you get real search results now.

Unsure why this is downvoted. "Google search has been enshittified" should be a common sentiment here.

And they are incessantly cheerful.

I asked Claude to design me a UI, and it made a lovely one... but I wanted a web UI. It very happily threw away all its work and made a brand new web UI.

I can't imagine any employee being that quick to just move on after something like that.

I think we had the analogy right with "fancy autocomplete". Sometimes, the completions are excellent and do exactly what we want, other times the completions fail to match our expectations and need human attention. It's a tool.

Experienced and knowledgeable and they also believe in the technical equivalent of flat-Earthism in many, many non-trivial corners.

And if you push back on that insanity, they'll smile and nod and agree with you and in the next sentence, go right back to pushing that nonsense.

One good yardstick, if one has to anthropomorphize, is that LLMs know and believe what's popular. If you ask it to do something that popular developer opinion gets right, it will do fine.

Ask it for things that many people get wrong or just do badly, or can be mistakenly likened to a popular thing in a way that produces a wrong result, and it'll often err.

The trick is having an awareness of which correct solutions are prevalent in the training data, and which things the bulk of accessible code used for training probably doesn't have many examples of. And this experience is hard to substitute for.

Juniors therefore use LLMs in a bumbling fashion and are productive either by sheer luck, or because they're more likely to ask for common things and so stay in a lane with the model.

A senior developer who develops a good intuition for when the tool is worth using and when not can be really efficient. Some senior developers however either overestimate or underestimate the tool based on wrong expectations and become really inefficient with them.

“Here’s the code: […]” It looks totally plausible, but uses the hallucinated libFakeCall.

“libFakeCall doesn’t exist. Use libRealCall instead of libFakeCall.”

“You’re absolutely correct. I apologize for blah blah blah blah. Here’s the updated code with libRealCall instead: […]”

“You just replaced the libFakeCall reference with libRealCall but didn’t update the calls themselves. Re-write it and cite the docs.”

“Sorry about the confusion! I’ve found these calls in the libRealCall docs. Here’s the new code and links to the docs.”

“That’s the same code but with links to the libRealCall docs landing page.”

“You’re absolutely correct. It appears that these calls belong to another library with that functionality: […]” It looks totally plausible, but uses the hallucinated libFakeCall.

You forgot to include its best excuse for why the documentation it cites is hallucinated: "The links may be broken/website is down".

For all the hot air I hear about the user having to give the system the proper context to get good answers - does anyone claim to have a solution for dealing with such a belligerent approach to bullshit?

They are by no means useless, but once they fall into that hole, there's no further value in interrogating them.

No, they're like a completely nontechnical marketing person who has a big library of papers on related subjects and who's been asked to generate a whitepaper by pulling phrases from them. What comes out will probably have proper grammar and seem perfectly reasonable to another person with no knowledge of the field, but if actually read by someone knowledgeable it may be complete gibberish taken as a whole.

Individual sentences and paragraphs may mostly work, but it's an edifice built on sand out of poorly constructed bricks plus mortar with the wrong proportions (or entirely wrong ingredients).

LLM output is "truthy" - it looks like it might be true, and sometimes it will even be accurate (see also, "stopped clock") but depending on it is foolish because what's generating the output doesn't actually understand what it's putting out - it's just generating output that looks like the kind of thing you've requested.

Overconfident, but also easy to sway their opinion.

They follow directions for maybe an hour and then go off and fix random shit because they forgot what their main task was.

They'll tell you to your face how great your ideas were, and that you're absolutely right about something, then go implement it completely incorrectly.

They add comments to literally everything even when you ask them to stop. But they also ignore said comments sometimes.

Maybe they are kinda like us lol.

> who drinks heavily on the job

Yes, that's why you should add "... and answer in the style of a drunkard" to every prompt. Makes it easier to not forget what you are dealing with.

My analogy is a cursed monkey's paw with unlimited wishes. It's actually really, really powerful, but you have to be careful not to leave any possible ambiguity in your wishes.

Like a person that has memorized every single answer in the world to every single question ever asked - but is completely dead behind the eyes.

> Overconfident, forgetful, sloppy, easily distracted

And constantly microdosing, sometimes a bit too much.

And sometimes it seems like a macro-dose huffing xylene-based glue.

> and they don't get mad when you fire them!

No, typically __you__ are mad when you fire them ...

If you’re making hire/fire decisions while emotional, you’re doing it wrong.

Whoosh

“You’re absolutely right!”

Yes. Funny

But it is nothing like a wetware colleague. It is a machine.