>I would love to have a product sheet showing what each model's strengths and weaknesses are, so that I can have a clear decision tree of "if this kind of work, use model X", or "model Y should be used in ways Z". But they all look the same from the outside, and the only way to figure out which might be marginally better at what is to do extensive, time-consuming, and perhaps expensive testing.

Think of it less like a static tool, and more like a human helper, where the same holds.

Well, unlike a human, I cannot expect any of these LLMs to take any ownership of the work they do. I cannot expect any given model and version (Sonnet 4.6) to learn, improve and adapt over time. I cannot expect its limitations to ever go away at the model level. So it is not like a human in most of the ways I actually care about.

That said, I can't wait for LLMs to stop being AI and start being just another tool. Anything cursed with the "AI" label seems to go through this mess. In the earlier AI cycles, rules engines were considered "human-ish" and got hyped up, but today we see them as just another tool available to us, and we're better off for it.

You're on the hook for their work in the way a manager is for their staff's output. The insistence on AI being a mere tool very often comes with this strange desire to be free of responsibility for its work. People seem to forget that the big advantage of these things is the range they have for obscure insight and creative solutions, both impossible with determinism.

> That said, I can't wait for LLMs to stop being AI and start being just another tool.

From a horse's perspective, the internal combustion engine is just another tool for making scary noises and powering horse trailers to take me on fun horse adventures. So ... perhaps.

Models don’t improve, but the harnesses/tools/rules around them grow with the project.

One issue with that is that human helpers last longer. LLMs cycle in and out in months, and what held for Your Favorite LLM 6.7 may not hold for Your Favorite LLM 6.9.

Right, this is why I would slam the brakes on investing all of your time and effort into your workflow, because 2 months from now it may be out the window. Frontier models are also constantly being tweaked, so what worked yesterday may be off today.

ChatGPT was obedient with the grill-me technique and just wrote a plan. Yesterday it started jumping to implementation. Why?

I find that when an LLM jumps into tasks it was not told to do (or even worse, doing things it was explicitly told not to), it is a good sign the context is too full, and you should do a controlled hand-off to a new instance.

I wipe my context relentlessly. I never have long-running conversations. In and out like Seal Team Six.

Except that every model and version is like a different person whose idiosyncrasies you have to learn all over again every other month.

It's a very very bizarre way to use a computer.

Personally, I just don't. I'll use and prompt the LLMs the way that feels natural to me and move on with my life. Maybe I don't always get completely optimal results from them, but I'm also not spending half my day pleading with the computer to do a task.

I also don't think I need to prompt Claude differently than Codex.

The most important thing to be aware of in my opinion would be that Claude is better at UI design, and leaves a lot more comments in the code.

Other than that the results seem similar, at least functionally. I do not usually review the code style.

They are not human. Humans have names, faces, voices, personality, a personal history, family, care for whatever they call their community.

With humans it's actually good and worthwhile to create and strengthen connections. With an LLM, that's psychosis.

To be fair: a voice, personality, and personal history sounds a lot like training data.

I don't think LLMs are people in any sense, at least as they're constructed now -- but they very much have what we would call "culture" and "personality" in suitably alien forms.

This is not the same as, e.g., feelings, experience, or humanity, or actual opinions or ideas (versus essentially "distilled vibes"), and I feel that AI will more and more force us to confront that (including if new AIs are ever developed that may have the latter, as well!)

They are not human, but it helps to prompt them similarly. See: https://www.anthropic.com/research/emotion-concepts-function

Good read. Thanks for sharing.

They're not human. But they are trained on human language, and thinking of them as similar to a human helps me work with them effectively.

These things passing the Turing Test makes anthropomorphizing their behavior awkward, but don’t forget it’s just an analogy to communicate an experience. If you convey a certain written voice to these models in your input, you get a somewhat consistent end effect. I think that’s all that is being communicated.

If you have a toolbox full of similar but different tools, getting to know them is a prudent thing to do, not a psychosis. There's no connection because the tool is immutable (except for adjustments you made), but you do develop a specific relationship with that tool. Some people even love some of their tools at some level.

And if humans are anything, they are tool users.

>If you have a toolbox full of similar but different tools, getting to know them is a prudent thing to do, not a psychosis

Can be both. The use of some tools, like LLMs, might be more likely to induce psychosis than others, like plain compilers or hammers.

>And if humans are anything, they are tool users.

To the point of self-destruction sometimes.

> The use of some tools, like LLMs, might be more likely to induce psychosis than others, like plain compilers or hammers.

I really don't get it. Why is the fact that it outputs words so goddamn important to everybody? How does it suddenly make you so emotionally vulnerable? Does my brain work differently from the rest of humanity's? Can't you disregard what's irrelevant? Is every programmer suddenly a Trump supporter with no ability to recognize empty words? To recognize lies about emotions and facts?

Words are just input. Mostly garbage. Emotion-inducing words are garbage 10 times more often than any other kind. I could expect a romance reader to be affected, or somebody with an IQ of 70. But how is the caste of some of the most technical people ever afraid of catching psychosis just because they might read some words?

It's a certain percentage of people and yes it's different for them because it outputs words and triggers some kind of emotional trust response.

As good an opportunity as any to acquire some emotional intelligence.

Yeah, AI tools bring software developers closer to the messy real world where 0 and 1 aren't always exactly 0 and 1.

Computing is useful precisely for getting away from the messy real world of humans. I don’t need random errors in my financial transactions. I don’t want random errors when doctors are retrieving my medical history. And I don’t want random errors in my backups,… There’s plenty of non-deterministic stuff in my life; I don’t want my computer to follow suit.

No, I won't anthropomorphise LLMs.

If there was anything that made sense to anthropomorphise it would be a machine meant to mimic talking, thinking and answering like a human, one that even passes the Turing test.

When we came up with the idea that anthropomorphising is wrong, we meant doing it for rocks or trees or thunder or deer or some such.

That's your prerogative, but be aware you'll continue to remain confused about LLMs. Anthropomorphizing them is what gives you the best high-level intuition about where and how to employ them, and where and how not to.

This is so dumb and goes against all the principles that enabled computers and smartphones to achieve wide adoption - the technology should evolve to fit the human. Not the other way around.

I'd argue the opposite. Technology in the past few decades was (is) limited and humans had to adapt to it.

We communicate with other humans using voice and three-dimensional hand gestures. To use computers and early phones we had to learn to operate new input devices: keyboards and mice. Later, with touchscreens, we moved to two-dimensional hand (finger) gestures. Only recently have we started to make voice commands work with our devices, and barely.

Then, a large number of humans are figuratively tethered to their desks because the devices need power and a stable internet connection. Mobile devices loosen this relationship a bit, but you still need to charge them and stay close to some sort of access point. In any case, the devices encourage sitting in one place for hours at a time.

And this is just computers and smartphones. Humans adapted their entire lifestyles and transformed the landscape to cater to cars.

> Technology in the past few decades was (is) limited and humans had to adapt to it.

Was it? Think first about what it replaced. Lots of manual computation in bookkeeping and financial sectors. Telegrams and snail mail moved to email. Typesetting in books and magazines became easier and widely available,…

If there’s one thing you can’t say about computers, it’s that they’re limited.

No doubt that computers enabled a lot of automation. We can both agree with that.

The context was that technology should evolve to fit the humans [not the other way around]. And if contemporary technology didn't have limitations, it would be correct.

But it did and humans had to adapt to the computers. Humans had to develop and learn special languages so they could communicate with computers to do all those useful things you mentioned. Why? They were limited in understanding (or parsing) human languages. It took us decades before we could talk to computers in human languages. We're getting pretty close - especially in the past few years - but there's still some friction.

> Humans had to develop and learn special languages so they could communicate with computers to do all those useful things you mentioned. Why? They were limited in understanding (or parsing) human languages

You may need to revisit your computation theory courses. Computers are the embodiment of a mathematical model and thus the inputs and outputs are formalized.

Do you just hold a pen and words are written automatically? Do you just hover your hands over a piano and have the moonlight sonata played? No, you have to do precise mechanical movements because that’s how the output is realized.

There’s no such thing as words, sentences, keywords, or statements at the computer level. What it does is symbol manipulation. You provide it a string of symbols and the rules for the manipulation, and it will provide a string of symbols as the output.

Which symbols and which rules is completely arbitrary. We just found that {1,0} is all we need as the set of symbols, and that context-free grammars are perfect for specifying the rules.

We still need to encode everything down to binary (ascii, unicode, bcd, floating points, pixel formats, PCM,…) and use a programming language (as defined by a grammar) to get the computer to do anything. Inference is made possible by those two mechanisms. It’s not a new computation model.
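To make the "encode everything down to binary" point concrete, here is a minimal Python sketch. It assumes UTF-8 as the text encoding (just one of the encodings listed above), and the helper names `to_bits` and `from_bits` are hypothetical, made up for illustration. The round trip only works because both directions follow the same fixed, formal rules.

```python
# Sketch only: UTF-8 is an assumed choice of encoding; to_bits/from_bits
# are hypothetical helper names, not part of any library.

def to_bits(text: str) -> str:
    """Encode text as UTF-8 bytes and render each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: parse each 8-digit group back into a byte, then decode UTF-8."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


if __name__ == "__main__":
    encoded = to_bits("hi")
    print(encoded)             # 01101000 01101001
    print(from_bits(encoded))  # hi
    print(to_bits("é"))        # 11000011 10101001 (one character, two bytes)
```

Note that "é" becomes two bytes: the machine never sees a "letter", only a bit string produced by the encoding rules.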

I mean, like, you can lament the state of the world all you want. It is what it is. Of course the AI labs would also like to make their models more consistent, but it's not how the technology works. They're black boxes to everybody.

Please do not think of LLMs like human helpers, that is a recipe for long term sociopathy.