People are missing that Willison is among the very best people we have in the role of (for lack of a good name): getting early access to frontier models, evaluating them in real scenarios without wishful thinking, hype, or doom, and communicating the possibilities. Yes, he could have fixed this himself, but then he would have learned nothing about the AI, and we wouldn't have read a fascinating and important article.

>> he would have learned nothing about the AI

There is absolutely zero value in spending time learning about new models, since in a few months a new model will be out and whatever you learned about the current one will be useless.

Also, with models getting better and better, you have to know less and less to achieve the same results.

My experience has been the exact opposite.

As the models get better you need to know more about their capabilities, because otherwise you risk prompting Claude Fable 5 like it's GPT-4o and complaining loudly about how it's all hype and nothing about these models is improving at all (yes, I do see people say that.)

Getting the best results out of these models requires skill, experience, intuition, and domain expertise. There's always room for improving every one of those.

The new benchmark for LLMs is how much of simonw's new know-how is required.

Lower bars are better.

I agree, but this particular example showed nothing about leveraging skill, experience, or intuition. If anything, this is another straightforward example of a one-shot ask.

edit: that said, I understand this particular post is about model capability

Eh, I've had the exact opposite experience.

Way back before instruct models it was pretty difficult, but for the last couple of years I haven't needed anything more complex than the type of text that I might send in a detailed email to a colleague.

Isn't the whole point of a better model that it should be better at understanding you than the previous one? So the same prompt should return a better answer.

Prompting differently to the new model seems entirely backwards when trying to determine if the model has improved.

It doesn't matter how good the models get, they still won't be able to act on unclear directions.

Learning to provide unambiguous, clear directions is a skill. A lot of people who report bad experiences with models aren't yet good at that skill.

More importantly though, the key to successful communication is having a good understanding of what the other side of the conversation already knows and understands.

Saying "use uv and inline script dependencies" won't mean anything to a model with a knowledge cutoff date prior to the launch of uv!

It's perfectly possible to act on unclear directions. The correct course of action is asking clarifying questions.

I think this was true when models were going from bad to pretty good, as happened last year. But when they start to get good, and can work deeper and with more nuance, how you prompt can also change the results quite a bit. Note this is also true of asking smart humans to do things; personalities and approaches vary, and they don’t exist on a single-axis continuum of quality.

There’s zero value? Surely you don’t believe zero; it’s potentially the most powerful predictive AI ever made. Maybe it's only incremental steps, sure. But also, their IPO is coming; you don’t want people evaluating them beforehand?

What is intelligence? Better to call it LLM.

you know, women make a big deal about you meeting their father/parents, and honestly, I'm too autistic to really fucking have put any importance until now as to why that was remotely important, but if N+1 is coming for your job, it seems it might be worth your while to know the capabilities of N, no?
