I think it's not because the AI is working on "misaligned" goals. The user never specifies the goal clearly enough for the AI system to work with.
However, I think producing a detailed enough specification requires the same or even a larger amount of work than writing the code. We write a rough specification and clarify it during the process of coding. I think there is a minimal effort required to produce such a specification, and AI will not help you speed that up.
That makes me wonder about the "higher and higher-level language" escalator. When you're writing in assembly, is it more work to write the code than the spec? And is the reverse true if you can code up your system in Ruby? If so, does that imply anything about the "spec-driven" workflow people are using with AIs? Are we right on the cusp where writing natural-language specs and writing high-level code are comparably productive?
Programming languages can be a thinking tool for a lot of tasks, very much like other notation, such as sheet music and map drawing. A condensed and somewhat formal manner of describing ideas can increase communication speed. It may lack nuance, but in some cases, nuance is harmful.
The nice thing about code compared to other notation is that it's useful on its own. You describe an algorithm and the machine can then solve the problem ad infinitum. It's one step instead of the two steps of writing a spec and having an LLM translate it, then having to verify the output and alter it.
Assembly and high-level languages are equivalent in terms of semantics. The latter help in managing complexity, by reducing harmful possibilities (managing memory, off-by-one errors) and providing common patterns (iterators/collections, structs and other data structures, ...) so that categories of problems are easily solved. There's no higher level of computing model unlocked. Just a faster level of productivity, gained by following proven patterns.
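To make that concrete (my own toy example, not the parent's): the same pairwise sum written with manual index arithmetic versus a built-in iterator pattern in Python. The high-level version doesn't unlock a new model of computation; it just removes the bounds you can get wrong.

    # Manual index arithmetic: every bound is something you can get off by one.
    def sum_adjacent_pairs_indexed(xs):
        total = 0
        for i in range(len(xs) - 1):   # len(xs)? len(xs) - 1? i + 1 in range?
            total += xs[i] + xs[i + 1]
        return total

    # Iterator/collection pattern: the bounds are handled by the language.
    def sum_adjacent_pairs_zipped(xs):
        return sum(a + b for a, b in zip(xs, xs[1:]))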
The spec-driven workflow is a mirage, because even the best specs will leave a lot of details unspecified. Those details are crucial, as most of programming is making the computer not do the various things it can do.
> most of programming is making the computer not do the various things it can do
This is a very stimulating way of putting it!
I believe the issue right now is that we're using languages designed for human creation in an AI context. I think we probably want languages that are optimized for AI-written but human-read code, so the surface texture would be a lot different.
My particular hypothesis on this is something that feels a little bit like Python and Ruby, but has an absolutely insane, overkill type system to help guide the AI. I also threw in a little lispiness in my draft: https://github.com/jaggederest/locque/
I don't know, LLMs thrive on human text, so I would wager that a language designed for humans would quite closely match an ideal one for LLMs. Probably the only difference is that LLMs are not "lazy": they tolerate boilerplate better, and lower-complexity structures likely fit them better. (E.g. they can't really one-shot understand some imported custom operator that is not very common in their training data.)
Also, they rely surprisingly heavily on "good" code patterns, like comments and naming conventions.
So if anything, a managed language [1] with a decent type system and not a lot of features would be the best, especially if it has a lot of code in the training data. So I would rather vote for Java, or something close to it.
[1] Reasoning about lifetimes, even when aided by the compiler, is a global property, and LLMs are not particularly good at that.
But that is less fundamental than you make it sound. LLMs work well with human language because that's all they are trained on. So what else _could_ an ideal language possibly look like?
On the other hand: the usefulness of LLMs will always be gated by their interface to the human world. So even if their internal communication might be superseded at some point, their contact surface can only evolve if their partners/subjects/masters can interface with it.
When I think of the effect a single word can have on agent behavior, I wonder if a 'compiler' for the human prompt isn't something that would benefit the engineer.
I've had comical instances where asking an agent to "perform the refactor within somespec.md" results in it ... refactoring the spec as opposed to performing a refactor of the code mentioned in the spec. If I say "Implement the refactor within somespec.md" it's never misunderstood.
With LLMs _so_ strongly aligned on language and having such deep semantic links, a hypothetical prompt compiler could ensure that your intent converts into the most strongly weighted individual words, to ensure maximal instruction following and the intended outcome.
Intent classification (task frame) -> reference binding (inputs vs. targets) -> high-leverage word selection -> ... -> constraints(?) = <optimal prompt>
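A minimal sketch of what that hypothetical prompt compiler pipeline could look like in Python; every function name and string here is invented for illustration, and the stages are stubs standing in for real classification and binding steps.

    # Hypothetical prompt compiler: invented names and stub logic, for illustration only.
    def classify_intent(prompt):
        # task frame: is this an implementation task or a document edit?
        return "implement" if "refactor" in prompt.lower() else "edit"

    def bind_references(prompt):
        # inputs vs. targets: what is read, what is actually modified
        return {"input": "somespec.md", "target": "the code referenced by somespec.md"}

    def compile_prompt(prompt):
        intent = classify_intent(prompt)
        refs = bind_references(prompt)
        # swap weak verbs ("perform") for high-leverage ones and spell out constraints
        return (f"{intent.capitalize()} the changes described in {refs['input']}, "
                f"modifying only {refs['target']}. Do not edit {refs['input']} itself.")

    print(compile_prompt("perform the refactor within somespec.md"))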
If you are on the same wavelength as someone, you don't need to produce a full spec. You can trust that the other person has the same vision as you and will pick reasonable ways to implement things. This is one reason why personalized AI agents are important.
As of today, though, that doesn't work. Even straightforward tasks that are perfectly spec'd can't be reliably done with agents, at least in my experience.
I recently used Claude for a refactor. I had an exact list of call sites, with positions etc. The model had to add .foo to a bunch of builders that were either at that position or slightly before (the code position was for .result() or whatever). I gave it the file and the instruction, and it mostly did it, but it also took the opportunity to "fix" similar builders near the ones I specified.
That was after iterating a few times on the prompt (the first time it didn't want to do it because it was too much work, the second time it tried to do it via regex, etc.).
> I think producing a detailed enough specification requires the same or even a larger amount of work than writing the code
Our team has started dedicating much more time to writing documentation for our SaaS app. No one seems to want to do it naturally, but there is very large potential in opening your system up to machine automation, not just for coding but for customer-facing tooling. I saw a preview of that possible future in NewRelic, where they have an AI chat use their existing SQL-like query language to build tables and charts from natural-language queries right in the web app. Theirs kinda sucks, but there's so much potential there that it is very likely going to change how we build UIs and software interfaces.
Plus, having lots of documentation on how stuff works also helps sales, support, and SEO.
A detailed specification also helps root out conflicting design requirements and points at the desired behavior when bugs are actually found. It also helps when other stakeholders can read it and spot misalignment with what their users/customers actually need.
My thought too. To extend this: coding agents will make code cheap and specifications cheaper, but they may also invert the relative opportunity cost of not writing a good spec.
> The user never specifies the goal clearly enough for the AI system to work with.
This is sort of a fundamental problem with all AI. If you tell a robot assistant to "make a cup of tea", how is it supposed to know that that implies "don't break the priceless vase in the kitchen" and "don't step on the cat's tail", et cetera? You're never going to align it well enough with "human values" to be safe. Even just defining in human-understandable terms what those values are is a deep existential question of philosophy, let alone specifying them for a machine that's capable of acting in the world independently.