I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might involve some back-and-forth, but on the other hand it is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.
The "prompts are code" framing is right, and the compile analogy holds further than people think. Real code has structure: typed parameters, return types, separated concerns. A raw prose prompt is more like a shell one-liner with everything inlined. It works, but it breaks when you try to reuse or modify it.
If you take the compile idea seriously, the next step is to give prompts the same structure code has: separate the role from the context from the constraints from the output spec. Then compile that into XML for the model.
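To make that concrete, here is a minimal sketch of what "separate the concerns, then compile to XML" could look like. The `Prompt` class, its field names, and the XML tag names are all made up for illustration; this is not the format of any particular tool.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Prompt:
    """Hypothetical typed prompt: each concern lives in its own field."""
    role: str
    context: str
    constraints: list[str] = field(default_factory=list)
    output_spec: str = ""


def compile_prompt(p: Prompt) -> str:
    """Render the structured prompt as an XML string (tag names are assumed)."""
    root = ET.Element("prompt")
    ET.SubElement(root, "role").text = p.role
    ET.SubElement(root, "context").text = p.context
    constraints = ET.SubElement(root, "constraints")
    for c in p.constraints:
        ET.SubElement(constraints, "constraint").text = c
    ET.SubElement(root, "output").text = p.output_spec
    return ET.tostring(root, encoding="unicode")


review = Prompt(
    role="Senior Python reviewer",
    context="A pull request touching the billing module",
    constraints=["Comment only on correctness", "No style nitpicks"],
    output_spec="A bulleted list of findings",
)
xml_text = compile_prompt(review)
print(xml_text)
```

The point of the split is that you can swap the constraints or the output spec without touching the role or context, the same way you would change one argument of a function call.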
I built flompt (https://github.com/Nyrok/flompt) as a tool for this. Canvas where you place typed blocks (role, objective, constraints, output format, etc.) and compile to structured XML. Basically an IDE for prompts, not a text editor. A star would help a lot if this resonates.
No it doesn’t get compiled. Compilation is a translation from one formal language to another that can be rigorously modeled and is generally reproducible.
Translating from a natural language spec to code involves a truly massive amount of decision making because the spec is ambiguous. For a non-trivial program, two implementations of the same natural language spec will have thousands of observable differences.
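A toy illustration of one such difference, assuming a hypothetical one-line spec, "return the users sorted by name". Both functions below are reasonable readings of it, both pass the obvious test, and they still disagree on input the spec never mentions:

```python
def sort_users_a(names):
    # Reading A: plain code-point order, so uppercase sorts before lowercase.
    return sorted(names)


def sort_users_b(names):
    # Reading B: case-insensitive order.
    return sorted(names, key=str.casefold)


# Both satisfy the "obvious" test someone would write from the spec.
simple = ["carol", "alice", "bob"]
assert sort_users_a(simple) == sort_users_b(simple) == ["alice", "bob", "carol"]

# They diverge once the input contains mixed case.
mixed = ["bob", "Carol", "alice"]
result_a = sort_users_a(mixed)  # ['Carol', 'alice', 'bob']
result_b = sort_users_b(mixed)  # ['alice', 'bob', 'Carol']
assert result_a != result_b
```

That is one ambiguity in a one-line spec. Locale handling, tie-breaking, and empty or missing names are further decisions the spec leaves open.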
Given where we are today, with agents needing guardrails to keep from spinning out, there is no way to let agents work on code autonomously or constantly recompile specs without all of those observable differences constantly shifting, resulting in unusable software.
Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand. The only solution to this problem is that LLMs get better.
Personally I think at the point they can pull this off, they can do any white collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.
well you have to expand your definition of "compile" a bit. There is clearly a similarity, whether or not you want to call it the same word. Maybe it needs a neologism akin to 'transpiled'.
other than that you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.
I fundamentally don’t think the higher level idea has any potential because of the ambiguity of natural language. And I certainly don’t think it has anything in common with compilation unless you want to stretch the definition so far as to say that engineers are compilers. It’s delegation not abstraction.
I think we’ll either get to the point where AI is so advanced it replaces the manager, the PM, the engineer, the designer, and the CEO, or we’ll keep using formal languages to specify how computers should work.
One problem with this is that there isn't really a "current prompt" that completely describes the current source code; each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
> each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
My Git history contains links between the false starts and misunderstandings and the corrections, which then also include a paragraph on why this was a misunderstanding or false start. It is a lot better than just a single linear log from LLMs.
true, but that just means that's the problem to solve. probably the ideal architecture isn't possible right now. But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it, so that eventually it becomes a full 'spec'.
And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.
> But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it
Why would you think this though? There are an infinite number of programs that can satisfy any non-trivial spec.
We have theoretical solutions to LLM non-determinism, but we have no theoretical solutions to prompt instability, especially when we can’t even measure what correct is.
yeah but all of the infinite programs are valid if they satisfy the spec (well, within reason). That's kinda the point. Implementation details like how the code is structured or what language it's in are swept under the rug, akin to how today you don't really care what register layout the compiler chooses for some code.
There has never been a non-trivial program in the history of the world that could just “sweep all the implementation details under the rug”.
Compilers use rigorous modeling to guarantee semantic equality and that is only possible because they are translating between formal languages.
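A toy sketch of what "semantic equality" means here, using a made-up three-node expression language (`Num`, `Var`, `Add` are hypothetical names, not from any real compiler). Constant folding rewrites the program but leaves its value unchanged for every input. A real compiler argues this from the formal semantics of the language; the loop below only spot-checks it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Add]


def evaluate(e: Expr, env: dict) -> int:
    """The formal meaning of an expression: its value under env."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return evaluate(e.left, env) + evaluate(e.right, env)


def fold(e: Expr) -> Expr:
    """Constant folding: replace Add(Num, Num) with a single Num."""
    if isinstance(e, Add):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return e


program = Add(Var("x"), Add(Num(2), Num(3)))
optimized = fold(program)  # Add(Var("x"), Num(5))

# The rewrite changes the program's shape but not its meaning.
for x in range(-5, 6):
    assert evaluate(program, {"x": x}) == evaluate(optimized, {"x": x})
```

Because `evaluate` is defined precisely, "same meaning" is a checkable statement. There is no equivalent `evaluate` for an English sentence.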
A natural language spec can never be precise enough to specify all possible observable behaviors, so your bot swarm trying to satisfy the spec is guaranteed to constantly change observable behaviors.
This gets exposed to users as churn, jank, and workflow-breaking bugs.