Hmm. I don’t think that novel code generation can be accounted for with glorified search.
I can have my agentic system read a few data sheets, then I explain the project requirements and have it design driver specifications, protocols, interfaces, and state machines. Taking those, develop an implementation plan. Working from that, write the skeleton of the application, then fill it in to create a functional system using a novel combination of hardware.
Done correctly, I end up with better, more maintainable, smaller code than I used to get with a small team, at 1/100 the cost and 1/4 the time.
Whatever that is, it more closely resembles reasoning than search.
Unless, of course, you’d also call bare metal C development on novel hardware search, in which case I guess all dev is search?
How do you even know those numbers are correct? Realistically, for what you've described you need more QA time than a traditional application to ensure it's actually working properly, especially with regard to any part of the application that deals with LLM inference. It's not hard to write unique content for niche topics where there are few relevant results and have LLMs take it as fact.
For example, I poisoned the well for research on early Arab American immigrants by repeatedly posting about how many families passed as a different ethnicity to make their lives easier, so now if you ask LLMs about that subject they'll include information I wrote, which isn't entirely correct because I hadn't figured everything out before the LLMs trained on it.
EDIT: Now imagine if I had done this on an obscure programming-related problem, yeah? I could potentially make the LLM reference packages that do not actually exist and put backdoors in applications.
Because I have 100 percent test coverage (of the software; some hardware edge cases pop up that aren’t documented in the data sheets), and over 10k hours of field deployment across 130 devices? This rollout has been much more bug free than any we have done in the last six years, and it’s the first with almost zero hand-written code. (Our system is far from vibe coding, however; there is a very strict pipeline.)
I’m not saying that AI can solve every problem or that it is without problems (we spent hundreds of hours developing a concept-to-production pipeline just to make sure it doesn’t go off the rails).
But the net result is that a good senior dev with an acutely olfactory paranoia can supervise a production pipeline and produce efficient, maintainable code at a much faster rate (and ridiculously lower cost) than he could before, when supervising 3 or 4 devs on a complex hardware project. I can’t speak for other types of development, but our application devs are also leveraging AI code generation and it -seems- to be working out.
Now, where those senior devs are going to come from in the future… that imho is a huge problem. It’s definitely some flavor of eating the goose that lays the golden egg here.
It's blindingly obvious what the big bet is. The senior devs are going to come from the next generations of AI systems.
That’s the big bet, for sure… but if it’s reasoning that the supervising devs are injecting, and AI systems can’t reason, I guess it won’t work? Idk, I kinda think they do reason, though not in the way people might think.
It’s definitely true that they are statistical next token predictors, that this is intrinsically pattern matching, and that it’s reasonable to say pattern matching alone is not capable of reasoning.
But my intuition is that that is not really what is going on. The token prediction is the hardware layer. The software is the sum total of collective human culture they are trained on. The software is doing the reasoning, not the hardware. Like a Z80 can’t play chess, but software that runs on a Z80 certainly can.
Idk, that’s my -feeling- on the conundrum. Who knows, I guess we will find out.
If the easiest pathway to high performance next token prediction lies through reasoning, then training for better next token prediction ends up training for reasoning implicitly.
By now, there's every reason to believe that this is what's happening in LLMs.
"Reasoning primitives" are learned in pre-training - and SFT and RL then assemble them into high performance reasoning chains, converting "reasoning as a side effect of next token prediction" to "reasoning as an explicit first class objective".
The end result is quite impressive. By now, it seems like the gap between human reasoning and LLM reasoning isn't "an entirely different thing altogether" - it's "humans still do it better at the very top end of the performance curve - when trained for the task and paying full attention".
"The software is the sum total of collective human culture they are trained on."
Almost; they are the median or most popular aspects of the culture upon which they are trained. So you are getting the most popular way to do something, not the best (for some definition of best). That's why the claims about LLMs being geniuses are absurd. They almost by definition are going to have the average IQ of all the people on the net weighted by how much each person posts. I'm guessing that's about 95.
You'd have to define 'novel code generation' and why dealing someone a poker hand they have never seen before isn't 'novel poker hand generation.' Not being snarky here, just understanding the way that LLMs work I am well aware that you can come up with things that nobody has seen before, and the 'how' is very much like the 'genetic' programming of times past.
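To make the poker analogy concrete, here is a minimal sketch in Python (the names `DECK`, `deal_hand`, and `TOTAL_HANDS` are made up for illustration). It deals a five-card hand that a given player has very likely never seen, yet every possible hand is just one sample from a fixed, fully enumerable space:

```python
import math
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
# A standard 52-card deck, e.g. "As" is the ace of spades.
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def deal_hand(rng: random.Random, size: int = 5) -> list:
    """Draw `size` distinct cards from the deck without replacement."""
    return rng.sample(DECK, size)


# Number of distinct five-card hands: 52 choose 5.
TOTAL_HANDS = math.comb(len(DECK), 5)

hand = deal_hand(random.Random(0))
print(hand, "is one of", TOTAL_HANDS, "possible hands")
```

The hand is "new" to whoever receives it, but nothing in the code resembles reasoning; whether code generation differs in kind or only in the size of the space is exactly the question being argued here.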
Sure, apply this pattern to that set of specifications. The very fact that the language has a fixed set of defined keywords sort of makes it all “pattern matching”, but computability theory implies that you can definitely use patterns to create novel solutions. I guess it’s where you draw the line?
> ... it more closely resembles reasoning than search.
I get that, to you, it feels like reasoning. I'm not arguing about that. I expect we have different ideas of what sort of steps constitute reasoning. I'm also not at all sure that we have the same understanding of computability theory.
For example, a program can start at the beginning of a maze, and "compute" a path through it with a recursive algorithm that splits at every branch. Is it "reasoning" about how to solve the maze? If you believe that it is, then I understand your position and, as you surmised, I have a different definition of 'reasoning' than that one.
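For concreteness, a minimal sketch of the kind of maze solver I mean, in Python. The grid encoding (`#` for walls, anything else open) and the names `solve_maze` and `MAZE` are assumptions made up for this example:

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]


def solve_maze(grid: List[str], pos: Cell, goal: Cell,
               visited: Optional[Set[Cell]] = None) -> Optional[List[Cell]]:
    """Return a path from pos to goal as a list of cells, or None if none exists."""
    if visited is None:
        visited = set()
    r, c = pos
    # Reject cells that are off the grid, walls, or already explored.
    if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
        return None
    if grid[r][c] == "#" or pos in visited:
        return None
    visited.add(pos)
    if pos == goal:
        return [pos]
    # Split at every branch: recurse into each neighbour in turn.
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        rest = solve_maze(grid, (r + dr, c + dc), goal, visited)
        if rest is not None:
            return [pos] + rest
    return None


MAZE = [
    "S.#",
    "#.#",
    "#.G",
]
path = solve_maze(MAZE, (0, 0), (2, 2))
print(path)
```

It finds a route through a maze it has never seen, purely by exhaustively trying branches and backtracking from dead ends.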
For me, a classic "reasoning"[1] test is diagramming English sentences. That's because in order to diagram a sentence you need to understand both the rules around nouns, verbs, adverbs, and such, and what the sentence is actually saying. Some of the rules have exceptions and those exceptions are perfectly valid. In computation you might say this problem is not NP complete, and yet people do it all the time.
Anyway, I appreciate the additional context you've provided.
[1] using quotes here because I am operating under the understanding that substituting your version of what reasoning means in this context might not parse well.
It’s pattern matching. A big part of reasoning for sure, but not reasoning per se
That could be, but if that is the case then development apparently doesn’t require reasoning? Or maybe that’s the part that the senior developer supervising the pipeline injects. That’s certainly a plausible position.
>but if that is the case then development apparently doesn’t require reasoning?
Certainly plenty of it does not.
Ctrl-C Stack Exchange lol
>I can have my agentic system read a few data sheets, then I explain the project requirements and have it design driver specifications, protocols, interfaces, and state machines. Taking those, develop an implementation plan. Working from that, write the skeleton of the application, then fill it in to create a functional system using a novel combination of hardware.
When you put it that way, isn't it crazy you have to tell it to do that? Like shouldn't it just figure out it needs to do that?
This is exactly it. A human capable of reasoning might not know how to write code. But they can learn and be taught. Eventually, you can give them a vague problem, and they’ll know what clarifying questions to ask and how to write the code. LLMs cannot do that.
If you have to do the reasoning and tell the LLM the results of your reasoning before it can generate the code you want, surely that tells you the LLM isn’t reasoning. Agentic workflows hide some of it, but anyone who’s interacted even a little with an LLM can tell they’re not reasoning, no matter how OpenAI and Anthropic label their models.
I’m not really sure. I’m constantly presented with a blurry line and it isn’t getting less blurry. If anything it’s slowly dissolving. Or maybe it’s me, falling victim to AI psychosis lol.
To be fair, I also have had to explain this same basic workflow to junior devs in the past, so I guess not?
> I also have had to explain this same basic workflow to junior devs
That would not surprise me.
The difference is that the LLM will -probably- make an attempt to follow my instructions, whereas there is an even chance that the junior dev will decide all that pedantic reflection is below their genius, and will launch straight into hacking together something that usually works fine within its own scope, but has to be mostly thrown out anyway.
Structure exists for a reason, and I say that as someone who loves to go into deep hack and produce some ultra clever jamboozle that works spectacularly well, as long as you don’t ever have to touch it. In production, there is no worse code than clever code. It’s soul sucking, but we have to make peace with elegance = maintainability / portability. Often, that means 30 LOC instead of ten, but future you thanks you, and the (modern, optimised) compiler doesn’t care.