Yeah and then it becomes an unmaintainable monolith because at some point the AI also lost track of what code does what.

Great for Opus because you’re now a captive customer.

Yes, it's a risk if you don't guide it well, but you can also manage it pretty ok.

I have a side project that I started in January 2024. Initially, used Github Copilot autocompletions heavily. This year I started using CLI agents (mostly Claude, but others too) to do more stuff. I got to around 100k LoC (sure, it's not enterprise scale, but for a personal project it's pretty big), but I'd argue it's maintainable, it's split into 10 Django apps, that are each pretty self contained, I've done several refactors on it (using AI agents) to make it more maintainable.

The point of eventual “all-code-is-written-by-AI” is that it really does not matter if your code is maintainable or not. In the end, most of the products are written to accomplish some sort of a goal or serve a need within a given set of restrictions (cost, speed and etc.). If the goal is achieved within given restrictions, the codebase can be thrown away until the next need is there to just create everything from scratch, if needed.

I don't buy it.

I think that could work, but it can work in the same way that plenty of big companies have codebases that are a giant ball of mud and yet they somehow manage to stay in business and occasionally ship a new feature.

Meanwhile their rivals with well constructed codebases who can promptly ship features that work are able to run rings around them.

I expect that we'll learn over time that LLM-managed big ball of mud codebases are less valuable than LLM-managed high quality well architected long-term maintained codebases.

Fair enough. In my imagination, I can see people writing AI-first framework/architectures and a general trend for people to “migrate to such frameworks”, just like the push towards the microservices architectures in 2010s. A part of these frameworks would be “re-constructibility” by changing contracts in parts where it matters, and somehow the framework would make it easy for the LLM to discover such “parts”.

Honestly, i’m making stuff up, as I don’t think it’s feasible right now because of the context sizes. But given how fast things develop, maybe in a couple of years things might change.

No you're not making it up, this is exactly what some people are working on. Agent frameworks are starting to move towards "dynamic" service discovery/runtime introspection and composition-with-guardrails. Some keywords are "agent mesh", and the general marketing from AI companies about AI "inventors", and agent-driven interfaces like Google's a2ui (which is just a spec)

We recently started working on https://github.com/accretional/collector to serve as a dynamic proto ORM+CRUD server with search and discovery, and features for operating as a node in an "agent/service mesh". The idea is that you can create a uniform interface for data retrieval/search/APIs that lets agents dynamically register, invoke, or discover any data type or service, or write it themselves, then register it locally or share it.

It is feasible to do this stuff now actually, just a bit tricky because most LLMs aren't trained to operate this way without very explicit instructions for how to do so, and for collector specifically the API surface is probably too big. But I am pretty sure neither would take long to fix if enough people were adopting this kind of pattern.

That’s actually really cool, and makes sense in my head! This is somewhat how I imagined it, except my guess would be someone would fine tune a general purpose LLMs (somehow, as it is much cheaper than starting from scratch, idk?) to behave this way rather than instructing it all the way in. And whoever develops the framework would package it with the access to this fine-tuned LLM.

But yeah, what you guys are doing looks sweet! I need to get out of my ass and see what people are doing in this sphere as it sounds fun.

> fine tune a general purpose LLMs (somehow, as it is much cheaper than starting from scratch, idk?) to behave this way rather than instructing it all the way in

I'd love to do that too but there are basically three ways to teach LLMs how to use it afaik: with data created "in the wild" and a degree of curation or augmentation, or with full-on reinforcement learning/goal-oriented training, or some kind of hybrid based on eg conformance testing and validating LLM output at a less sophisticated level (eg if it tries to call an api that's not in the set that it just saw during discovery, the LLM is being dumb, train it out of doing that).

The thing is they are not really mutually exclusive, and LLM companies will do it anyway to make their models useful if enough people are using this or want to use it. This is what's happened already with eg MCP and skills and many programming languages. Anyway, if prompting works to get it to use it properly it validates that the model can be trained to follow that process too, the same way it knows how to work with React

I see, makes sense! I’ll try to keep up to see what you guys are doing and overcome the problems. Thanks a lot!

My experience with LLM and agents has led to the opinion that a LLM-friendly codebase is actually a very human friendly code base.

Same here. So far everything I have found to help LLMs is just good practice generally: automated tests, documentation, clear issue descriptions, a neat commit history, well featured code etc.

Documentation, aka any kind of architectural plan,

vs

“we’ll figure it out when we get there” human slop.

I think the models, if they continue to get better, and frameworks/service patterns change to accommodate AI's. Where pieces of code will be thrown away, etc because the new code will be designed to slowly accommodate the "big ball of mud" risk.

We are moving from a conceptual/model job which typically requires training and skills (i.e. the code model/tool use/etc meets the requirements) to simply validation which is an easier problem and/or can be sharded in other roles. In other words the engineering part (i.e. the fun part) will be left to the AI. What I've found is people types (e.g. managers), and QA types (if it works, I don't care, this is what needs to work) will do well. People who liked the craftsman ship, solving problems, etc will do worse. Pure tech IMO will be less and less of a career.

It’s interesting how the monolith companies with a big ball of shit still stay in business.

But I’d say some projects (I expect to live less than 1 year) I’d just vibe code them so I won’t care much about the code. I just give very high level architectural ideas and that’s it.

Other projects which I expect lifespan to be more than 1-2 years I won’t let it become a ball of shit.

So it depends on the project.

And at the end of the day it's not really a tradeoff we'll need to make, anyways: my experience with e.g. Claude Code is that every model iteration gets much better at avoiding balls of mud, even without tons of manual guidance and pleading.

I get that even now it's very easy to let stuff get out of hand if you aren't paying close attention yourself to the actual code, so people assume that it's some fundamental limitation of all LLMs. But it's not, much like 6 fingered hands was just a temporary state, not anything deep or necessary that was enforced by the diffusion architecture.

It does matter because the code needs to still be legible and discoverable and semantic enough for other AI to find it and use it without it being so confusing or painful that they prefer to just write it themselves.

The reason software is so valuable is that it's capital/up-front investment in figuring something out that can continuously deliver value with low or no marginal cost. Rewriting/maintenance/difficulty/figuring out software is marginal cost.

Recreating everything from scratch gets harder and the previous requirements will eventually not be met after sufficient number of them have been accumulated. AI would have no solution to this unless it iterate on the same code base, but since I've not seen evidence of architectural maintainability from AI, a project that are fully given to AI is bound to fail.

AI is still incredibly useful used in tandem, but have it implement full feature from one sentence usually lead to doom.

In the case of OP, they cannot even test it, because they have no clue how it works. They cannot test whether the goal was achieved or not.

The other day I generated an MCP server for AST of Java. I had no clue how that works. I couldn’t test it because I had no idea how that looks like. Btw, AI even lied in tests, because it literally mocked out everything from live code. So everything was green, and literally nothing was tested, and it was untestable manually by me.

If you don’t know that Opus isn’t an entity, but a model,

you might be a little too far removed from the situation to comment authoritatively?