I've been hearing about the insane 100x productivity gains you're all getting with AI, and how "this new crazy model is a real game changer," for a few years now. I think it's about time I asked:

Can you guys point me to a single useful, majority LLM-written, preferably reliable, program that solves a non-trivial problem that hasn't been solved before a bunch of times in publicly available code?

When electronic calculators were first introduced, there was a widespread belief that accounting as a career was finished. Instead, the opposite became true: accounting as a profession grew, becoming far more analytical and strategic than it had been previously.

You are correct that these models primarily address problems that have already been solved. However, that has always been the case for the majority of technical challenges. Before LLMs, we would often spend days searching Stack Overflow to find and adapt the right solution.

Another way to look at this is through the lens of problem decomposition. If a complex problem is a collection of sub-problems, receiving immediate solutions for those components accelerates the path to the final result.

For example, I was recently struggling with a UI feature where I wanted cards to follow a fan-like arc. I couldn't quite get the implementation right until I gave it to Gemini. It didn't solve the entire problem for me, but it suggested an approach involving polar coordinates and sine/cosine values. I was able to take that foundational logic and turn it into the feature I wanted.
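To make the polar-coordinate idea concrete, here's a minimal sketch of that kind of fan layout. The function name, the radius, and the spread angle are my own assumptions for illustration, not anything from the actual feature:

```python
import math

def fan_positions(n_cards, radius=400.0, spread_deg=60.0):
    """Place n_cards along a fan-shaped arc.

    Cards are spaced evenly across spread_deg degrees on a circle of
    the given radius, centered on the vertical axis (12 o'clock).
    Returns one (x, y, rotation_deg) tuple per card, relative to the
    arc's pivot point.
    """
    positions = []
    for i in range(n_cards):
        # Fraction of the way across the fan, from -0.5 to +0.5.
        t = (i / (n_cards - 1) - 0.5) if n_cards > 1 else 0.0
        angle = math.radians(t * spread_deg)
        # Polar -> Cartesian: angle 0 points straight up from the pivot.
        x = radius * math.sin(angle)
        y = -radius * math.cos(angle)  # negative y = "up" in screen coords
        # Tilting each card by its own angle makes the fan look natural.
        positions.append((x, y, math.degrees(angle)))
    return positions
```

With 3 cards and a 60-degree spread, the middle card sits straight up at (0, -400) with no tilt, and the outer cards sit at -30 and +30 degrees: the sine/cosine pair is doing all the work.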

Was it a 100x productivity gain? No. But it was easily a 2x gain, because it replaced hours of searching and waiting for a mental breakthrough with immediate direction.

There was also a relevant thread on Hacker News recently regarding "vibe coding":

https://news.ycombinator.com/item?id=45205232

The developer created a unique game using scroll behavior as the primary input. While the technical aspects of scroll events are certainly "solved" problems, the creative application was novel.

The story you're describing doesn't seem much better than what one could get from googling around and going on Stack Overflow.

It doesn’t have to be, really. Even if it could replace 30% of documentation and SO scrounging, that’s pretty valuable. Especially since you can offload that and go take a coffee.

It’s better in the sense that it’s much faster. Bikes and cars don’t theoretically get you to different places than walking, but open up whole categories of what’s practically reachable.

I think the 'better than googling' part is less about the final code and more about the friction.

For example, consider this game: a target is randomly generated on the screen, and a player in the middle of the screen needs to hit it. While a key is held, the player swings a rope attached to a metal ball in circles above their head at a certain rotational velocity. Upon key release, the player lets go of the rope and the ball travels tangentially from the point of release. Each time you hit the target, you score.

Now, if I'm trying to calculate the tangential velocity of a projectile leaving a circular path, I could find the trig formulas on Stack Overflow. But with an LLM, I can describe the 'vibe' of the game mechanic and get the math scaffolded in seconds.
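For anyone curious, the release math itself is small: the speed is |ω|·r, and the direction is the tangent to the circle at the release point, which you get by rotating the radius vector 90 degrees. A minimal sketch (the function name and the 2D coordinate setup are my own assumptions, not from any particular engine):

```python
def release_velocity(center, ball_pos, angular_velocity):
    """Velocity of a ball released from a circular swing.

    center, ball_pos: (x, y) of the pivot and the ball at release time.
    angular_velocity: radians/sec; positive means counter-clockwise in
    standard math coordinates (with y-down screen coordinates the visual
    spin direction flips, but the formula is identical).
    """
    rx = ball_pos[0] - center[0]
    ry = ball_pos[1] - center[1]
    # Rotating the radius vector (rx, ry) by +90 degrees gives the CCW
    # tangent direction; scaling by omega gives speed = |omega| * r.
    vx = -angular_velocity * ry
    vy = angular_velocity * rx
    return vx, vy
```

For example, a ball at (1, 0) around a pivot at the origin, spinning at 2 rad/s, flies off at (0, 2): straight along the tangent, perpendicular to the rope.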

It's that shift from searching for syntax to architecting the logic that feels like the real win.

The downside is that you miss the chance to brush up on your math skills, skills that could help you understand and express more complicated requirements.

...This may still be worth it. In any case it will stop being a problem once the human is completely out of the loop.

edit: but personally I hate missing out on the chance to learn something.

That would indeed be the case if one has never learned the stuff. And I am all in for not using AI/LLM for homework/assignments. I don't know about others, but when I was in school, they didn't let us use calculators in exams.

Today, I know very well how to multiply 98123948 and 109823593 by hand. That doesn't mean I will do it by hand if I have a calculator handy.

Also, ancient scholars, most notably Socrates via Plato, opposed writing because they believed it would weaken human memory, create false wisdom, and stifle interactive dialogue. But hey, turns out you learn better if you write and practice.

In later classes in school, the calculator itself didn't help. If you didn't know the material well enough, you didn't know what to put into the calculator.

Why even come to this site if you're so anti-innovation?

Today with LLMs you can literally spend 5 minutes defining what you want to get, press send, go grab a coffee and come back to a working POC of something, in literally any programming language.

This is literally the stuff of wonders and magic that redefines how we interface with computers and code. And the only thing you can think of is to ask if it can do something completely novel (novelty being so hard to even quantify for humans that it's a main reason we don't have software patents).

And the same model can also answer you if you ask it about math, make you an itinerary, or give you a lasagna recipe. C'mon now.

Agreed, but you are talking about a POC, and he is talking about reliable, working software. This generation of LLMs is perfect for POCs, and there you can get a 10x speedup, no question. But going from a POC to working, reliable software is where most of our time is spent anyway, even without LLMs.

With LLMs this phase becomes worse. We speed up the POC phase 10x, then slow down almost as much in the next phases: now you have a 10k-line POC you are not familiar with at all, you have to pay far more attention at code review, and you have to bolt on security as an afterthought (a major slowdown now, so much so that there are dedicated companies whose entire business model is fixing security problems caused by LLM POCs). Next phase: POCs are almost always 99% happy path, so you bolt on edge cases as another afterthought, and because you did not write any of those 10k lines, how do you even know which edge cases need covering? Maybe you guess right; maybe you spend even more time studying the unfamiliar code.

We use LLMs extensively now in our day-to-day. Development has become somewhat more enjoyable, but there is, at least as of now, no real improvement in final delivery times; we have just redistributed where effort and time go.

At our company we use AI extensively to see if we missed edge cases and it does a pretty good job in pointing us towards places which could be handled better.

I know we all think we are always so deep in absolutely novel territory that only our beautiful mind can solve it. But the vast majority of work done in the world is transformative: you take X + Y and you get Z. Even with a brand-new API, you can just feed in the documentation and navigate it an order of magnitude faster than without.

I started using it for embedded systems, doing something for which I could find literally nothing in Rust but plenty in Arduino/C code. The LLM made that process so much faster.

> no real increase in final delivry times

That’s not true though. The ability to de-risk concepts within a day instead of weeks will speed up the timeline tremendously.

I don't think that the user you are responding to is anti-innovation, but rather points out that the usefulness of AI is oversold.

I'm using Copilot for Visual Studio at work. It is useful for speeding up some typing via the auto-complete. On the other hand, in agentic mode it fails to follow simple, basic orders and needs hand-holding to run. This might not be the most bleeding-edge setup, but the discrepancy between how it's sold and how much it actually helps me is very real.

I think Copilot is widely considered to be fairly rubbish. Your description of agentic coding was also my experience prior to ~Q3 2025, but things have shifted meaningfully since then.

Copilot has access to the latest models like Opus 4.6 in agentic mode as well. It's got certain quirks and I prefer a TUI myself but it isn't radically different.

Even at Microsoft they're using Claude Code over Copilot, so I think it's different enough.

You are so behind the curve if you think copilot is mostly rubbish. That's a 4+ month old take.

I just don't use any Microsoft software anymore, thankfully

There are different kinds of innovation.

I want AI that cures cancer and solves climate change. Instead we got AI that lets you plagiarize GPL code, does your homework for you, and roleplays your antisocial horny waifu fantasies.

Hard problems take more time than easy problems

Of course, but at least DeepMind is taking a crack at the important problems

> that hasn't been solved before a bunch of times in publicly available code?

And this matters because? Most devs are not working on novel never before seen problems.

Heh, I agree. There is a vast ocean of dev work that is just "upgrade criticalLib to v2.0" or adding support for a new field from the FE through to the BE.

I can name a few times where I worked on something that you could consider groundbreaking (for some values of groundbreaking), and even that was usually more the combination of small pieces of work or existing ideas.

As maybe a more pertinent example: I used to do a lot of on-campus recruiting when I worked in HFT, and I think I disappointed a lot of people when I told them my day-to-day was pretty mundane and consisted of banging out Jira tickets, usually to support new exchanges and/or securities we hadn't traded previously. 3% excitement, 97% unit tests and covering corner cases.

I'm not sure if you'd call it a productivity gain, but I have to host our infrastructure on a system that runs processes entirely in Linux userland.

To bridge the containers in userland only, without root, I had to build: https://github.com/puzed/wrapguard

I'm sure it's not perfect, and I'm sure there are lots of performance/productivity gains that can be made, but it's allowed us to connect our CDN based containers (which don't have root) across multiple regions, talking to each other on the same Wireguard network.

No existing product that I could find did this, and I could never have built it (within the timeframe) without the help of AI.

Well, it took Opus 4.5 five messages to solve a trivial git problem for me. It hallucinated nonexistent flags three times. Hallucinating nonexistent flags is certainly a novel solution to my git ineptness.

Not to be outdone, ChatGPT 5.2 Thinking High only needed about 8 iterations to get a mostly-working ffmpeg conversion script for bash. It took another 5 messages to translate it to run on Windows, in PowerShell (models escaping newlines on Windows properly will be pretty much AGI, as far as I'm concerned).

You've got to be doing something wrong IMO. Mind sharing your system prompt and prompt/response pairs?

I know for a fact I deliver more, at higher quality, while being less tired. Mental energy is also a huge factor, because after digging in code for half a day I'd be exhausted.

People should stop focusing on vibecoding and realize how many other things LLMs can do: investigating messy codebases that used to take me ages of paper notes to connect the dots, finding information about dependencies just by giving them access (replacing painful googling, GitHub issues, and outdated-documentation digging), etc.

Hell, I can jump into projects I know nothing about, copy-paste a Jira ticket, investigate, have it write notes, ask questions, and in two hours I'm ready to implement with very clear ideas about what's going on. That was multi-day work until a few years ago.

I can also have it investigate the task at hand and automatically surface the many unknown unknowns that typical work tasks have, which means faster deliveries and higher-quality software. Getting feedback early is important.

LLMs are super useful even if you don't make them author a single line of code.

And yes, they are increasingly good at writing boilerplate if you have a nice, well-documented codebase, thus sparing you time. And in my career I've written tons of mostly boilerplate code: another API, another form, another table.

And no, this is not vibe coding. I review every single line, I use all of its failures to write better architectural and coding practices docs which further improves the output at each iteration.

Honestly, I just don't get how people can miss the huge productivity bonus you get, even if you don't have it edit a single line of code.

I work for a big tech company, most of our code today is written by agents. This includes backend infra and frontend app/UX code.

It satisfies your relevant criteria: LLM-written, reliable, non-trivial.

No major program is perfectly reliable so I wouldn't call it that (but we have fewer incidents vs human-written code), and "useful" is up to the reader (but our code is certainly useful to us.)

baffled that people are still suspicious of ai coding models

[dead]

the 100x gains, even 10x, are obviously ridiculous but that doesn't mean AI is useless

Yeah, I would LOVE to see attempts at significant video games that are then open-sourced for communities to work on. E.g. OpenGTA or OpenFIFA/OpenNHL.

I wouldn't say that anything before 11/2025 was a game changer, but after that, wow.

That said, I wouldn't expect there to be an innovative solution to an unsolved problem written by AI or humans that has been open sourced within the past 3 months.

Can you point me to a human written program an LLM cannot write? And no, just answering with a massively large codebase does not count because this issue is temporary.

Some people just hate progress.

> Can you point me to a human written program an LLM cannot write?

Sure:

"The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

As one particularly challenging example, Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase (This is only the case for x86. For ARM or RISC-V, Claude’s compiler can compile completely by itself.)"[1]

1. https://www.anthropic.com/engineering/building-c-compiler

Pretty much any software that people pay for? If LLMs could clone an app, why would anyone still pay good money for the original?

Even a normal website like landonorris.com. Try copying all those effects with AI.

Another example: Red Dead Redemption 2

Another one: RollerCoaster Tycoon

Another one: ShaderToy

I wish I could agree with you, but as a game dev, shader author, and occasional asm hacker, I still think AIs have demonstrated being perfectly capable of copying "those effects". It's been trained on them, of course.

You're not gonna one-shot RDR2, but neither will a human. You can one-shot particles and shader passes, though.

I didn't say one-shot it. Coding agents have been out for more than a couple of years, and yet we can't point to a single good piece of software built by them.

"Good" is obviously subjective but this mentality is so interesting because at my big tech company most of our software today is written by agents.

From my perspective, comments like these read as people having their head stuck in the sand (no offense, I might be missing something.)

Show me whats been built by agents

"coding agents have been out for more than couple years"?????

Depends on what we categorize as a coding agent. Devin was released two years ago. Cursor was about the same, and it released agent mode around 1.5 years ago. Aider has been around even longer than that I think.

Why do you believe an LLM can't write these, just because they're 3D? If the assets are given (just as with a human game programmer, who has artists provide them the assets), then an LLM can write the code just the same.

What? People can easily get assets; that's not even a problem in 2026. RollerCoaster Tycoon's assets were done by the programmer himself. If it's so easy, why haven't we seen actually complex pieces of software done in a couple of weeks by LLM users?

Also, try building any complex effects by prompting LLMs; you won't get very far. This is why all of the LLM-coded websites look stupidly bland.

Not sure what you're confused about. I never said assets were hard to get, just that the LLM needs to be provided a folder of assets to make use of them; it's not going to create them from scratch (at least not without great difficulty), though LLMs are perfectly capable of using and coding Three.js, for example. I don't know the answer to your first question because I don't hang around in the 3D or game dev fields, but I'm sure there are examples of vibe-coded games.

As to your second question, it's about prompting them correctly; see [0] for example. Now, I don't know about you, but some of those sites, especially after using the frontend skill, look pretty good to me. If those look bland to you then I'm not really sure what you're expecting, keeping in mind that the examples you showed with heavy graphics are not regular sites but more design-oriented, and even then nothing stops LLMs from producing such sites.

[0] https://youtu.be/f2FnYRP5kC4

You have shown me zero examples; I showed actual examples for the given question. Your answers have just been "AI can also do this" with no actual proof.

The examples are in the video I linked, as I said, if you don't bother to watch it then I'm not sure what to tell you. As I said for games I don't know and won't presume to search up some random vibe coded game if I don't have personal experience with how LLMs handle games, but for web development, the sites I've made and seen made look pretty good.

Edit: I found examples [0] of games too, with generated assets as well. These are all one-shot, so I imagine with more prompting you could get a decent game without coding anything yourself.

[0] https://www.youtube.com/watch?v=8brENzmq1pE

And some people clearly hate humans.

[dead]

I'm building an entire game on Unity using LLMs. It's an action RPG.

Is it just a game built with LLMs, or are you leaning into the cheap content-generation capabilities to make the game exceptionally deep/broad?

I'm building all of the systems with LLMs and using LLMs to fast track the creation of content such as storylines, characters, etc. All of the assets are mostly bought and created by me.

Sounds fun! Asset creation...at least in terms of story content, should be the one area where LLMs would really shine, especially if it can somehow extend into logic and gameplay. Couple that with the ways of generating art assets (hard with an LLM, but it can do something at least), that would be cool. I hope to see these games in the future, although they might be labelled as slop unless done really well.

Actually, LLM fiction writing is awfully bad. But it does help with ideas!

I'm trying my hardest to make it feel high quality instead of just slop.

No, but I have seen privately available code that matches this description.

[dead]

Personally, I've only been using a coding agent infrequently for a few months, so I have nothing to show for it. (It is not 100x productivity; that's absurd.)

But I have plenty of examples of really atrocious human written code to show you! TheDailyWtf has been documenting the phenomenon for decades.

Great question, here is the link from the future:

Yeah, Claude Code.

> single useful ... preferably reliable, program that solves a non-trivial problem that hasn't been solved before a bunch of times in publicly available code

I see this originality criterion appended a lot, and

1) I don't think it's representative of the actual requirements for something to be extremely useful and productivity-enhancing, even revolutionary, for programming. IDE features, testing, code generation, compilers — none of these directly helped you produce more original solutions to original problems, and yet they were huge advances in programmer productivity.

I mean like. How many such programs are there in general?

The vast, vast majority of programs that are written are slight modifications, reorganizations, or extensions of one or more programs that are already publicly available many times over.

Even the ones that aren't could fairly easily be considered just recombinations of different pieces of programs that have been written and are publicly available dozens or more times over, just different parts of them combined in a different order.

Hell, most code is a reorganization or recombination of the exact same types of patterns just in a different way corresponding to different business logic or algorithms, if you want to push it that far.

And yet plenty of deeply unoriginal programs are very useful and fill a useful niche, so they get written anyway.

2) Nor is it a particularly satisfiable goal. If, as a percentage, not very many reliable, useful, and original programs have been written in the decades since open source became a thing, why would we expect a five-year-old technology to have produced one? Especially since, obviously, the more reliable, original, and broadly useful programs have already been written, the narrower the scope for new ones that satisfy the originality criterion.

3) Nor is it something we would expect even under the hypothesis that agents make people significantly more productive at programming. Even if agents gave 100x productivity gains on writing a useful tool, service, or program, or on improving existing ones with new features, we still wouldn't expect them to give much productivity gain at all on writing original programs, precisely because an original program is a product of deep thinking, understanding a specific domain, seeing a niche, inspiration, science, talent, and luck far more than of the ability to do productive engineering.

Not 100x, but absolutely a 4x to 5x increase in productivity for everyone on the team, on a large enterprise codebase that serves the military and a lot of serious clients.

To deny at least that level of productivity at this point, you have to have your head in the sand.