I'm really enjoying reading over the prompts used for development: (https://github.com/maciej-trebacz/tower-of-time-game/blob/ma...)

A lot of posts about "vibe coding success stories" would have you believe that with the right mix of MCPs, a complex Claude Code orchestration flow that uses 20 agents in parallel, and a bunch of LLM-generated rules files, you can one-shot a game like this with the prompt "create a tower defense game where you rewind time. No security holes. No bugs."

But the prompts used for this project match my experience of what works best with AI-coding: a strong and thorough idea of what you want, broken up into hundreds of smaller problems, with specific architectural steers on the really critical pieces.

> what works best with AI-coding: a strong and thorough idea of what you want, broken up into hundreds of smaller problems, with specific architectural steers on the really critical pieces

As a tech lead who also wears product owner hats sometimes: This is how you should do it with humans also. At least 70% of my job is translating an executive’s “Time travel tower game. No bugs” into that long series of prompts with a strong architectural vision that people can work on as a team with the right levels of abstraction to avoid stepping on each other’s toes.

I tried to build a simple static HTML game for the board game Just One, where you get a text box, type a word in, and it's shown full screen on the phone. There's a bug where, when you type, the text box jumps around, and none of the four LLMs I tried managed to fix it, no matter how much I prompted them. I don't know how you guys manage to one-shot entire games when I can't even stop a text box from jumping around the screen :(

Browser text entry on mobile phones is notoriously hard to get right and some bugs are literally unfixable [1]. I'm a frontend developer in my day job and I struggled with this even before AI was a thing. I think you just accidentally picked one of the hardest tasks for the AI to do for you.

[1] Example: https://www.reddit.com/r/webdev/comments/xaksu6/on_ios_safar...

Huh, that's actually my exact bug. I didn't realize this was so hard, thank you.

I have a reasonably good solution for this in a project of mine that you might find useful:

https://grack.com/demos/adventure/

The trick for me was just using a hidden input and updating the state of an in-game input box. The code is ancient by today's standards, but it uses a reasonably simple technique to get the selection bounds of the text.

It works with autocomplete on phones and has been stable for a decade.

A hidden input box is something I'd heard about before from some hacker-ish old colleagues - it seems to be a powerful and reliable approach for storing state & enabling communication between components!

Oops, I worded my comment poorly -- it's not a hidden input, but rather a "CSS-visibility-hidden textbox input". Hidden inputs are useful but something completely different.
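
The gist, as a minimal sketch (hypothetical element names and styling, not the actual code from the demo):

    // A real <input> stays focused but invisible; the game draws its own box.
    const hidden = document.createElement("input");
    hidden.type = "text";
    hidden.setAttribute("autocapitalize", "off");
    // Keep it focusable and in the layout (display:none would break focus),
    // just invisible and untouchable.
    hidden.style.cssText =
      "position:fixed;top:0;left:0;opacity:0;pointer-events:none;";
    document.body.appendChild(hidden);

    const gameBox = document.getElementById("game-input")!; // your rendered box

    function render(): void {
      // Mirror the real input's value into the in-game box. selectionStart/
      // selectionEnd give you the caret and selection bounds to draw yourself.
      gameBox.textContent = hidden.value;
    }

    hidden.addEventListener("input", render);
    document.addEventListener("selectionchange", render);

    // Route taps on the in-game box to the real (invisible) input so the
    // mobile keyboard and autocomplete still show up.
    gameBox.addEventListener("pointerdown", (e) => {
      e.preventDefault();
      hidden.focus();
    });

Because it's a real focused input, the browser still handles the keyboard, composition, and autocomplete; you only draw the text and caret yourself.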

Gotcha, thank you for the clarification!

That's promising, thank you! I'll ask the LLM to implement it.

https://xkcd.com/1425/

One of the frustrating things about web dev, I find, is the staggering gulf in difficulty between apparently nearly identical tasks, and the unpredictability of it. So often I will find myself on gwern.net asking Said Achmiz, 'this letter is a little too far left in Safari, can we fix that?' and the answer is 'yes, but fixing that would require shipping our own browser in a virtual machine.' ¯\_(ツ)_/¯

[deleted]

> what works best with AI-coding: a strong and thorough idea of what you want, broken up into hundreds of smaller problems, with specific architectural steers on the really critical pieces

This has worked extremely well for me.

I have been working on an end-to-end modeling solution for my day job and I'm doing it entirely w/Claude.

I am on full-rework iteration three, learning as I go what works best, and this is definitely the way. I'm going to be making a presentation to my team about how to use AI to accelerate and extend their day-to-day work on things like this, and here's my general outline:

1. Tell the LLM your overall goal and have it craft a thoughtful product plan from start to finish.

2. Take that plan and tell it to break each part into many smaller pieces that are well-planned and thoroughly documented, then tell it to give you a plan for how best to execute them with LLMs.

3. Then go piece by piece, refining as you go.
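
Concretely, the first two prompts can be as plain as this (hypothetical paraphrases, not my literal prompts):

    Prompt 1: "I want to build an end-to-end modeling tool: set up the
    environment, pull data from our warehouse, model it, and visualize the
    results in detail. Draft a thoughtful product plan covering all of it,
    start to finish."

    Prompt 2: "Break each part of that plan into small, well-planned,
    thoroughly documented pieces, then give me a plan for how best to
    execute them one at a time with an LLM."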

The tool sets up an environment, gets the data from the warehouse, models it, and visualizes it in great detail. It took me about 22 hours of total time and roughly 2 hours of active time.

It's beautiful, fast, and fully featured. I am honestly BLOWN AWAY by what it did, and I can't wait to see what others on my team do w/this. We could have all done the setup, data ingestion, and modeling, no question; the visualization platform it built for me we absolutely could NOT have done w/the expertise we have on staff--but the time it took? The first three pieces would probably have been a few days of work, but the last part, I have no idea. Weeks? Months?

Amazing.

I wrote a whole PRD for this very simple idea, but the bug still persisted, even though I started from scratch four times. Granted, some attempts had different bugs.

I guess sometimes I have to do some minor debugging myself. But I really haven't encountered what you're experiencing.

Early on, I realized that you have to start a new "chat" after so many messages or the LLM will become incoherent. I've found that gpt-4.1 has a much lower threshold for this than o3. Maybe that's affecting your workflow and you're not realizing it?

No, that's why I started again, because it's a fairly simple problem and I was worried that the context would get saturated. A sibling commenter said that browser rendering bugs on mobile are just too hard, which seems to be the case here.

Have you tried with both Claude Opus 4 and Gemini 2.5 Pro?

Opus 4, Sonnet 4, o3, o4-mini-high.

Same. I had an idea that I wanted to build a basic Sinatra webapp with a couple of features. The first version was pretty good. Then I asked it to use Tailwind for the CSS. Again, pretty good. Then I said I wanted to use htmx to load content dynamically. Suddenly it decided every backend method needed to check whether the call came from htmx and alter what it did based on that. No amount of prompting could get it to fix it.
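
What I wanted it to produce was the obvious centralized version. Roughly this shape, sketched as a hypothetical Express/TypeScript analogue since my Sinatra code isn't handy (htmx marks its requests with an HX-Request header, so you can handle that once instead of in every route):

    // Hypothetical Express/TypeScript analogue (my app was Sinatra, but the
    // shape is the same): check htmx's HX-Request header once, in middleware,
    // instead of in every backend method.
    import express from "express";

    const app = express();

    app.use((req, res, next) => {
      // htmx sets "HX-Request: true" on the requests it makes.
      res.locals.fragmentOnly = req.get("HX-Request") === "true";
      next();
    });

    app.get("/items", (_req, res) => {
      const partial = "<ul><li>an item</li></ul>"; // render the partial as usual
      // Full page for normal navigation, bare fragment for htmx swaps.
      res.send(res.locals.fragmentOnly ? partial : layout(partial));
    });

    function layout(body: string): string {
      return `<!doctype html><html><body>${body}</body></html>`;
    }

    app.listen(3000);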

Hard to tell what exactly went wrong in your case, but if I were to guess - were you trying to do all of this in a single LLM/agent conversation? If you look at my prompt history for the game from the OP, you'll see it was created with dozens of separate conversations. This is crucial for non-trivial projects; otherwise the agent will run out of context and start to hallucinate.

Agent mode in RubyMine, which I think is using a recent version of Sonnet. I tried starting a new agent conversation but it was still off quite a bit. For me, my interest in finessing the LLM runs out pretty quickly, especially if I see it moving further and further from the mark. I guess I can see why some people prefer to interact with the LLM more than the code, but I'm the opposite. My goal is to build something. If I can do it in 2 hours of prompting or 2 hours of doing it manually, I'd rather just do it manually. It's a bit like using a mirror to button your shirt. I'd prefer to just look down.

> If I can do it in 2 hours of prompting or 2 hours of doing it manually, I'd rather just do it manually.

100% agree; if that were the case I would not use LLMs either. Point is, at least for my use case and with my workflow, it's more like 2 hours vs. 10 minutes, which suddenly changes the whole equation for me.

Yeah, or 10 minutes of prompting and then 20 minutes of implementing my own flavor of the LLM's solution vs 2 hours of trial and error because I'm usually too lazy to come up with a plan.

CSS is the devil, and I fully admit to burning many hours of dev time (mine without an LLM, an LLM's by itself, and a combination of the two together) to iron out similar layout nonsense for a game I was helping a friend with. In the end, what solved it was breaking things into hierarchical React components, adding divs by hand, using the Chrome dev tools inspector, and good old-fashioned human brain power. The other case was translating a Python script to Rust. I let the LLM run me around in circles, but what finally did it was using Google to find a different library to use, and then telling the LLM to use that library instead.

I didn't realize this was so hard, thanks. I expected it to be a simple positioning issue, but the LLMs all found it impossible.

Here's the game, BTW (requires multiple people in the same location): https://home.stavros.io/justone/

[deleted]

What I've found works best is to hand-code the first feature, effectively turning the codebase itself into a self-documenting entity. Then you can vibe code the rest.

All future features will have enough patterns defined from the first one (schema, folder structure, modules, views, components, etc.) that very few explicit vibe coding rules need to be defined.

>a strong and thorough idea of what you want, broken up into hundreds of smaller problems, with specific architectural steers on the really critical pieces.

Serious question: at what point is it easier to just write the code?

Depends. If you have written other Tower Defense games then it’s probably really close to that line. If you just took a CS class in high school then this vibe approach is probably 20x faster.

My aunt would always tell me that making fresh pasta or grinding your own meat was basically just as fast as buying it. And while it may have been true for her, it definitely wasn't for me.

And if it's a work project, you're going to spend a few years working on the same tech. So by the time you're done, there will be templates, snippets, etc. that you can quickly reuse for any prototyping with that tech. You'd also be faster because you know it's correct and you don't have to review it, which helps greatly with mental load. I remember initializing a project in React by lifting whole modules out of an old one. Those modules could have been libraries, the way they were coded.

All of this, and highlighting this part:

>You'd also be faster because you know it's correct and you don't have to review it, which helps greatly with mental load.

I keep thinking maybe it's me who's just not getting the vibe coding hype. Or maybe my writing vs reading code efficiency is skewed towards writing more than most people's. Because the idea of validating and fixing code vs just writing it doesn't feel efficient or quality-oriented.

Then, there's the idea that it will suddenly break code that previously worked.

Overall, I keep hearing people advocating for providing the AI more details, new approaches/processes/etc. to try to get the right output. It makes me wonder if things might be coming full circle. I mean, there has to be some point where it's better to just write the code and be done with it.

> what works best with AI-coding: a strong and thorough idea of what you want, broken up into hundreds of smaller problems

A technique that works well for me is to get the AI to one-shot the basic functionality or gameplay, and then build on top of that with many iterations.

The one-shot should be immediately impressive; if not, ditch it and try again with an amended prompt until you get something good to build on.

I totally agree!

This is the idea behind my recent post, actually [1], where I recommend people use AI to write specs before they code. If all you have to do as a human is edit the spec, not write it from scratch, you're more likely to actually make one.

[1] https://lukebechtel.com/blog/vibe-speccing

Heh, didn't know there was a name for it...

What I've taken to lately is getting the robots to write "scientific papers" on what I want them to get up to, so instead of iterating over broken code I can just ask them "does this change follow the specification?" Seems to stop them from doing overly stupid things... mostly.

Plus, since what I've been working on is just a mash-up of other people's ideas, it provides a good theoretical foundation for how all the different bits fit together. Just give them the paper you've been working on and some other paper and ask how the two can be used together; a lot of the time the two ideas aren't compatible, so it saves a lot of time you'd otherwise spend trying to force two things to work together when they really shouldn't. It's a very good way to explore different ideas without the robots going all crazy and producing a full code project (complete with test and build suites) instead of just giving a simple answer.

there is now I suppose! ;)

Yeah it isn't a panacea but it has afforded me less frustration than the alternative of jumping straight in.

> Since what I've been working on is just a mash-up of other people's ideas

Totally, I find most work I do, if I'm honest, is in this bucket. LLMs are pretty good at "filling in the gaps" between two ideas like this

Coincidentally, those seem to be strongly correlated with success in old-fashioned application development as well.

> No security holes. No bugs.

You forgot “Don’t hallucinate.” Noob.

> No security holes. No bugs.

A friend called me for advice on trouble he was having with an LLM and I asked “What exactly do you want the LLM to do?” He said “I want it to knock this project out of the park.” And I had to explain to him it doesn’t work that way. You can’t just ask for perfection.

I mean, you can, but you won’t get it.

Writing tests and/or PRDs helps. Gives LLMs tangible direction that can be quantified.

> A lot of posts about "vibe coding success stories"

Where are you reading “a lot of posts” making this specific claim? I’ve never seen any serious person make such a claim

> a strong and thorough idea of what you want, broken up into hundreds of smaller problems, with specific architectural steers on the really critical pieces.

This is how I've been using LLM bots since the ChatGPT preview, and it's been phenomenally useful and has 100x'd my productivity

The gap seems to be between people who never knew how to build, looking for a perfect oracle that would be like a genie in a lamp, then getting mad when it's actual work

The thing the last few years have beaten into me is that most engineers are actually functionally bad engineers who know only 1/1000th of what they should in order to build a successful project end to end

My assumption was that all of the bad engineers I worked with in person were an accidental sample of some larger group of really good ones (who I've also been able to work with over the years), and that it's just rare to find an actually capable engineer who understands the whole process

Turns out that’s a trivial minority (like every other field) and most people are pretty bad at what they do

I see 100x used quite a bit in relation to LLM productivity. It seems extreme because it implies one could generate a year's worth of value in a few days. I would think delivering features involves too much non-coding work for this to be possible.

But that's precisely what I'm saying: what I can do today by myself in a couple of days would have taken me a year with a team of three people

The key limiting factor in any project, as somebody else in this thread said, is that "people alignment is the number one hindrance to project speed"

So 10 years ago, if I wanted to make a web application that does complex shit, I'd have to go and hire a handful of experts, have them coordinate, manage that coordination, deliver it, and monitor it, all the way through ideation, storyboarding, and everything else

I can do 100% of that myself now. It's true I could've done 100% of it myself previously, but again, it would have taken a year of side effort

If 100x was really possible, it would be instantly, undeniably obvious to everyone. There would be no need for people alignment because one lone developer could crank out basically anything less complicated than an OS in a month.

It is starting to become obvious to more and more people. And is it really that hard to believe that a tool can extend your natural abilities by 2 orders of magnitude, but not everyone can instantly use it? In fact, you're using one right now. Your computer or phone can do many things orders of magnitude faster than you can alone, but until recently most people had no idea how to use computers and could not benefit from this power.

I believe with LLMs we're set to relive the same phenomenon again.

I use it at work everyday. I work with people who use it everyday. 100x is complete and utter nonsense.

100x means that I can finish something that would have taken me 10 years in a little over a month.

It would be obvious not because people are posting "I get a 100x productivity boost", but because Show HN would be filled with "look at this database engine I wrote in a month" and "check out this OS that took me 2 months".

And people at work would be posting new repos where they completely rewrote entire apps from the ground up to solve annoying tech debt issues.

You're missing the point by bikeshedding on "100x"

It's probably higher, tbh, because there are things I prototyped to test an assumption, realized the approach was O(N^2), then dumped it and tried 4 more architecture simulations to get to one that was implementable with the existing toolchains I know

So you're doing exactly what I called out, which is evaluating it as a magic oracle, instead of what I said, which is that as a support tool it makes me personally something like 100x more productive, and that often means quickly ruling out bad ideas

Preventing a problem in architecture is worth way more than 100x

If what you meant by 100x more productive is that sometimes, for some very specific things, it made you 100x more productive, and that isn't applicable to software development in general, I can see that.

I have many times delivered a year of value in a few days by figuring out that we didn’t actually need to build something instead of just building exactly what someone asked for.

>I have many times delivered a year of value in a few days by figuring out that we didn’t actually need to build something instead of just building exactly what someone asked for.

Knowing what not to do is more of a superpower than knowing what to do - 'cause it's possible to know

You can prototype by hand too. Personally I find it might take me 10 min to try a change with an LLM that would have taken me 30 min to 1hr by hand. It's a very nice gain but given the other things to do that aren't sped up by LLM all that much (thinking about the options, communicating with the team), it's not _that_ crazy.

Sorry, I call BS, unless you were a very poor developer without any skills to manage people.

[dead]

The bottleneck IME is people. It's almost never code. It's getting alignment, buy-in, everyone rowing in the same direction.

Tech that powers up an individual so they can go faster can be a bit of a liability for a company, bus factor 1 and all that.

100x is a bold statement.

You can easily get to 100x in a greenfield project but you will never get to 100x in a legacy codebase.

That depends on the codebase. I've found that hand-writing the first 50% of the codebase actually makes adding new features somewhat easier, because the context/shape of the idea is starting to come into focus. The LLM can take what exists and extrapolate from it.

> Where are you reading “a lot of posts” making this specific claim?

Reddit.