I tried to build a simple static HTML game for the board game Just One, where you get a text box, type a word in, and it's shown full screen on the phone. There's a bug where, when you type, the text box jumps around, and none of the four LLMs I tried managed to fix it, no matter how much I prompted them. I don't know how you guys manage to one-shot entire games when I can't even stop a text box from jumping around the screen :(
Browser text entry on mobile phones is notoriously hard to get right and some bugs are literally unfixable [1]. I'm a frontend developer in my day job and I struggled with this even before AI was a thing. I think you just accidentally picked one of the hardest tasks for the AI to do for you.
[1] Example: https://www.reddit.com/r/webdev/comments/xaksu6/on_ios_safar...
Huh, that's actually my exact bug. I didn't realize this was so hard, thank you.
I have a reasonably good solution for this project of mine you might find useful:
https://grack.com/demos/adventure/
The trick for me was just using a hidden input and updating the state of an in game input box. The code is ancient by today's standards but uses a reasonably simple technique to get the selection bounds of the text.
It works with auto complete on phones and has been stable for a decade.
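For anyone who wants the shape of that technique, here is a minimal TypeScript sketch, not the code from the demo. It assumes a real text input that stays focusable but is made invisible with CSS (something like opacity: 0 is my assumption, not a detail from the demo), and a visible in-game box that is redrawn from the real input's value and selection bounds. The names InputLike, MirrorView, mirrorState, and renderMirror are all hypothetical.

```typescript
// Sketch of the "invisible real input, visible fake input" technique.
// All names here are hypothetical and not taken from the linked demo.

/** The subset of HTMLInputElement that this sketch reads. */
interface InputLike {
  value: string;
  selectionStart: number | null;
  selectionEnd: number | null;
}

/** What the in-game box needs in order to draw text, caret, and selection. */
interface MirrorView {
  before: string;   // text left of the caret or selection
  selected: string; // selected text, empty when there is only a caret
  after: string;    // text right of the caret or selection
}

/** Split the real input's value at its selection bounds. */
function mirrorState(input: InputLike): MirrorView {
  const len = input.value.length;
  // The selection properties can be null; fall back to a caret at the end.
  const rawStart = input.selectionStart ?? len;
  const rawEnd = input.selectionEnd ?? rawStart;
  const start = Math.max(0, Math.min(rawStart, len));
  const end = Math.max(start, Math.min(rawEnd, len));
  return {
    before: input.value.slice(0, start),
    selected: input.value.slice(start, end),
    after: input.value.slice(end),
  };
}

/** Render the view as plain text: "|" marks a caret, "[...]" a selection. */
function renderMirror(view: MirrorView): string {
  const middle = view.selected === "" ? "|" : `[${view.selected}]`;
  return view.before + middle + view.after;
}

// Browser wiring, left as a comment so the sketch also runs outside a browser.
// The element ids are assumptions:
//   const real = document.querySelector<HTMLInputElement>("#real-input")!;
//   const fake = document.querySelector<HTMLElement>("#fake-input")!;
//   const sync = () => { fake.textContent = renderMirror(mirrorState(real)); };
//   for (const ev of ["input", "keyup", "click"]) real.addEventListener(ev, sync);
```

The point of the split is that the browser keeps handling the keyboard, autocomplete, and selection on the real input, while the game only ever draws text, so the on-screen box has no native layout behavior that could make it jump.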
A hidden input box is something I've heard about before from some hacker-ish old colleagues - it seems to be a powerful and reliable approach to store state & enable communication between components!
Oops, I worded my comment poorly -- it's not a hidden input, but rather a "CSS-visibility-hidden textbox input". Hidden inputs are useful but something completely different.
Gotcha, thank you for the clarification!
That's promising, thank you! I'll ask the LLM to implement it.
https://xkcd.com/1425/
One of the frustrating things about web dev, I find, is the staggering gulf between apparently nearly identical tasks, and the unpredictability of it. So often I will find myself on gwernnet asking Said Achmiz, 'this letter is a little too far left in Safari, can we fix that?' and the answer is 'yes but fixing that would require shipping our own browser in a virtual machine.' ¯\_(ツ)_/¯
> what works best with AI-coding: a strong and thorough idea of what you want, broken up into hundreds of smaller problems, with specific architectural steers on the really critical pieces
This has worked extremely well for me.
I have been working on an end-to-end modeling solution for my day job and I'm doing it entirely w/Claude.
I am on full-rework iteration three, learning as I go on what works best, and this is definitely the way. I'm going to be making a presentation to my team about how to use AI to accelerate and extend their day-to-day for things like this and here's my general outline:
1. Tell the LLM your overall goal and have it craft a thoughtful product plan from start to finish.
2. Take that plan and tell it to break each of the parts into many different parts that are well-planned and thoroughly documented, and then tell it to give you a plan on how to best execute it with LLMs.
3. Then go piece by piece, refining as you go.
The tool sets up an environment, gets the data from the warehouse, models it, and visualizes it in great detail. It took me about 22 hours of total time and roughly 2 hours of active time.
It's beautiful, fast, and fully featured. I am honestly BLOWN AWAY by what it did and I can't wait to see what others on my team do w/this. We could have all done the setup, data ingestion, and modeling, no question; the visualization platform it built for me we absolutely could NOT have done w/the expertise we have on staff--but the time it took? The first three pieces probably were a few days of time, but the last part, I have no idea. Weeks? Months?
Amazing.
I wrote a whole PRD for this very simple idea, but still the bug persisted, even though I started from scratch four times. Granted, some had different bugs.
I guess sometimes I have to do some minor debugging myself. But I really haven't encountered what you're experiencing.
Early on, I realized that you have to start a new "chat" after so many messages or the LLM will become incoherent. I've found that gpt-4.1 has a much lower threshold for this than o3. Maybe that's affecting your workflow and you're not realizing it?
No, that's why I started again, because it's a fairly simple problem and I was worried that the context would get saturated. A sibling commenter said that browser rendering bugs on mobile are just too hard, which seems to be the case here.
Have you tried with both Claude Opus 4 and Gemini 2.5 Pro?
Opus 4, Sonnet 4, o3, o4-mini-high.
Same. I had an idea that I wanted to build a basic Sinatra webapp with a couple of features. First version was pretty good. Then I asked it to use Tailwind for the CSS. Again pretty good. Then I said I wanted to use htmx to load content dynamically. Suddenly it decides every backend method needs to check if the call is from htmx and alter what it does based on that. No amount of prompting could get it to fix it.
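For illustration, one way to keep that check in a single place instead of in every backend method. This is a sketch in TypeScript rather than Sinatra, and it is only my guess at the structure the commenter would have preferred. The one htmx-specific fact it relies on is that htmx sends an "HX-Request: true" header with its requests; every function name below is hypothetical.

```typescript
// Sketch only: all helper names are hypothetical. The only htmx behavior
// relied on is the "HX-Request: true" request header.

type RequestHeaders = Record<string, string | undefined>;

/** True when the request carries htmx's HX-Request header. */
function isHtmxRequest(headers: RequestHeaders): boolean {
  // Header names are case-insensitive, so compare in lower case.
  return Object.keys(headers).some(
    (name) => name.toLowerCase() === "hx-request" && headers[name] === "true"
  );
}

/** Hypothetical page shell used for normal (non-htmx) page loads. */
function layout(body: string): string {
  return `<html><body><main id="content">${body}</main></body></html>`;
}

/** The single place that decides between a fragment and a full page. */
function respond(headers: RequestHeaders, fragment: string): string {
  return isHtmxRequest(headers) ? fragment : layout(fragment);
}

/** A route handler builds its fragment and never inspects headers.
 *  (No HTML escaping here; this is a sketch, not production code.) */
function listItems(items: string[]): string {
  return `<ul>${items.map((item) => `<li>${item}</li>`).join("")}</ul>`;
}
```

With this shape, a handler would return something like respond(headers, listItems(["a", "b"])), and the htmx-or-not decision lives in exactly one function.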
Hard to tell what exactly went wrong in your case, but if I were to guess - were you trying to do all of this in a single LLM/agent conversation? If you look at my prompt history for the game from OP you'll see it was created with dozens of separate conversations. This is crucial for non-trivial projects, otherwise the agent will run out of context and start to hallucinate.
Agent mode in RubyMine, which I think is using a recent version of Sonnet. I tried starting a new agent conversation but it was still off quite a bit. For me my interest in finessing the LLM runs out pretty quickly, especially if I see it moving further and further from the mark. I guess I can see why some people prefer to interact with the LLM more than the code, but I’m the opposite. My goal is to build something. If I can do it in 2 hours of prompting or 2 hours of doing it manually I’d rather just do it manually. It’s a bit like using a mirror to button your shirt. I’d prefer to just look down.
> If I can do it in 2 hours of prompting or 2 hours of doing it manually I’d rather just do it manually.
100% agree, if that was the case I would not use LLMs either. Point is, at least for my use case and using my workflow it's more like 2 hours vs 10 minutes which suddenly changes the whole equation for me.
Yeah, or 10 minutes of prompting and then 20 minutes of implementing my own flavor of the LLM's solution vs 2 hours of trial and error because I'm usually too lazy to come up with a plan.
CSS is the devil, and I fully admit to burning many hours of dev time (mine without an LLM, an LLM by itself, and a combination of the two) ironing out similar layout nonsense for a game I was helping a friend with. In the end, what solved it was breaking things into hierarchical React components, adding divs by hand, and using the Chrome DevTools inspector and good old-fashioned human brain power. The other one was translating a Python script to Rust. I let the LLM run me around in circles, but what finally did it was using Google to find a different library, and then telling the LLM to use that library instead.
I didn't realize this was so hard, thanks. I expected it to be a simple positioning issue, but the LLMs all found it impossible.
Here's the game, BTW (requires multiple people in the same location): https://home.stavros.io/justone/