It's been my experience that strongly opinionated frameworks are better for vibe coding regardless of the type system.

For example, if you are using Rails, vibe coding is great because there is an MCP, there are published prompts, and there is basically only one way to do things in Rails. You know how files are to be named, where they go, what format they should take, etc.

Try the same thing in Go and you end up with a very different result, despite the fact that Go has stronger typing. Both Claude and Gemini have struggled with one-shotting simple apps in Go but succeed with Rails.

By comparison, a completely unopinionated framework like FastAPI, which got a popularity boost in the early AI surge, is a mess to work with if you are vibe coding. Most popular frameworks follow the principle of prescribing no clear way to do things and leaving it up to the developer. Opinionated frameworks went out of fashion after Rails, but it turns out they are significantly better suited to AI-assisted development.

You can opinionate Claude remarkably well with context files. I use a very barebones routing framework with my own architecture, and Claude knows how all the parts should fit together. I also publish the entire database structure, along with foreign key pairings, to context files; that made a tremendous difference.
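The schema dump doesn't need to be anything fancy. A minimal sketch of the idea, assuming Postgres and psycopg2 (the file name is just a placeholder):

```python
# Illustrative sketch: dump the foreign-key structure into a markdown
# context file the agent can read. Assumes Postgres and psycopg2;
# "schema_context.md" is a placeholder name.
import psycopg2

FK_QUERY = """
SELECT tc.table_name, kcu.column_name,
       ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
  ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY tc.table_name;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(FK_QUERY)
    rows = cur.fetchall()

with open("schema_context.md", "w") as f:
    f.write("# Foreign key pairings\n\n")
    for table, column, ftable, fcolumn in rows:
        f.write(f"- {table}.{column} -> {ftable}.{fcolumn}\n")
```

Rerun it whenever the schema changes and the model always has an up-to-date map of how the tables relate.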

That's an interesting assertion you make there about opinionated frameworks. Do you have a source for that? From my perspective, opinionated frameworks have only gotten more popular. Rails might not be the darling of every startup in existence anymore but I think that's largely down to other languages coming in and adopting the best parts of Rails and crafting their own flavor that plays to the strengths of their favorite programming language. Django, Laravel, Spring Boot, Blazor, Phoenix, etc etc.

While a lot of people here on this platform like to tinker and are often jumping to a new thing, most of my colleagues have no such ideas of grandeur and just want something that works. Rails and its acolytes work really well. I'm curious to know what popular frameworks you're referencing that don't fit into this Rails-like mold.

I'm not familiar with all the frameworks you listed, but I've worked extensively with Spring Boot and I can assure you that it's not an opinionated framework (as in, one correct way to do things). Blazor and Phoenix are niche frameworks that don't have wide adoption outside this site. Django has a shared history/competition with Rails, but it's also not widely popular.

> We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss

Spring Boot is definitely opinionated (this is taken from their home page). Maybe not as much as RoR, but saying it isn't at all sounds very strange to me, having worked with it for a few years too...

> Django has a shared history/competition with Rails but it's also not widely popular.

Are you sure? Django is insanely popular. I am not sure on what basis you are saying Django isn't popular. I posit Django is more popular than Ruby on Rails.

Django is super popular.

My experience has been the opposite with Rails because of the open-ended patterns in Hotwire. Sure, Rails itself is opinionated, but Hotwire provides multiple ways to do the same thing, which confuses LLMs. For example, recently I tried building a form that allows creating related objects inline using modals. Claude 4 Sonnet got quite confused by that request, no matter how much help I provided. It managed in the end, but the solution left a lot to be desired. It can build the same feature using React on its own with basic instructions.

Same thing with other libraries like HTMX. Using TypeScript with React and opinionated tools like TanStack Query helps LLMs be way more productive, because they can fix errors quickly by looking at type annotations and use common patterns to build out user interactions.

I find Claude works extremely well at generating Stimulus controller code. The trouble with larger Hotwire patterns is likely down to a lack of documentation and git repos with larger Hotwire codebases for it to have been trained on.

This is pretty anecdotal, but it feels like most of the published Rails source code you find online (and, by extension, an LLM has found) comes from large, stable, and well-documented codebases.

Claude Code with Rails is amazing. Shout out to Obie for Claude on Rails. Works phenomenally well.

Basically it's like this:

the more constraints you have, the more freedom you have to "vibe" code

and if someone actually built AI for writing tests, catching bugs and iterating 24/7 then you'd have something even cooler

> if someone actually built AI for writing tests, catching bugs and iterating 24/7

This is called a nightly CI/CD pipeline.

Run a build, all tests, and full coverage at midnight; failed or regressed tests and reduced coverage are automatically assigned to new tickets for managers to review and assign.
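Something like this sketch is the core of it (pytest and pytest-cov assumed; the ticket-filing steps are only placeholders):

```python
# Sketch of a nightly job: run the suite with coverage, compare against
# last night's baseline, and flag regressions for triage.
import json
import subprocess
from pathlib import Path

BASELINE = Path("coverage_baseline.json")

# pytest-cov writes coverage.json when given a json report target.
result = subprocess.run(
    ["pytest", "--cov=myapp", "--cov-report=json:coverage.json"],
    capture_output=True,
    text=True,
)

report = Path("coverage.json")
current = json.loads(report.read_text())["totals"]["percent_covered"] if report.exists() else 0.0
previous = json.loads(BASELINE.read_text())["totals"]["percent_covered"] if BASELINE.exists() else 0.0

if result.returncode != 0:
    # placeholder: the real pipeline would open a ticket per failed/regressed test
    print("Test failures:\n", result.stdout[-2000:])
if current < previous:
    # placeholder: the real pipeline would open a coverage-regression ticket
    print(f"Coverage regressed: {previous:.1f}% -> {current:.1f}%")

# Keep tonight's numbers as tomorrow's baseline.
if report.exists():
    report.replace(BASELINE)
```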

"Nightly?"

Iteration speed can never be faster than the testing cycle.

Unless you're building something massive (like Windows or Oracle maybe) nobody is relying on "nightly" integration tests anymore.

Post-merge tests, once a day?

Who does that? We're not in the 90s anymore.

Run all the tests and coverage on every PR, block merge on it passing. If you think that's too slow then you need to fix your tests.

We go through maybe 10k CPU hours in our nightly pipeline. Doing that for every PR in a team of 70 people is unsustainable from a cost standpoint.

The existing tests aren't optimal, but it's not going to be possible to cut that by 1-2 orders of magnitude by "fixing the tests".

We obviously have smaller pre-merge tests as well.

> We obviously have smaller pre-merge tests as well.

This. I feel like trying to segregate tests into "unit" and "integration" tests (among other kinds) did a lot of damage in terms of prevalent testing setups.

Tests are either fast or slow. Fast ones should be run as often as possible, with really fast ones every few keystrokes (or on file save in the IDE/editor), normal fast ones on commit, and slow ones once a day (or however often you can afford, etc.). All these kinds of tests have value, so going without covering both fast and slow cases is risky. However, there's no need for the slow tests to interrupt day-to-day development.

I seem to remember seeing something like a `<slowTest>` pragma in GToolkit test suites, so at least a few people seem to have had the same idea. The majority, however, remains fixated on unit/integration categorization and ends up with (a select few) unit tests taking "1-2 orders of magnitude" too long, which actually diminishes the value of those tests since now they're run less often.
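Purely as an illustration, the same idea with pytest markers (the marker has to be registered, e.g. in pytest.ini, to avoid the unknown-marker warning):

```python
# Illustrative only: tag slow tests so frequent runs can skip them
# and the daily/periodic run can include them.
import time
import pytest

def test_parse_amount():
    # fast: runs on every commit (or on file save, if you like)
    assert int("42") == 42

@pytest.mark.slow
def test_full_import_pipeline():
    # slow: runs once a day, or however often you can afford
    time.sleep(5)  # stand-in for an expensive end-to-end check
    assert True
```

Then `pytest -m "not slow"` is the everyday run and the unfiltered `pytest` goes in the daily slot.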

Pssht, so little? With AI you're supposed to have a huge data center and pay them thousands of dollars to process many, many tokens. That way you are doing it right, 24/7.

How else are we going to cover these costs? https://www.youtube.com/watch?v=cwGVa-6DxJM

Have you considered that, instead, whatever an LLM has the most examples of is what it's best at? Perhaps there's more well-structured Rails code in the training data than Go?

In my experience Gemini can one-shot Go apps. Determining this properly requires a sound eval rather than anecdotes.

I'd really like to know what type of apps you're actually one-shotting with an AI. Seriously, can you please give me some example code or something? It seems like anything past a trivial program that doesn't actually do what you specified is far beyond their capabilities.

I did a Flask application that read an AWS account's Glue resources, displayed them based on category (tags like "databasename" and "driver", etc.), and offered the ability to run those jobs in serial or parallel, with a combined job status page for each batch. It also used company colours because I told it to pick a colour palette from the company website. It worked first time and produced sane, safe code.

There was a second shot, which was to add caching of job names because we have a few hundred now.

(Context: I'm at a company that, at the moment, has only ever done data via hitting a few hand-replicated on-prem databases, and I wanted to give twitchy folks an overview tool that was easy to use and look at.)
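Roughly the shape of it, as an illustrative sketch rather than the actual code (boto3 and Flask, AWS credentials assumed to be configured, endpoint names made up):

```python
# Sketch only: list Glue jobs grouped by their "databasename" tag and
# kick off a batch of them. Assumes boto3 credentials and region are set.
from collections import defaultdict

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
glue = boto3.client("glue")
account = boto3.client("sts").get_caller_identity()["Account"]
region = glue.meta.region_name

def job_arn(name: str) -> str:
    return f"arn:aws:glue:{region}:{account}:job/{name}"

@app.get("/jobs")
def list_jobs():
    grouped = defaultdict(list)
    for page in glue.get_paginator("get_jobs").paginate():
        for job in page["Jobs"]:
            tags = glue.get_tags(ResourceArn=job_arn(job["Name"]))["Tags"]
            grouped[tags.get("databasename", "untagged")].append(job["Name"])
    return jsonify(grouped)

@app.post("/jobs/run")
def run_jobs():
    # the real tool offers serial or parallel; here we just start them all
    names = request.get_json()["jobs"]
    runs = {name: glue.start_job_run(JobName=name)["JobRunId"] for name in names}
    return jsonify(runs)
```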

If AI could really one-shot important, interesting apps, shouldn't we be seeing them everywhere? Where's the surge of new apps that are so trivial to make? Who's hiding all this incredible innovation that can be so easily generated?

If AI could really accelerate or even take over the majority of work on an established codebase, we should be seeing a revolution in FOSS libraries and ecosystems. The gap has been noted many times, but so far all anyone's been able to dig up are one-off, laboriously-tended-to pull requests. No libraries or other projects with any actual downstream users.

It's taken over my mature codebase just fine. I'm not in the business of spending tokens on open source projects though.

But plenty of maintainers are in the business of spending mass amounts of time, energy, and actual money on open source projects. Some make a business out of it. Some are sponsored by their employer to spend paid work hours on FOSS projects. If LLMs could help them, a significant number would be using them.

But if there are any instances of this, I have not seen them, and seemingly neither has anyone I've posed the question to, or any passersby.

How would you know? I don't label my changes that were made by AI.

Somebody would. Somebody would be an AI evangelist, or would become one. The FOSS ecosystem is large enough to be sure of that. We're not seeing nothing, we're just not seeing at all what the marketers and AInfluencers are prophesying. We're not even seeing what you describe. Why is that? Why is it limited to random commenters and not seen at all in the wild?

There is a Cloudflare project that published the entire AI-generated history, complete with prompts. And of course, in many projects the majority of PRs are opened by Dependabot; it's not an LLM, but it's a "bot" at least.

I agree we're not seeing open source projects be entirely automated with LLMs yet. People still have to find issues, generate PRs (even if mostly automatic), open them, respond to comments, etc. It takes time and energy.

I've made another comment in this thread about a nice tool I one-shotted. The reason I don't publish anything now is because, in the UK at least, companies are not behaving well with regard to IP: many contracts specify that anything you work on that can be expected of you in the course of your duties belongs to the company, and tribunals have upheld this.

There's also a bit of a stigma about vibe coding: career-wise, I personally worry that sharing some of this work will diminish how people view me as an engineer. Who'd take the risk if there might be a naysayer on some future interview panel who will see CLAUDE.md in a repo of yours and assume you're incompetent or feckless?

Plus, worries about code: being an author gives you a much higher level of control than being an author-reviewer. To err as a writer is human, to err as a reader has bigger consequences.

My experience with Gemini has been pretty dismal. The CLI works much better than the VS Code extension, and both of them have struggled with one-shotting Go. Single files or single functions are no problem, though.

Weird, I thought Go was one of the go-to examples on HN for languages that LLMs work well with, precisely because it's opinionated and has a rich standard library. Not that I've tried much; my attempts at vibe coding felt disappointing. But I think this contradicts the zeitgeist?

I work in both Ruby and Go. There is no comparison: AI is way better with Ruby (Rails).

Hmm, I can imagine that while LLMs are good at producing working code in Go, they might not be as good at structuring larger applications, compared to building on opinionated frameworks.

I imagine there could be some presets out there that guide the vibe-coding engines to produce a particular structure in other languages for better results.

Is that specific to using Rails, or is it good with plain Ruby as well?

Rails. I haven't tried at all with plain Ruby, but I doubt it. I think formulaic = static typing for AI.

Well yeah, it's like how a 5-year-old can talk about what they want in their sandwich but will probably struggle to describe the flavours and textures they enjoy in detail.

I've been using Flask and the results are remarkable. Remarkable to the point where I've one-shotted rather good things that I'm now using daily.

This isn't a fully formed thought, but could this be mitigated by giving LLMs your opinions? I am using Copilot in more of a pair-programming manner, and for everything I want to make I give a lot of my opinions in the prompt. My changes are never too large, though: a hundred lines of diff at most.

It sounds like you should have just been writing configuration this whole time?