I wanted to test how far AI coding tools could take a production project. Not a prototype. A social media management platform with 12 first-party API integrations, multi-tenant auth, encrypted credential storage, background job processing, approval workflows, and a unified inbox. The scope would normally keep a solo developer busy for the better part of a year. I shipped it in 3 weeks.

Before writing any code, I spent time on detailed specs, an architecture doc, and a style guide. All public: https://github.com/brightbeanxyz/brightbean-studio/tree/main...

I broke the specs into tasks that could run in parallel across multiple agents versus tasks with dependencies that had to merge first. This planning step was the whole game. Without it, the agents produce a mess.

I used Opus 4.6 (Claude Code) for planning and building the first pass of backend and UI. Opus holds large context better and makes architectural decisions across files more reliably. Then I used Codex 5.3 to challenge every implementation, surface security issues, and catch bugs. Token spend was roughly even between the two.

Where AI coding worked well: Django models, views, serializers, standard CRUD. Provider modules for well-documented APIs like Facebook and LinkedIn. Tailwind layouts and HTMX interactions. Test generation. Cross-file refactoring, where Opus was particularly good at cascading changes across models, views, and templates when I restructured the permission system.

Where it fell apart: TikTok's Content Posting API has poor docs and an unusual two-step upload flow. Both tools generated wrong code confidently, over and over. Multi-tenant permission logic produced code that worked for a single workspace but leaked data across tenants in multi-workspace setups. These bugs passed tests, which is what made them dangerous. OAuth edge cases like token refresh, revoked permissions, and platform-specific error codes all needed manual work. Happy path was fine, defensive code was not. Background task orchestration (retry logic, rate-limit backoff, error handling) also required writing by hand.
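To illustrate the tenant-leak pattern described above, here is a minimal, hypothetical sketch (the model and field names are made up for illustration, not taken from the actual codebase). The buggy version passes every single-workspace test, which is exactly why the bug survives:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Post:
    id: int
    workspace_id: int
    author_id: int

# Hypothetical data: the same user belongs to two workspaces.
POSTS = [
    Post(1, workspace_id=1, author_id=7),
    Post(2, workspace_id=2, author_id=7),
]

def visible_posts_buggy(user_id):
    # Scopes by user only. Correct for single-workspace setups,
    # but leaks posts across workspaces for multi-workspace users.
    return [p for p in POSTS if p.author_id == user_id]

def visible_posts(user_id, workspace_id):
    # Correct: every query is additionally scoped to the active workspace.
    return [
        p for p in POSTS
        if p.author_id == user_id and p.workspace_id == workspace_id
    ]
```

In Django terms, this is the difference between `Post.objects.filter(author=user)` and `Post.objects.filter(author=user, workspace=active_workspace)`; a custom manager that always applies the workspace filter makes the leak harder to reintroduce.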

One thing I underestimated: Without dedicated UI designs, getting a consistent UX was brutal. All the functionality was there, but screens were unintuitive and some flows weren't reachable through the UI at all. 80% of features worked in 20% of the time. The remaining 80% went to polish and making the experience actually usable.

The project is open source under AGPL-3.0. 12 platform integrations, all first-party APIs. Django 5.x + HTMX + Alpine.js + Tailwind CSS 4 + PostgreSQL. No Redis. Docker Compose deploy, 4 containers.

Ask me anything about the spec-driven approach, platform API quirks, or how I split work between the two models.

How much of the specs themselves came from the LLM? The development schedule https://github.com/brightbeanxyz/brightbean-studio/blob/main... has very AI-looking estimates, for example, and I can see a commit in the architecture.md file that exclusively changes em-dashes to normal dashes (https://github.com/brightbeanxyz/brightbean-studio/commit/74...), which suggests you wanted to make it seem less LLM-generated?

I ask, not to condemn, but to find out what your process was for developing the requirements. Clearly it was done with LLM help but what was the refinement process?

The spec document was also written by Claude (over many iterations) with lots of manual additions. Still, it took me 4 full days to get the specs to a level I was happy with.

One main thing I did was to use the deep research feature of Claude to get a good understanding of what other tools are offering (features, integrations etc.)

Then each feature in the spec document was refined with manual suggestions and screenshots I took of other tools.

Thanks for sharing!

> Before writing any code, I spent time on detailed specs, an architecture doc, and a style guide. All public: https://github.com/brightbeanxyz/brightbean-studio/tree/main...

> It took me tho 4 full days to get the specs to the level I was happy with.

When I click on history there I see only a single commit for these docs. Would you be willing to share some or all of the conversation you had with the LLM (in a gist or in the repo) that led to these architecture docs? Understand if you can't, but I'm sure it would be super instructive for people trying to understand the process of doing something like this and the types of guide rails that help to move the process forward productively.

I built something very, very similar for a client to post their content on schedule to about 9 different social networks. It was my first major vibe-coded app -- I normally vibe-code a function or small apps. Took about two hours with Claude [0] by just building up the functionality in layers, testing each layer as we went. If I'd rawdogged it, 2022-style, it would probably have taken me a month to write.

It's been running flawlessly for months without a single error. In fact, spooky good. I still feel nervous about it, though, and check it every morning.

It is built mostly as a web app in .NET 10 Razor with SQLite db.

The APIs for the social networks are the hazy bit. As OP mentions, some are badly documented. Some are a pain to get access to. I was using a driven browser to post to Twitter, but they opened their API recently, which was nice.

[0] I used Claude in GitHub Copilot, so the total cost was less than the $10/month in credits.

First, congrats on your accomplishment(s) and leveraging your AI+Python+WebDev talents.

Isn't this a SaaS-pocalypse testament? What's stopping anyone from doing the same to BrightBean? What's stopping anyone with a little domain knowledge and a $200+ Claude plan from cloning your app, building yet another gap-filling, slightly improved content-syndication version, and going to market? Is it worth taking it to market when anyone can perpetuate the cycle?

I'm genuinely interested in knowing your thoughts.

> Isn't this a SaaS-pocalypse testament? What's stopping anyone from doing the same to BrightBean?

It being open source doesn't help it either; it's so easy to just clone it.

Thank you for this write-up; this is much more interesting than all the "Show HN" posts that don't mention anything about AI even though you can see it on every corner.

What you describe has also been my experience so far with building projects mostly with AI but with detailed specs but Rails instead of Django.

That was an interesting article. I have a few questions about the workflow.

1. You mentioned developing tasks in parallel—how many agents were you actually running at the same time? Did you ever reach a point where, even if you increased the degree of parallelism, merging and reviews became the bottleneck, and increasing the number further didn’t speed things up?

2. I really relate to the idea of “80% of features in 20% of the time, then 80% on polish.” Did you use AI for this final polishing phase as well? In other words, did you show the AI screenshots of the screens and explain them? Also, when looking back, do you feel that if you had written the initial specifications more carefully, you could have completed the work faster?

What I did was break the development into different layers that had to be completed one after another, since the functionalities build on each other. Each layer had independent work streams that ran in parallel. Each work stream was one independent worktree/session in Claude Code.

First I triggered all work streams of a layer and brought each to a level of completion I was happy with. Then you merge them one after another (challenging each implementation on GitHub with @codex) and rebase when you move to the next work stream.

This is roughly how it looked:

Layer 0 - Project Scaffolding

Layer 1 — Core Features
- Stream A — Content Pipeline
- Stream B — Social Platform Providers
- Stream C — Media Library
- Stream D — Notification System
- Stream E — Settings UI

                        T-0.1 (Scaffolding)
                              │
                        T-0.2 (Core Models + Auth)
                              │
          ┌───────────────────┼───────────────────┬──────────────┐
          │                   │                   │              │
     Stream A            Stream B            Stream C       Stream D
     (Content)           (Providers)         (Media)        (Notifs)
          │                   │                   │              │
     T-1A.1 Composer    T-1B.1 FB/IG/LI    T-1C.1 Library  T-1D.1 Engine
          │              T-1B.2 Others           │              │
     T-1A.2 Calendar         │                   │         Stream E
          │                  │                   │         T-1E.1 Settings UI
     T-1A.3 Publisher ◄──────┘                   │
          │                                      │
          └──────────◄───────────────────────────┘
          (Publisher needs providers + media processing)

Layer 2 — Collaboration & Engagement
- Stream F — Approval & Client Portal
- Stream G — Inbox
- Stream H — Calendar & Composer Enhancements
- Stream I — Client Onboarding

          Layer 1 complete
                │
    ┌───────────┼───────────┬──────────────┐
    │           │           │              │
 Stream F   Stream G    Stream H       Stream I
 (Approval  (Inbox)     (Calendar+     (Onboarding)
  + Portal)              Composer
    │                    enhance)
 T-2F.1 Approval
    │
 T-2F.2 Portal

Thus I ran up to 4 agents in parallel, but to be honest that was the maximum level of parallelism my brain could handle; I really felt like the bottleneck here.

Additionally, token usage is very high when you have so many agents working at the same time, so I very often hit my Claude session token limits and had to wait for the next session to begin (I do have the 5x Max plan).


This is amazing. I started doing the same, but I did not have the time to polish it.

Questions: why no X? Do you have a feature to resize (summarize?) the text to fit into short boxes?

Maybe because of the API access issues? I built almost the same system as OP, but I had to drive a browser to post to X because the API was too expensive for a single person to afford. A few weeks ago they switched it so it's pennies a post now and I could finally integrate the API.

What did your harness look like for this?

How much did it cost in tokens?

I built something very, very similar using GitHub Copilot, and the whole thing cost less than the $10/month I already spend; I know because I still had credits left after I was done.

This is interesting, how do you publish to LinkedIn? I thought they didn't allow automated posts.

Seems to just use the website api: https://github.com/brightbeanxyz/brightbean-studio/blob/main...

Very helpful, thanks!

Why Postgres instead of classic MySQL?

MySQL does not let you run DDL statements (ALTER, CREATE, CREATE INDEX, etc.) inside a transaction; each one implicitly commits.

If you're building anything serious and your data integrity is important, use Postgres.
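The difference is easy to demonstrate. SQLite happens to share Postgres's behavior here, so Python's built-in sqlite3 module can show what transactional DDL means (a sketch; the same ROLLBACK against MySQL would leave the table behind, because the CREATE TABLE implicitly commits):

```python
import sqlite3

# isolation_level=None gives us manual transaction control (autocommit mode).
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

cur.execute("BEGIN")
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
cur.execute("ROLLBACK")  # the CREATE TABLE is rolled back too

tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # -> []: the table never existed
```

This matters for Django migrations in particular: on Postgres a failed migration rolls back cleanly, while on MySQL it can leave the schema half-migrated.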

Postgres is much stricter, and always was. MySQL tried to introduce several strict modes to mitigate the problems that they had, but I would always recommend to use Postgres.

Such apps should use SQLite; it's enough for this type of app.

Why mysql instead of postgres should be the right question nowadays.

MySQL or Postgres are the DB of choice if you want a managed database in the cloud.

Probably Postgres is there because you can use it as a queue (https://livebook.manning.com/book/just-use-postgres/chapter-...)

Postgres isn't a newcomer any more. For most projects that I see it's the default and the "classic" already.

Postgres is simply a battle proven technology.


Nothing wrong here, but Django/HTMX seem quite 'old' technologies to me for a new project made in 2026. Nowadays I use FastAPI/SQLAlchemy for the backend and SvelteKit on the frontend.

You don’t need a Drillator-X 3000 AI Ready™ if a simple screwdriver gets the job done. IMHO the main thing technical people get wrong about B2B problems.

Also calling HTMX old makes me feel old.

> You don’t need a Drillator-X 3000 AI Ready™ if a simple screwdriver gets the job done

Yes, and Django feels like more of a Drillator than a simple screwdriver to me.

yeah htmx is from 2020, it feels like yesterday

SvelteKit is also from 2020.

I originally have a data science background, so Python is usually my go-to language, and I already have a lot of experience with Django. That helps a lot when reviewing AI code and judging architecture, etc.

And for HTMX, I simply wanted something lightweight and non-invasive, to keep things simple and dependencies low.

In my head this was a good consideration to keep complexity low for my AI agents :-)

Sure, there is nothing wrong here; I was just describing a feeling. HTMX is quite recent, but this idea of embedding logic into HTML reminds me of the old jQuery days.

> Django/HTMX seem quite 'old' technologies to me for a new project made in 2026.

It's simple, it works, it's efficient, safe, and there are tons of online resources for it. Excellent choice, even more so when using a coding agent.

FastAPI is quite old (2018)

Svelte even older (2016; SvelteKit was just a new version in 2022)

SQLAlchemy is ancient (2006)

Use newer tech, like HTMX (2020)

(/s obviously)

HTMX is 5 years old, version 2 is just under 2 years old, and the last release (2.0.7) came out 7 months ago.