Interesting approach. How does it handle monorepos or repos with unconventional structure? The multi-pass analysis sounds nice, but I'd be curious how consistent the output is across runs: if you run it twice on the same repo, do you get basically the same diagram?


How does it handle messy, massive codebases?

It uses a file tier system to prioritize what to analyze. Entry points, configs, and core source files get fetched fully. Tests and utilities get partial treatment. Generated code, lockfiles, and assets get skipped entirely. So even for large repos it focuses on the stuff that actually matters for understanding architecture.
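The tier idea can be sketched as a small path classifier. This is a hypothetical sketch, not the tool's actual rules: the patterns, tier names, and defaults here are assumptions for illustration.

```python
import re

# Hypothetical tier rules (first match wins): skip generated/lock/asset files,
# give tests and utilities partial treatment, fetch core source fully.
TIER_RULES = [
    ("skip",    re.compile(r"\.lock$|package-lock\.json$|\.min\.js$|^dist/|\.png$|\.svg$")),
    ("partial", re.compile(r"^tests?/|_test\.|\.test\.|^utils?/")),
    ("full",    re.compile(r"^src/|^index\.|^main\.|config|^Dockerfile$")),
]

def classify(path: str) -> str:
    """Return the analysis tier for a repo file path."""
    for tier, pattern in TIER_RULES:
        if pattern.search(path):
            return tier
    return "partial"  # assumed default: partial treatment for unmatched files
```

Checking skip patterns first means a lockfile inside `src/` is still skipped rather than fetched fully.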

For really massive repos (100K+ files) the analysis runs in a resumable pipeline - each of the 5 passes saves results to the database, so if the serverless function times out it picks up where it left off on the next connection. Embeddings for chat are also done incrementally in batches of 50 chunks.
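The resumable part works roughly like checkpointing each pass as it completes. A minimal sketch, assuming a SQLite table as the checkpoint store and made-up pass names; the real pipeline's storage and passes aren't shown here:

```python
import sqlite3

# Assumed pass names for illustration only.
PASSES = ["structure", "dependencies", "entry_points", "modules", "summary"]

def run_pipeline(db: sqlite3.Connection, analyze) -> list[str]:
    """Run each pass once, skipping passes already recorded as complete."""
    db.execute("CREATE TABLE IF NOT EXISTS completed (pass TEXT PRIMARY KEY)")
    done = {row[0] for row in db.execute("SELECT pass FROM completed")}
    executed = []
    for name in PASSES:
        if name in done:
            continue  # a previous invocation already finished this pass
        analyze(name)                     # do the actual work for this pass
        db.execute("INSERT INTO completed VALUES (?)", (name,))
        db.commit()                       # checkpoint so a timeout resumes here
        executed.append(name)
    return executed
```

If the function dies mid-run, the next invocation reads the `completed` table and picks up at the first unfinished pass instead of starting over.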

That said, messy codebases are honestly where it's most useful. Clean, well-documented repos don't need a tool like this. The ones with zero docs and 500 files with no clear structure are where it saves the most time.