I’m shocked at the 25M line part! That is a completely unfathomable amount of code for one codebase. I really want to know more about that.

I am more shocked by the "overnight" aspect. I tried running clang-format on the Chromium source (68,281 .cc files, 21 million lines according to wc):

$ find chromium-149.0.7826.1/ -name "*.cc" -exec cat {} + | wc
21640925 55715244 833460441

And that took less than 6 seconds on a single E5-2696 v3 from 2014:

$ time find chromium-149.0.7826.1/ -name "*.cc" | parallel -j 16 clang-format {} > /dev/null

real 0m5.666s
user 1m13.964s
sys 0m13.373s

That’s orders of magnitude faster, especially if we assume they’re not running their workloads on potatoes like mine. Is Ruby’s syntax really that much more complicated than C++, or is this a tooling problem?
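As a rough sanity check on those numbers, here is a minimal Python sketch that turns the wc and time output above into lines per second. The 8-hour figure for "overnight" and the 25M-line Ruby total are assumptions for the sake of comparison only; nothing here measures how long the Ruby formatting run actually took.

```python
# Figures copied from the wc and time output quoted above.
CHROMIUM_LINES = 21_640_925   # first column of the wc output
REAL_S = 5.666                # "real 0m5.666s"
USER_S = 60 + 13.964          # "user 1m13.964s"
SYS_S = 13.373                # "sys 0m13.373s"

# Assumptions, not measurements: the 25M-line figure is the one discussed
# in the thread, and "overnight" is taken to mean about 8 hours.
RUBY_LINES = 25_000_000
OVERNIGHT_S = 8 * 3600


def lines_per_second(lines: int, seconds: float) -> float:
    """Average formatting throughput over the given duration."""
    return lines / seconds


# Throughput as seen on the wall clock, with 16 parallel jobs.
wall_rate = lines_per_second(CHROMIUM_LINES, REAL_S)

# Throughput per CPU-second, i.e. roughly what one core manages.
cpu_rate = lines_per_second(CHROMIUM_LINES, USER_S + SYS_S)

# CPU time divided by wall time: how many cores were busy on average.
effective_parallelism = (USER_S + SYS_S) / REAL_S

# Throughput implied by the hypothetical 8-hour run.
overnight_rate = lines_per_second(RUBY_LINES, OVERNIGHT_S)

if __name__ == "__main__":
    print(f"wall-clock rate:       {wall_rate:,.0f} lines/s")
    print(f"per-CPU-second rate:   {cpu_rate:,.0f} lines/s")
    print(f"effective parallelism: {effective_parallelism:.1f} cores")
    print(f"assumed overnight run: {overnight_rate:,.0f} lines/s")
    print(f"ratio (wall / overnight): {wall_rate / overnight_rate:,.0f}x")
```

The effective parallelism comes out close to the 16 jobs passed to parallel, which suggests the timing output is internally consistent.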

I don't think the post necessarily means it took multiple hours to format the codebase. I think they're probably just saying they worked on it off-hours and landed it while no one was working, so that it didn't run into merge conflicts.

My guess would be tooling. I think the Ruby formatters are written in Ruby. I’d guess the clang one is written in C++.

Nah, the article says it's Rust, calling into a C library for parsing.

Only 25 million? :) Google had billions a decade ago...

https://research.google/pubs/why-google-stores-billions-of-l...

iirc they also vendor(ed) many of their dependencies, several layers deep, which still counts for "stores" though it's rather different than "wrote" / "maintains".

Very true. It was still hundreds of millions of lines of first party code a decade ago, and could easily be over a billion at this point.

Yeah, I can definitely believe that Google would break over a billion handwritten. It's a big company that has been around for a long time.

It's still absurd. But believable.

Right, where is the rest of the code?

They're up to 42 million now, as per the article

That sounds even more insane to me, but I guess most of that code does not really touch financial transactions, otherwise it would be a nightmare being responsible for verifying that.

Ruby code touches financial transactions. Card payments were migrated to Java when I left in 2022. Non-card payments (e.g., ACH, checks, various wallets) were still processed by Ruby.

PCI-related/vaulting code lived in its own locked-down repo. I think that was a mix of Go and Ruby.

Once you have the foundations in place for account balances and the ledger, processing a payment isn’t that daunting. Those foundations, however, took a lot to build and evolve.

> Once you have the foundations in place for account balances and the ledger, processing a payment isn’t that daunting. Those foundations, however, took a lot to build and evolve.

Pretty much. I've worked at places with PHP payment processing that worked just fine, and at a place with C++ payment processing (and no testers) and it worked just fine. I wasn't around when the systems were first built though so not sure if there were tears along the way.

> migrated to Java

I want to know more about this

My (much smaller than Stripe) company is well over 4.5M at this point, and the graph is very much exponential.

AI has been a huge problem here: the amount of code is just exploding. Quality of the produced code is another matter.

^^^^^^^^^^^^^^^^^^^

I recently wrote a very esoteric Python script. 100 lines of code. No classes, no functions, but yes argparse.

I've tried out the latest open source models on the task. They go bananas. It's like Enterprise fizzbuzz (https://github.com/enterprisequalitycoding/fizzbuzzenterpris...). They love classes and imports and reinventing the wheel. A great way for me to tell trash AI slop code is it'll define a useful constant then 15 lines later do it again with a different name.

They love making code that looks impressive. "Wow look at all the classes and functions. It's so scalable. It's so dynamic. It validates every minutia against multiple schemas and solves a problem I never thought about." But it was trash code. One really was 400 lines and it didn't even look like it would work. Can't even imagine what it means for 4.5M moderately good human lines to become what? 27M fluffy filler repeat lines that don't even make sense?

The bad part of LLMs is that they got trained on bad examples, because we humans also don't know WTF we're doing.

Yeah maybe I need to do the old "you are a veteran engineer" nonsense. I've had some success telling it to implement everything it suggests and be production ready. I hate when it takes a shortcut and says I'll have to change it. That's kinda the whole point of me not writing the code...

Unless I’m mistaken, it’s a monorepo. So it’s not 25M LoC in a single app, it’s (all?) of their server-side code and shared libraries. There’s also a variety of other languages in use.

16 years and thousands of engineers write a lot of code.

Imagine lots and lots of models and stubs generated from swagger, protobuf, sqlc etc.