Didn't realise this was some historic evil script and not some active attacker who could change tack at any moment.

That makes the fix pretty easy. Write a regex to detect the evil script, and revert every page to a historic version without the script.
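A minimal sketch of that approach in Python. It assumes each page's history is available as a list of revision texts, oldest first. The `EVIL_SCRIPT` pattern, the `evilPayload` marker, and the helper names are hypothetical placeholders; the real signature would have to come from the actual payload.

```python
import re
from typing import Optional

# Hypothetical signature of the malicious script. The real pattern would
# have to be derived from the actual payload, not from this placeholder.
EVIL_SCRIPT = re.compile(r"<script[^>]*>[^<]*evilPayload\(", re.IGNORECASE)


def last_clean_revision(revisions: list[str]) -> Optional[str]:
    """Return the newest revision that does not match EVIL_SCRIPT.

    `revisions` is ordered oldest to newest (an assumed data model).
    Returns None if every revision matches.
    """
    for text in reversed(revisions):
        if not EVIL_SCRIPT.search(text):
            return text
    return None


def plan_reverts(pages: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Map each currently infected page to the revision it should revert to.

    Pages whose latest revision is clean are left out of the plan.
    """
    plan: dict[str, Optional[str]] = {}
    for title, revisions in pages.items():
        if revisions and EVIL_SCRIPT.search(revisions[-1]):
            plan[title] = last_clean_revision(revisions)
    return plan
```

For example, `plan_reverts({"Home": ["hello", "hello<script>evilPayload(1)</script>"]})` returns `{"Home": "hello"}`, and a page mapped to `None` has no clean revision and needs a human. The catch is that a regex only finds the variants it was written for.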

Letting ancient evil code run? Have we learned nothing from A Fire Upon the Deep?!

Legitimately listening to this book for the first time after a coworker recommended it. It's rapidly becoming one of my favorite books; it balances the truly alien with the familiar just right.

Not so ironically, it came up when we were discussing "software archeology".

"It was really just humans playing with an old library. It should be safe, using their own automation, clean and benign.

This library wasn't a living creature, or even possessed of automation (which here might mean something more, far more, than human)."

\(^O^)/ zones of thought mentioned \(^O^)/

Link to the prologue of A Fire Upon the Deep: https://www.baen.com/Chapters/-0812515285/A_Fire_Upon_the_De...

It's very short and from one of my favorite books. Increasingly relevant.

I've only just heard of it, but I already knew not to run random scripts under a privileged account. And thank you for the book suggestion; I'm into those kinds of tales.

I love that book

Are you sure? Are you $150 million ARR sure? Are you $150 million ARR, you'd really like to keep your job, you're not going to accidentally leave a hole or blow up something else, sure?

I agree, mostly, but I'm also really glad I don't have to put out this fire. Cheering them on from the sidelines, though!

Or just restore from backup across the board. Assuming they do their backups well, this shouldn't be too hard (especially since it's currently in read-only mode, which means no new updates).

True but it does say something that such a script was able to lie dormant for so long.

Why would anyone test in production???!!!

Selecting the wrong environment in your test setup by mistake?

I refuse to believe that someone on the security team intentionally tested random user scripts in production.

Once you get big enough… there comes a point where you need to run some code and learn what 100 million people hitting it at once looks like. At that scale, “one-in-a-million” bugs and race conditions literally happen every day. You can’t do that on every PR, so you ship it and prepare to roll back if anything even starts to look fishy. Maybe you even roll it out gradually.

At least, that’s how it has worked at literally every big company I’ve worked at so far. The only time you hold it back is during testing/review. Once enough humans have looked at it, you release and watch metrics like a hawk.

And yeah, many features were released this way, often gated behind feature flags to control rollout. When I refactored our email system, which sent over a billion notifications a month, it was nerve-wracking. You can’t unsend an email, and hundreds of millions would likely have gone out before we noticed a problem at scale.
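A common way to do a gradual, flag-gated rollout is to hash each user into a stable bucket and compare that bucket with a rollout percentage. A minimal Python sketch; `in_rollout` and the flag name `new_email_pipeline` are hypothetical, not any particular feature-flag product's API.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Deterministically decide whether `user_id` gets `flag_name`.

    Hashing (flag, user) gives each user a stable bucket in [0, 100),
    so the same user always gets the same answer for a given percent.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    # Take 8 bytes of the hash and map them onto 0.00 .. 99.99.
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return bucket < percent


# Hypothetical usage: route 5% of users through the new email code path.
def choose_pipeline(user_id: str) -> str:
    return "new" if in_rollout(user_id, "new_email_pipeline", 5) else "old"
```

Because the bucket depends only on the flag and user ID, raising `percent` from 1 to 10 to 100 keeps earlier users enrolled, and (assuming `percent` is read from config at runtime) dropping it to 0 switches everyone back without a deploy.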

I would say you can get to this point far below 100 million people, especially on web. Some people are truly special and have some kind of setup you just can't easily reproduce. But I agree, you do really have to be confident in your ability to control rollout / blast radius, monitor and revert if needed.

> I refuse to believe that someone on the security team intentionally tested random user scripts in production on purpose.

Oh boy, do I have a bridge to sell you.

There are plenty of ways to safely test in production. For one thing you need to limit the scope of your changes.

I have never heard of this kind of insane behaviour before.

"Everyone has a test environment. Some are lucky enough to have a separate production environment."