Article published in the Summer 2001 edition of California Management Review, yet it never mentioned Y2K, the first thing I thought of when I read the line "fixing problems that never happened". Perhaps it was actually written in 1999 and took a while to get published, because otherwise that seems a very strange omission. The Y2K problem was very much over-hyped by the American news media at the time (no, at no point would airplanes have been falling out of the sky — I literally heard someone say that would happen once — even if no effort had been put into fixing the bug).
But in recent years I have seen people (elsewhere, not on HN) claim that Y2K was a big nothingburger, and all the money spent on fixing the bug was wasted. No, that's not true either. All the money spent on fixing the bug was why it turned into a big nothingburger. Sure, some of that money was wasted, by executives who wanted an "official" Y2K-certified certificate, issued by a consulting firm that had nothing "official" about it except their own say-so. And so they spent $2 million learning what their own employees could have told them for $2,000. THAT money was wasted. But a lot of banks were running old COBOL code that used 2-digit years, and needed to be fixed. The fact that in January 2000, everyone's bank interest was still calculated correctly, and not calculated as if it was January 1900? THAT was entirely due to the vast amounts of money spent paying old COBOL coders to come out of retirement and fix the 2-digit years.
The lesson I learned from that is that it's possible for a problem to be overhyped, even massively overhyped, and yet still be a serious problem. The other lesson I should have learned is that people rarely get credit (I won't go so far as the article authors and say "nobody ever gets credit") for fixing problems that never happened.
The problem is that a lot of people have a very binary view on life. Either something is a complete success or a complete waste of money, rarely do we accept that most projects fall somewhere in the middle.
The binary view is mostly true, unless it's for events or problems they are themselves familiar with. There is a term for this, but can't for the life of me remember it: People think the problems they are dealing with are infinitely more nuanced, complex and unique than the problems other people are dealing with.
And even worse, they don't think probability is a thing. If something happens, it was certain to happen and we just failed to predict it correctly.
So when someone predicts something will happen with a 90% probability, and then the 10% chances happens and the predicted event does not happen, people will talk about what a bad prediction that was and how they were clearly wrong.
It's the same logic that causes people to say vaccines don't work because they don't stop a disease with 100% effectiveness, or that there is no point to wear a seatbelt because people still die while wearing one.
My issue with this version of explaining the lack of severity of Y2K is that there were lots of countries that were being derided for not taking the issue seriously but did not seem to suffer any ill effects.
This is interesting, do you have any links?
A couple of possible confounding factors I can think of:
1. Plenty of countries use software developed elsewhere.
2. I suspect that the more recently you computerised your economy, the less likely it would be to have code vulnerable to Y2K.
It's also possible that in some places there were a few issues, but people looked at bills for 100 years of electrical service and said "Yeah right," and fixed the now-easier-to-find code that still used 2-digit dates. If that only happened a few times, the extra work involved in working out the January bill by hand (or waiting until February then billing for 2 months) wouldn't cause too many issues in the economy, and anyone looking in from outside wouldn't even realize there had been an issue. If it happened everywhere the economic impact would be more noticeable from outside.
>no, at no point would airplanes have been falling out of the sky
The assertion may have been unfounded, but I think it's just as unreasonable to assert the opposite. Bugs have cascading effects and in a sufficiently complex piece of software they can create chaos with unpredictable outcomes.
The one case I'm aware of where a software glitch did cause a plane crash, there was pilot error compounding the problem. Air France flight 447 was an Airbus A330 flying from France to Brazil, and while high over the Atlantic, the software recorded inconsistent data in its airspeed measurements. (The official crash analysis team concluded that the inconsistent data was likely due to ice crystals blocking the pitot tubes on the plane). The inconsistent data made the autopilot disengage. Pilot error then caused a stall. One pilot then tried the correct move to recover from a stall, pushing forward on the stick to nose down and regain speed. The other pilot was pulling up on the stick to stop the dive, not realizing that that's exactly the wrong thing to do in a stall (or more likely forgetting his training due to panic; he had a lot less experience). The flight software, receiving inconsistent inputs from both controls, averaged the inputs, resulting in zero change in pitch. (It also sounded the "Dual Input" alarm, but the pilots were too preoccupied with their own controls to figure out what that meant at first, and by the time they figured out what was going on it was too late to recover before the plane hit the water).
https://news.ycombinator.com/item?id=4224707 has some discussion of the events, including the fact that the control design (where each pilot has an independent stick) was part of the problem. On a design like Boeing uses where both sets of controls move together, the experienced pilot would have noticed the less-experienced pilot pulling up on the stick because his own stick would be moving, and he would have said "No, nose down." And if they had nosed down to recover speed while still high enough in the air, they almost certainly could have regained control of the plane and saved 228 lives (including their own).
So in retrospect, I think my first sentence was wrong. The software did not glitch, it did exactly what it was supposed to do. It was pilot error that caused the initial stall, and multiple pilot errors that caused the failure to recover from the stall.
There may be examples of software error that has caused planes to fall out of the sky, but I don't know of any. The only plane crashes whose cause I know were due to hardware failure or pilot error, usually a combination of the two.
I think your conclusion is upside down. Air safety is based on the "Swiss cheese" model. Multiple layers of safety nets are in place to compensate for issues in one layer. In particular, technical safeguards are there to prevent disasters if the human in the loop makes a mistake which will eventually happen. Any weakening of any technical safeguard makes the system less safe. No matter if the human ultimately made a mistake -- the technical system failing contributed to the accident just as much.
Y2K is especially interesting because the fact that the year 2000 would one day occur was entirely foreseeable, and no less probable in 1990 than in 1999. I can hardly think of anything with closer to 100% probability of happening.
To be fair, there was a non-zero chance that society could have ended (or your company, or the tech became obsolete) before 2000, which would be higher the earlier before 2000 you were.
The tech being obsolete is why Y2K was a smaller problem than it would have been otherwise. Most places were no longer running much COBOL code. But banks are famously slow to upgrade their tech, and for good reason much of the time, so most of the world's remaining COBOL code (and other code too, COBOL is just what I'm most familiar with, not that I'm all that familiar with it) was in banks and other financial institutions.
Year 2038 says hi.
my first thought too. I've met a few people who assert that Y2K was a complete waste of money.
I earned my first house deposit helping the team fixing the water and gas company in Wales, UK. Their entire system was running off a set of COBOL programs on a mainframe, none of which had been properly documented over the years, and the whole thing used 2-digit dates. It would have caused actual deaths if not fixed; everything would have shut down, and no water and no heating in a British winter is potentially lethal. And then it would have sent everyone in Wales a bill for 100 years of water and gas.
They were bribing retired software devs to come out of retirement with huge stacks of money, because that was cheaper than training new COBOL devs and getting them familiar with the spaghetti system.
It worked, no-one died, life went on. So obviously it was all fake rolls eyes
I'm curious why things would have shut down when the system thought it was 1900. What part of the logic had the effect of "shut the system down if current date is less than (X date)?" (If you can remember the code 25+ years later, that is).