So a single configuration mistake in a single place wiped out external reachability of a major economy. It happened in the evening local time and should be fixable, modulo cache TTLs, by morning. This will limit the blast radius somewhat.

Still, at this level, brittle infrastructure is a political risk. The internet's famous "routing around damage" isn't quite working here. Should make for an interesting post mortem.

I am reminded of the warning that zonemaster gives about putting your domain's name servers in a single AS, as is common practice for many larger providers. A lot of people would rather not treat this as a problem, since a single AS is a convenient configuration for routing, but it has the downside of being a single point of failure.
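If you're curious how your own domain looks, here's a rough sketch of the check (example.org is a placeholder, and it leans on Team Cymru's DNS-based IP-to-ASN lookup, so treat the output as an approximation; IPv6 and multi-homed nameservers aren't handled):

    # list the domain's nameservers, resolve each one, and look up the origin ASN;
    # if every line shows the same ASN, all your nameservers sit in a single AS
    for ns in $(dig +short example.org NS); do
        ip=$(dig +short "$ns" A | head -1)
        rev=$(echo "$ip" | awk -F. '{print $4"."$3"."$2"."$1}')
        echo -n "$ns ($ip): "
        dig +short "$rev.origin.asn.cymru.com" TXT
    done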

Building redundant infrastructure that can withstand BGP and DNS configuration mistakes is not that simple, but it can be done.

It's simple enough to get a secondary DNS server somewhere and put it on a $5/month VPS. I use BIND, and DNS replication (AXFR/IXFR) handles it.
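If you want to sanity-check that setup, a couple of dig one-liners go a long way (ns1/ns3 and example.org below are placeholders, and the AXFR will only succeed from a host the primary's allow-transfer permits):

    # pull the whole zone from the primary, the same way the secondary does
    dig @ns1.example.net example.org AXFR

    # confirm both servers answer authoritatively with the same zone serial
    dig @ns1.example.net example.org SOA +short
    dig @ns3.example.net example.org SOA +short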

As the CPU/RAM resources to run an authoritative-only slave nameserver for a few domains are extremely minimal (mine run at a unix load of 0.01), it's a very wise idea to put your ns3 or something at a totally different service provider on another continent. It costs less than a cup of coffee per month.

For a very long time, the computer club I was in ran its DNS server on a 75 MHz Pentium; after the last major hardware upgrade it had a total of 110 MB of RAM and 2 GB of disk. It worked great, except that before the upgrade it tended to run out of RAM whenever there was a Linux kernel update, a problem we solved for good by filling every RAM slot with the largest module the motherboard could handle, up to that nice 110 MB.

Did you populate the motherboard with the most it could handle, or the most you could assemble from a box of assorted sticks?

Otherwise, 110MB would hint at a fascinating engineering culture at the motherboard manufacturer.

This makes sense for larger providers, but for a small/personal website there is literally zero advantage to having distributed authoritative DNS servers when the webserver is on a single host.

Ironically, denic still requires you to have two separate name servers with different IPs for your domain (which can be worked around by changing the IP of the registered name server afterwards lol), a requirement that all other registries I use have dropped or never had because enforcing such a policy at the registry level makes zero sense.

It depends. Do you also have email or other services for that domain? The advantage is your email doesn't start bouncing when your single host web site / DNS server is down.

Email bouncing during rare downtimes is hardly that big of an issue - if it's actually important, the sender will retry, possibly via a different contact method. And for short downtimes, the sender's MTA will most likely just retry automatically a bit later - email is designed to cope with temporary failures.

There isn't some magic reliability level that everyone needs which just so happens to be "not achievable with a single authoritative name server" and "guaranteed with two servers". I'm not saying you should never have more than one, just that it isn't the registry's business to decide what kind of availability guarantees you need for your domain.

[deleted]

On Google cloud it's always four nameservers like

    ns-cloud-c1.googledomains.com
    ns-cloud-c2.googledomains.com
    ns-cloud-c3.googledomains.com
    ns-cloud-c4.googledomains.com

It would not make any sense to run four of them in a single AZ. Also, they're geo-aware, and queries are routed to your nearest region.

Are you conflating autonomous system (AS) with availability zone (AZ)?

Uhh, you're right, I totally did. Now I see the parent's point, thank you.

DNS is a centralization risk, yes. Somehow we've decided this is fine. DNSSEC isn't the only issue - your TLD's nameservers could also be offline, or censored in your country.

DNS is barely centralized. Is there an alternative global name lookup system that is less centralized without even worse downsides?

GP said it was a risk (and it is), not that there are better alternatives. Not all risks can be eliminated easily but you should still be aware of them.

GNS is the obvious response here, in addition to the various blockchain based solutions. Nothing that enjoys widespread support or mindshare unfortunately.

Even the current centralized ICANN flavor could be substantially more resilient if it instead handed out key fingerprints and semi-permanent addresses when queried. That way it would only ever need to be used as a fallback when the previously queried information failed to resolve.

BGP, but the names in question are limited to 128 bits, of which at most 48 will be looked up, and you don't get to choose which 48 bits are assigned to you.

Normally it shouldn't have been, what with caching and all, but that was in the past...

Think about what would happen the day Let's Encrypt is broken, for whatever reason, technical or political (say, an unhinged US administration and being located in the wrong country). Take into account the push by Let's Encrypt and the major web browsers to restrict certificate validity to short periods, maybe only a few days...

Let's Encrypt has to be down for days before people begin to feel the pain. DNS is very different: it breaks stuff immediately, everywhere.

No it doesn't. DNS breaks as soon as TTLs run out. It's your choice to set them so low that stuff breaks immediately.

Unfortunately you can't set DNS TTL arbitrarily high (or low) without some resolvers ignoring your suggestion and using arbitrary values.

Most historical outages lasted minutes or hours. One arguably lasted much longer, when someone lost control of their servers due to civil war.

I haven't followed this closely, but have there been any... shall we say plain outages longer than six hours? That's not an outrageous TTL. Or a day.

What do you recommend then? DNS doesn't usually change that often, but if you mess it up when it does, you're in for some pain if TTLs are high!

Not the one you're replying to, but I'd keep TTL high normally and lower it one TTL ahead of a planned change.
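In practice it's just a three-step dance; a minimal sketch, where example.org and the numbers are placeholders:

    # 1. check the TTL you're currently publishing (and resolvers are caching)
    dig +noall +answer example.org A

    # 2. at least one full (old) TTL before the planned change, publish the same
    #    records with a short TTL (e.g. 300s), bump the zone serial, reload
    # 3. make the real change, verify it, then raise the TTL back up afterwards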

I would define "high" as double the time needed to fix a DNS issue, and account for weekends.

This is the way.

This assumes that the host name you want has been recently queried. If it's not cached, good luck...

Not really? .com and .net are still up

If Let's Encrypt goes down, half of the Internet will become inaccessible in a week.

Presumably, if Let's Encrypt goes down and stays down for a week, the sites that go down are the ones that notice their CA is down and at no point during the week take the option of getting certs from a different CA?

I guarantee that there are a ton of sites out there not monitoring their certs.

Including Microsoft, Starlink, Github, Cisco:

* https://www.keyfactor.com/blog/2023s-biggest-certificate-out...

"A ton" being a misspelling of "the vast, vast majority".

Are there alternative CAs that are anywhere near as easy to deal with as Let's Encrypt?

acme.sh supports multiple CAs; there's even an RFC (ACME, RFC 8555) describing the API that CAs implement.
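For example (the domain and webroot path are placeholders; the CA shortnames are the ones acme.sh documents, so double-check against your version, and some CAs want an account registered first):

    # switch the default ACME endpoint away from Let's Encrypt, e.g. to ZeroSSL
    acme.sh --set-default-ca --server zerossl

    # or pick a CA per issuance
    acme.sh --issue --server buypass -d example.org -w /var/www/example.org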

So it seems we need something like this [1] for IT infrastructure? ;)

[1] https://outerspaceinstitute.ca/crashclock/

[deleted]

"The internet's famous "routing around damage" isn't quite working here."

DNS is a look up service that runs on the internet.

Internet routing of IP packets is what the internet does and that is working fine (for a given value of fine).

You remind me of someone saying "the internet is down" who really means "I've forgotten my wifi password".

Us non pod-people caught his drift.

What's a pod-people?

Fail-closed protocols have introduced some brittleness. An HTTP/1.0 server from 1999 can probably still serve visitors today. An HTTPS/TLS 1.0 server from the same year couldn't.
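A quick way to see the difference from the command line (example.org is a placeholder; plain HTTP/1.0 needs no Host header, name-based virtual hosting aside):

    # an HTTP/1.0 request that a 1999-era server would still understand
    printf 'GET / HTTP/1.0\r\n\r\n' | nc example.org 80

    # force TLS 1.0 from a modern curl: the handshake will typically be
    # refused, by the client's TLS library, the server, or both
    curl --tlsv1.0 --tls-max 1.0 https://example.org/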

I think I see the point you're making here and I agree.

There is designing something to be fail-closed because it needs to be secure in a physical sense (actually secure, physically protected), and then there's designing something fail-closed because it needs to be secure in an intellectual sense (gatekept, intellectually protected). While most of the internet is "open source" by nature, the complexity has been increased to the point where significant financial and technical investment is required just to participate. We've let the gatekeepers raise the gates so high that nobody can reach them. AI will let the gatekeepers keep raising the gates, but then even they won't be able to reach the top. Then what?

I think the point you're trying to make, put another way, is that since the dawn of the internet we've compromised a lot of both availability and accessibility in the name of security. How much of that security actually benefits the internet, and how much of it hinders it? How much of it exists as a gatekeeping measure by those who can afford to write the rules?

Backwards compatibility is unfortunately not something security folk care about.

This is why I still run my blog on HTTP/1.1 only.

What, no HTTP/1.0 for those of us too lazy to type the Host header into telnet???

Oh, because I host it with a few more sites on my tiny Hetzner cloud server.

You're not wrong, but objecting to fail-closed in a security-sensitive context is entirely missing the point.

> So a single configuration mistake in a single place wiped out external reachability of a major economy.

Real world beats sci-fi :) And isn't that why we love IT? And hate it too, because of the "people in charge"...

>So a single configuration mistake in a single place wiped out external reachability of a major economy.

And fuck-all happened as a result.

Prove it? I’m sure many lifespans were lost to stress

As someone who was on call yesterday, it was a fun experience: you quickly noticed that everything .de was down, and then it was just a waiting game.

We had a short discussion about migrating to .com, but decided risk != reward, since no one would know the new TLD.

I assume there are a couple of people working for DENIC who had a stressful night..

I have a bad feeling that the impact will be quite severe for some services, as monitoring, performance, and security services might get disrupted, and just cleaning up is a big mess.. Worst case, some OT will experience outages and/or damage. But maybe I'm just overestimating the severity of this.

There is the KRITIS law (critical infrastructure law), which tries to enforce some standards so things aren't quite as brittle.

... wiped out external reachability of a major economy ...

internal reachability (from Germany to .de domains), too... :-)))

It looks like a failed key replacement during a scheduled maintenance event. Normally this sort of thing is thoroughly tested and has multiple eyes on for detailed review and planning before changes get committed, but obviously something got missed.

Would be interesting to know how something like that could get missed. You'd think the system would be set up so that new keys couldn't be published without being verified as working in a staging system.

> The internet's famous "routing around damage"

...is only for Pentagon networks and military stuff. It's not for us normal people. (We get Cloudflare and FAANG bullshit instead.)

This is actually startlingly true.

Every FAANG company has its own fiber backbone. Why invest in the internet that everyone uses when you can invest in your own private internet and then sell that instead?

It's not like the long-haul fiber not owned by FAANG is a public utility, at least not in most places.

Traffic that goes over "the Internet" traverses some mix of your ISP's fiber, fiber belonging to some other ISP they have a deal with, then fiber belonging to some ISP they have a deal with, etc.

All those ISPs are being paid to provide service, they can invest in their own networks.

And we all know that ISPs are famous for investing in timely infrastructure upgrades.