They could have done a quick fix by adding an explicit app.slack.com record. But instead they removed the DNSSEC signing from the whole domain, thereby invalidating all records, not just the wildcard ones.
> I do care.
I will care once something else comes around with any promise of being implemented and rolled out. Until then, I see no need to discourage the adoption of DNSSEC, or disparage its design, except when designing its newer version or replacement.
> I'd like to see greater protection of the DNS infrastructure. DNSSEC adoption is hovering around 4%.
I work at a registrar and DNS hosting provider for more than 10.000 domains. More than 70% of them have DNSSEC.
> They could have done a quick fix by adding an explicit app.slack.com record. But instead they removed the DNSSEC signing from the whole domain, thereby invalidating all records, not just the wildcard ones.
1) That would do nothing to fix resolvers that had already cached NSEC responses lacking type maps.
2) That presumes the wildcard record was superfluous and could have been replaced with a simple A record for a single or small number of records. Would love to see a citation supporting that.
3) That presumes the Slack team could have quickly identified that the problem with app.slack.com (and whatever other hosts resolve from that wildcard) was caused by the record being configured as a wildcard, and that it would have been resolved by eliminating the wildcard record. If you read the postmortem, it is clear they zeroed in on the wildcard record as being suspect, but had to work with AWS to figure out the exact cause. I doubt that was an instantaneous process.
Any way you slice it, there was no quick way to fully recover from this bug once they hit it, and my argument is that the design of DNSSEC makes these issues a) likely to happen and b) difficult to model ahead of time, while providing fairly marginal security benefit.
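To illustrate point 1, here is a toy model of a resolver answering from a cached NSEC record. It is a sketch under loud assumptions, not how any real resolver is implemented: the `CachedNsec` class, the `answer_from_cache` function, the name `app.example.com`, and the 3600-second TTL are all made up for illustration, and signature validation and wildcard expansion are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedNsec:
    owner: str          # name the NSEC record was issued for
    types: frozenset    # RR types the type bitmap says exist at that name
    expires_at: float   # absolute time (seconds) when the cached entry expires


def answer_from_cache(nsec, qname, qtype, now):
    """Return "NODATA" if the cached NSEC still denies qtype, else "ASK_UPSTREAM"."""
    if now >= nsec.expires_at:
        return "ASK_UPSTREAM"      # entry expired: the resolver queries the zone again
    if qname == nsec.owner and qtype not in nsec.types:
        return "NODATA"            # bitmap lacks qtype: answered locally as "no such record"
    return "ASK_UPSTREAM"


# Hypothetical cached entry whose bitmap is missing "A", with an assumed TTL of 3600 s.
bad = CachedNsec("app.example.com", frozenset({"RRSIG", "NSEC"}), expires_at=3600.0)

print(answer_from_cache(bad, "app.example.com", "A", now=60.0))     # within the TTL
print(answer_from_cache(bad, "app.example.com", "A", now=3600.0))   # after expiry
```

In this model, nothing changed on the authoritative side (such as adding an explicit A record) affects the first answer, because the resolver never asks upstream until the cached entry expires.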
At this point, I really don't care if you agree or disagree.
> I will care once something else comes around with any promise of being implemented and rolled out.
Yeah. DNSSEC is going to be widely deployed any day now. The year after the year of Linux on the desktop.
> I work at a registrar and DNS hosting provider for more than 10.000 domains. More than 70% of them have DNSSEC.
Cool. There are, what, 750 million domains registered worldwide? We are nowhere near 10% adoption worldwide, let alone 70%. Of the top 100 domains -- the operators you would assume would be the most concerned about DNS response poisoning -- *six* have turned DNSSEC on.
> 1) That would do nothing to fix resolvers that had already cached NSEC responses lacking type maps.
The TTL for NSEC records is presumably way lower than the TTL for the DS records.
> 2) That presumes the wildcard record was superfluous and could have been replaced with a simple A record for a single or small number of records. Would love to see a citation supporting that.
It’s theoretically possible that it would not have worked for all cases, but that is, in my experience, very unlikely.
> Any way you slice it, there was no quick way to fully recover from this bug once they hit it
The bug seems to me to have been reasonably easy to mitigate, but their problem was that Slack did not know what they were doing. The bug itself was minor, but Slack tried to fix it by ceasing to serve DNSSEC-signed DNS data while long-TTL DS records were still unexpired in caches around the world. This is the worst possible thing you could do.
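A rough sketch of why the two TTLs matter, comparing how long a validating resolver keeps failing in each case. Both TTL values below are assumptions chosen for illustration (a one-day DS TTL and a one-hour NSEC TTL), and `resolver_result` is a hypothetical helper; the model also ignores resolver-side TTL caps and manual cache flushes.

```python
DS_TTL = 86400     # assumed: DS record cached for one day
NSEC_TTL = 3600    # assumed: NSEC record cached for one hour (illustrative only)


def resolver_result(seconds_elapsed, cached_ttl, failure):
    """Outcome for a resolver that cached the problematic record at t=0 (worst case)."""
    return failure if seconds_elapsed < cached_ttl else "OK"


for hours in (0, 1, 12, 24):
    t = hours * 3600
    # Case A: wait out the cached NSEC that wrongly denies the record.
    nsec_case = resolver_result(t, NSEC_TTL, "NODATA")
    # Case B: zone unsigned while the parent's DS is still cached, so validation fails.
    ds_case = resolver_result(t, DS_TTL, "SERVFAIL")
    print(f"t={hours:>2}h  bad NSEC cached: {nsec_case:<8}  unsigned zone, DS cached: {ds_case}")
```

Under these assumed numbers the NSEC problem clears after an hour, while the unsigned-zone problem affects every name in the zone until the DS expires a day later.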
> Of the top 100 domains -- the operators you would assume would be the most concerned about DNS response poisoning -- *six* have turned DNSSEC on.
1. That number used to be zero, as tptacek liked to point out.
2. The huge operators often have fundamentally different security priorities than regular companies and users.
3. People said the same about IPv6 and SSL, which were also very slow to adopt. But they are all climbing.
> The TTL for NSEC records is presumably way lower than the TTL for the DS records.
Possibly. It was still an outage they had to wait out the TTL for, due to the design of DNSSEC.
> It’s theoretically possible that it would not have worked for all cases, but that is, in my experience, very unlikely.
This is completely unsubstantiated speculation on your part.
> The bug seems to me to have been reasonably easy to mitigate, but their problem was that Slack did not know what they were doing.
It is indeed reasonably easy to Monday-morning quarterback someone else's outage and blame operators for the sharp edges around poorly designed protocols.
> 1. That number used to be zero, as tptacek liked to point out.
Cool. So, at this rate, in another 100 years or so we should be at 50% adoption.
> 2. The huge operators often have fundamentally different security priorities than regular companies and users.
Priorities like uptime?
> 3. People said the same about IPv6 and SSL, which were also very slow to adopt. But they are all climbing
1) People started rolling IPv6 out once v4 addresses got scarce. There is no such compelling event to drive DNSSEC adoption. 2) SSL is easy to roll out and provides compelling security benefits. It is also exceedingly unlikely in practice to blow up in your face and result in run-out-the-clock outages -- unlike DNSSEC.
> This is completely unsubstantiated speculation on your part.
Do you have any support for your assumption that the wildcard record was vital and practically impossible to replace with regular records?
> It is indeed reasonably easy to Monday-morning quarterback someone else's outage and blame operators for the sharp edges around poorly designed protocols.
When Slack, being a large company presenting themselves as proficient in tech, make a tech mistake so bad that they lock themselves out of the internet for an entire day, a mistake even I know not to make, then I get to criticize them.
Cloudflare has been promoting DNSSEC for almost as long as I've been writing about DNSSEC, so no, nothing has really changed with the Top 100.
Of the top 100, only 2 have DNSSEC enabled --- cloudflare.com and cloudflare.net.
I imagine it depends on which "top domain" list you use. I use Cloudflare radar. As of today, for their top 100 domains, 6 have published DS records:
2371 13 2 32996839A6D808AFE3EB4A795A0E6A7A39A76FC52FF228B22B76F6D6 3826F2B9 cloudflare.com
2371 13 2 F52DBA4AAEA13A1F457C0FB4C1953F40E16AFC5C5E79EDF7CEED0FCF 0CBD81F0 cloudflare-dns.com
53074 13 2 86F2929EE3E5E501032B6DC94841A4A056A2D2876CABCF46A5F8907E B4917782 one.one
56044 8 2 1B0A7E90AA6B1AC65AA5B573EFC44ABF6CB2559444251B997103D2E4 0C351B08 dns.google
48553 13 2 57AF2F182A541A91AD24CC6583867C2BA331255B03E2A32579A625AD 1F3BE3CA taboola.com
33751 8 2 90C6CD28626CA7B8E3A1FACAD58D20D486E52DF040B9B2F085ACD5C7 03E624C6 nist.gov
Not that it matters much one way or the other - you won't find a top 100 domains list where, say, 50 have DNSSEC enabled.
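For anyone reading the DS lines above: the fields are key tag, algorithm number, digest type, digest (printed in two space-separated chunks), and then the domain. A minimal sketch of a parser for that exact layout follows; `parse_ds_line` and `DsRecord` are hypothetical names, and the lookup tables cover only the algorithm and digest numbers that appear in the list, plus SHA-1.

```python
from typing import NamedTuple

# Subset of the IANA DNSSEC registries, limited to values seen in the list above.
ALGORITHMS = {8: "RSASHA256", 13: "ECDSAP256SHA256"}
DIGEST_TYPES = {1: "SHA-1", 2: "SHA-256"}


class DsRecord(NamedTuple):
    key_tag: int
    algorithm: str
    digest_type: str
    digest: str
    domain: str


def parse_ds_line(line):
    """Parse 'KEYTAG ALG DIGESTTYPE DIGEST... DOMAIN' into a DsRecord."""
    parts = line.split()
    key_tag, alg, dtype = (int(p) for p in parts[:3])
    digest = "".join(parts[3:-1])   # re-join the digest chunks split by whitespace
    return DsRecord(
        key_tag,
        ALGORITHMS.get(alg, str(alg)),          # fall back to the raw number if unknown
        DIGEST_TYPES.get(dtype, str(dtype)),
        digest,
        parts[-1],
    )


record = parse_ds_line(
    "33751 8 2 90C6CD28626CA7B8E3A1FACAD58D20D486E52DF040B9B2F085ACD5C7 03E624C6 nist.gov"
)
print(record.domain, record.algorithm, record.digest_type, len(record.digest))
```

A SHA-256 digest is 64 hex characters, which is a quick sanity check that the two digest chunks were joined correctly.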
There's a Right Answer for this!
https://tranco-list.eu/
(The Tranco list includes the Cloudflare data as a factor).
Fascinating.
My rebuttal: dnssecmenot.fly.dev.
I will respond with a link you once gave: <https://www.verisign.com/en_US/company-information/verisign-...>
You just gave a link with a graph that shows a recent sharp drop in DNSSEC adoption as if it was a mic drop. The page I showed you barely even has text; it doesn't need any, the implication is obvious.
It used to show "a recent sharp drop", back when you originally gave the link. It quite soon started to climb again, and the climb has continued, as is now clearly visible. This was pointed out to you, but you acted, and are still acting, as if nothing has happened since that time you first looked at it.
I'm happy, like you seem to be, to point people to the chart you just pasted.