If you're going to host user content on subdomains, then you should probably have your site on the Public Suffix List https://publicsuffix.org/list/ . That should eventually make its way into various services so they know that a tainted subdomain doesn't taint the entire site....
God I hate the web. The engineering equivalent of a car made of duct tape.
> Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain
A centralized list like this, not just for domains as a whole (e.g. co.uk) but also for specific sites (e.g. s3-object-lambda.eu-west-1.amazonaws.com), is both kind of crazy, in that the list will bloat a lot over the years, and a security risk for any platform that needs this functionality but would prefer not to leak any details publicly.
We already have the concept of a .well-known directory that you can use, when talking to a specific site. Similarly, we know how you can nest subdomains, like c.b.a.x, and it's more or less certain that you can't create a subdomain b without the involvement of a, so it should be possible to walk the chain.
Example:
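A hypothetical sketch of walking that chain (the .well-known path name here is invented, not any real standard):

    import urllib.request

    def find_delegating_parent(host):
        # Hypothetical: ask each parent of c.b.a.x whether it declares the level
        # below it as delegated to other parties, via a made-up .well-known file.
        labels = host.split(".")
        for i in range(1, len(labels) - 1):          # skip the host itself and the TLD
            parent = ".".join(labels[i:])
            url = f"https://{parent}/.well-known/delegated-subdomains"  # invented name
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    if resp.status == 200:
                        return parent                # this level disavows its children
            except OSError:
                pass                                 # no statement here, keep walking up
        return None

    print(find_delegating_parent("someuser.pages.example.com"))

The obvious catch, raised below, is that this trusts the site's own statement about itself.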
Maybe ship the domains with the browsers and such and leave generic sites like AWS or whatever to describe things themselves. Hell, maybe that could also have been a TXT record in DNS as well.
> any platform that needs this functionality but would prefer not to leak any details publicly.
I’m not sure how you’d have this - it’s for the public facing side of user hosted content, surely that must be public?
> We already have the concept of a .well-known directory that you can use, when talking to a specific site.
But the point is to help identify dangerous sites, by definition you can’t just let the sites mark themselves as trustworthy and rotate around subdomains. If you have an approach that doesn’t have to trust the site, you also don’t need any definition at the top level you could just infer it.
It's actually exactly the same concept that came to mind for me. `SomeUser.geocities.com` is "tainted", along with `*.geocities.com`, so `geocities.com/.well-known/i-am-tainted` is actually reasonable.
Although technically it might be better as `.well-known/taint-regex` (now we have three problems), like `TAINT "*.sites.myhost.com" ; "myhost.com/uploads/*" ; ...`
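A toy of how a client might apply such (entirely made-up) taint rules, assuming simple glob patterns:

    from fnmatch import fnmatch

    # Hypothetical rules in the spirit of the TAINT line above; nothing standard.
    TAINT_RULES = ["*.sites.myhost.com/*", "myhost.com/uploads/*"]

    def is_disavowed(url):
        bare = url.split("://", 1)[-1]               # drop the scheme, keep host + path
        return any(fnmatch(bare, rule) for rule in TAINT_RULES)

    print(is_disavowed("https://evil.sites.myhost.com/login"))  # True
    print(is_disavowed("https://myhost.com/blog/post-1"))       # False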
I think we disagree on the problem.
The thing you want to avoid is this:
a.scamsite.com gets blocked so they just put their phishing pages on b.scamsite.com
The PSL or your solution isn't a "don't trust subdomains" notification; it's "if one subdomain is bad, you should still trust the others", and the problem there is you can't trust them.
You could combine the two, but you still need the suffix list or similar curation.
It's more like "provenance" of content. I broadcast my accountability for "myblog.com/posts/...", but would disavow "myblog.com/posts/.../#comments"
There are some mechanisms, like "nofollow", but nothing systematic, and no "protocol" for disavowing paths, uploads, or fragments.
Back in the Slashdot days, I thought of "blogs are the stationery of the internet", a way to more authoritatively declare that the content was yours... but interop is hard and unprofitable so walled gardens became the norm.
We just haven't had the benefit or forcing function which encourages a solution to "that stuff over there is less trusted than my stuff over here".
Maybe we're at the point where hosts of any kind MUST be responsible (or accountable) for any content originating from their domain? It kills indie/anonymous hosting, but puts a fine "KYC" point on distributing "evil" stuff on the internet?
It does smell very much like a feature that is currently implemented as a text file but will eventually need to grow into its own protocol, like, indeed, the hosts file becoming DNS.
One key difference between this list and standard DNS (at least as I understand it; maybe they added an extension to DNS I haven't seen) is the list requires independent attestation. You can't trust `foo.com` to just list its subdomains; that would be a trivial attack vector for a malware distributor to say "Oh hey, yeah, trustme.com is a public suffix; you shouldn't treat its subdomains as the same thing" and then spin up malware1.trustme.com, malware2.trustme.com, etc. Domain owners can't be the sole arbiter of whether their domain counts as a "public suffix" from the point of view of user safety.
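For what "counts as a public suffix" changes in practice: PSL-aware tooling treats every child of a listed suffix as its own registrable domain. A minimal check with the tldextract Python package (assuming it's installed; it ships a copy of the list):

    import tldextract

    # Include the "private" section of the PSL, where entries like github.io live.
    extract = tldextract.TLDExtract(include_psl_private_domains=True)

    print(extract("a.github.io").registered_domain)    # a.github.io  -> its own site
    print(extract("b.github.io").registered_domain)    # b.github.io  -> unrelated to the above
    print(extract("a.example.com").registered_domain)  # example.com  -> ordinary subdomain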
It looks like Mozilla does use DNS to verify requests to join the list, at least.
Doing this DNS in the browser in real-time would be a performance challenge, though. PSL affects the scope of cookies (github.io is on the PSL, so a.github.io can't set a cookie that b.github.io can read). So the relevant PSL needs to be known before the first HTTP response comes back.
I presume it has to be a curated list otherwise spammers would use it to evade blocks. Otherwise why not just use DNS?
Whois would be the choice. DNS's less glamorous sibling, purpose built for delegated publication of accountability records
Whois isn't curated either.
Neither is nominating a third party for your parking fine.
The point is to get away from centralized gatekeepers, not establish more of them. A hierarchy of disavowal. It’s like cache invalidation for accountability.
If you don’t wanna be held responsible for something, you’d better be prepared to point the finger at someone whois.
> God I hate the web
This is mostly a browser security mistake but also partly a product of ICANN policy & the design of the domain system, so it's not just the web.
Also, the list isn't really that long, compared to, say, certificate transparency logs; now that's a truly mad solution.
Show me a platform not made out of duct tape and I'll show you a platform nobody uses.
regular cars?
Jeep just had an OTA update cause the car to shut down on the highway (it is rumored).
Before we put computers in cars, we had the myriad small things that would break (stuck doors, stuck windows, failed seals, leaking gaskets), a continuous stream of recalls for low-probability safety issues, and the occasional Gremlin or Pinto.
My favorite example is the Hyundai Elantra. They changed the alloy used in one of the parts in the undercarriage. Tested that model to death for a year, as they do, but their proving ground is in the southern United States.
Several winters later, it turns out that road salt attacks the hell out of that alloy and people have wheels flying off their cars in the middle of the road.
The Honda issue where setting a certain radio station would brick the infotainment? That good enough?
> That good enough?
Not really. Does the car still drive? That sounds like a software bug; hardly indicative that the entire car is held together with duct tape, but a pretty bad bug nonetheless.
So I can't remember the specifics or find any references, but many years ago I remember reading about a car (Prius maybe?) that would shut off and lock the doors when pulling away from a stop. (Ex: stopped at a red light, when it turns green the car would go far enough to cut off in the middle of an intersection then trap everyone inside.)
"This is Fine."
That's terrifying.
The browser still drives when Google throws up a safety warning.
It's just harder to drive to one house, and the homeowner is justifiably irritated about this.
More accurate: a mom-n-pop grocery store has its listing on Google Maps changed to PERMANENTLY CLOSED DUE TO TOXIC HEALTH HAZARDS because the mom-n-pop grocery store didn't submit Form 26B/Z to Google. There was never any health hazard, but now everyone thinks there is, and nobody can/will go there. The fact that Form 26B/Z exists at all is problematic, but what makes it terrible is the way it's used to punish businesses for not filling out a form they didn't know existed.
This is an excellent analogy because it is incumbent upon businesses to follow all the laws, including the ones they don't know about. That's one of the reasons "lawyer" is a profession.
Google doesn't have the force of law (it's in this context acting more like a Yelp: "1 star review --- our secret shopper showed up and the manager didn't give the secret 'we are not criminals' hand sign"), but the basic idea is the same: there is a complex web of interactions that can impact your online presence and experts in the field you can choose to hire for consulting or not.
Didn't used to be that way, but the web used to be a community of 100,000 people, not 5.6 billion. Everything gets more complicated when you add more people.
The other commenter's analogy of a small-business is better I think, the issue with the browser problem is that it doesn't hinder one person getting to one house, it hinders all persons getting to one place the owner _wants_ people to get to easily.
The browser issue can destroy a small business, one thing I think we can universally agree we don't want. If all of the people who come looking for it find it's being marked as malicious or just can't get there at all, they lose customers.
Worse yet, is that Google holds the keys because everyone uses Chrome, and you have to play their game by their rules just to keep breathing.
Here's the thing though: if someone else held the keys, the scenario would be the same unless there was no safe browsing protection. And if there were no safe browsing protection, we'd be trading one ill for another; small business owners facing a much steeper curve to compete vs. everyone being at more risk from malware actors.
I honestly don't immediately know how to weigh those risks against each other, but I'll note that this community likely underestimates the second one. Most web users are not nearly as tech- or socially-savvy as the average HN reader and the various methods of getting someone to a malware subdomain are increasingly sophisticated.
The road network is a much better analogy here.
Never heard of this. Link please?
Don't know about Honda, but there is this Mazda one [0] (Would not be surprised if it affected multiple vendors!)
[0] https://www.soundandvision.com/content/remembering-time-when...
Yikes. I missed that. Makes sense it wasn't just the station it was tuned to but the particular data they broadcasted; insane there was no way to power reset the system into a good state.
Admitting I'm old, but my HP-11C still gets pretty-regular use.
And judging by eBay prices, or the SwissMicros product line, I suspect I have plenty of company.
"The engineering equivalent of a car made of duct tape"
Kind of. But do you have a better proposition?
I'd probably say we ought to use DNS.
And while we're at it, 1) mark domains as https-only, and 2) indicate when root domains should map to a subdomain (e.g. www).
It might amuse you to know that we also already have a text file as a solution for https-only sites.
Cookies shouldn't be tied to domains at all, it's a kludge. They should be tied to cryptographic keypairs (client + server). If the web server needs a cookie, it should request one (in its reply to the client's first request for a given url; the client can submit again to "reply" to this "request"). The client can decide whether it wants to hand over cookie data, and can withhold it from servers that use different or invalid keys. The client can also sign the response. This solves many different security concerns, privacy concerns, and also eliminates the dependency on specific domain names.
I just came up with that in 2 minutes, so it might not be perfect, but you can see how with a little bit of work there are much better solutions than "I check for not-evil domain in list!"
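A very rough sketch of the shape of that idea (hypothetical, not any real protocol; assumes the Python cryptography package for the keys):

    import os
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import ed25519

    server_key = ed25519.Ed25519PrivateKey.generate()   # held by the site
    client_key = ed25519.Ed25519PrivateKey.generate()   # held by the browser

    def raw(pub):
        return pub.public_bytes(serialization.Encoding.Raw, serialization.PublicFormat.Raw)

    # 1. The server's reply to the first request asks for a cookie and includes its public key.
    server_pub = raw(server_key.public_key())

    # 2. The client mints a cookie and files it under that key, not under a hostname.
    cookie_jar = {server_pub: os.urandom(16)}

    # 3. Later, the client only releases the cookie to a peer that proves possession of the
    #    matching private key, and signs what it sends back so the server knows who replied.
    challenge = os.urandom(32)
    proof = server_key.sign(challenge)
    ed25519.Ed25519PublicKey.from_public_bytes(server_pub).verify(proof, challenge)  # raises if not

    signed_reply = client_key.sign(cookie_jar[server_pub])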
> They should be tied to cryptographic keypairs (client + server).
So now, if a website leaks its private key, attackers can exfiltrate cookies from all of its users just by making them open an attacker-controlled link, for as long as the cookie lives (and users don't visit the website to get the rotated key).
> If the web server needs a cookie, it should request one
This adds a round-trip, which slows down the website on slow connections.
> the client can submit again to "reply" to this "request"
This requires significantly overhauling HTTP and load-balancers. The public-suffix list exists because it's an easy workaround that didn't take a decade to specify and implement.
> So now, if a website leaks its private key, attackers can exfiltrate cookies from all of its users just by making them open an attacker-controlled link
This attack already exists in several forms (leaking a TLS private key, DNS hijack, CA validation attack, etc). You could tack a DNS name onto the crypto-cookies if you wanted to, but DNS is trivial to attack.
> This adds a round-trip, which slows down the website on slow connections.
Requests are already slowed down by the gigantic amount of cookies constantly being pushed by default. The server can send a reply-header once which will tell the client which URLs need cookies perpetually, and the client can store that and choose whether it sends the cookies repeatedly or just when requested. This gives the client much more control over when it leaks users' data.
> This requires significantly overhauling HTTP and load-balancers
No change is needed. Web applications already do all of this all the time. (example: the Location: header is frequently sent by web apps in response to specific requests, to say nothing of REST and its many different request and return methods/statuses/headers).
> The public-suffix list exists because it's an easy workaround
So the engine of modern commerce is just a collection of easy hacks. Fantastic.
> This attack already exists in several forms (leaking a TLS private key, DNS hijack, CA validation attack, etc).
An attacker who gets the TLS private key of a website can't use it easily, because they still need to fool users' browser into connecting to a server they control as the victim domain, which brings us to:
> You could tack a DNS name onto the crypto-cookies if you wanted to, but DNS is trivial to attack.
It's not. I can think of two ways to attack the DNS. Either 1. control or MITM of the victim's authoritative DNS server or 2. poison users' DNS cache.
Control/MITM of the authoritative server is not an option for everyone (only ISPs/backbone operators), and according to Cloudflare: "DNS poisoning attacks are not easy" (https://www.cloudflare.com/learning/dns/dns-cache-poisoning/)
> Requests are already slowed down by the gigantic amount of cookies constantly being pushed by default
Yes, although adding more data and adding a round-trip have different impacts (high-bandwidth, high-latency connections exist). Lots of cookies and more round-trips is always worse than lots of cookies and fewer round-trips.
> The server can send a reply-header once which will tell the client which URLs need cookies perpetually, and the client can store that and choose whether it sends the cookies repeatedly or just when requested.
Everyone hates configuring caching, so in most cases site operators will leave it at a default "send everything", and we're back to square one.
> No change is needed.
I was thinking that servers need to remember state between the initial client request and when the client sends another request with the cookies. But on second thought that's indeed not necessary.
> So the engine of modern commerce is just a collection of easy hacks. Fantastic.
I'm afraid so
There's at least a dozen different attacks on DNS, but the main ones regarding record validation include multiple types of spoofing and MITM (at both the DNS and IP level), cache poisoning, account takeover (of either the nameserver or registrar), DoS attack, etc.
Cache poisoning is the easiest method, and contrary to whatever Cloudflare says, it's trivial. The DNS transaction number is 16-bits. All you have to do is flood the shit out of the resolver with spoofed packets and eventually one of the transaction numbers will hit, and your attack is successful. It's low-bandwidth, takes at most a couple hours, and nobody notices. This is one of the many reasons you can't just trust whatever DNS says.
The choice of what HTTP messages to cache is not always a choice, as is the case with HSTS. But it could be made one if testing of this proposal (which again, I came up with in 2 minutes) showed better results one way or another.
But all this is moot anyway cuz nobody gives a crap.
A part of the issue is IMO that browsers have become ridiculously bloated everything-programs. You could take about 90% of that out and into dedicated tools and end up with something vastly saner and safer and not a lot less capable for all practical purposes. Instead, we collectively are OK with frosting this atrocious layer cake that is today's web with multiple flavors of security measures of sometimes questionable utility.
End of random rant.
"You could take about 90% of that out and into dedicated tools "
But then you would lose platform independence, the main selling point of this atrocity.
Having all those APIs in a sandbox that mostly just work on billions of devices is pretty powerful, and a potential successor to HTML would have to beat that to be adopted.
The best thing to happen, that I can see, is that a sane subset crystallizes that people start to use dominantly, with the rest becoming legacy, only maintained to keep it working.
But I have dreamt of a fresh rewrite of the web since university (and the web was way slimmer back then), though I got a bit more pragmatic and I think I now understand better the massive problem of solving trusted human communication. It ain't easy in the real world.
But do we need e.g serial port or raw USB access straight from a random website? Even WebRTC is a bit of a stretch. There is a lot of cruft in modern browsers that does little except increase attack surface.
This all just drives a need to come up with ever more tacked-on protection schemes because browsers have big targets painted on them.
> Even WebRTC is a bit of a stretch
You remove that, and videoconferencing (for business or person to person) has to rely on downloading an app, meaning whoever is behind the website has to release for 10-15 OSes now. Some already do, but not everyone has that budget so now there's a massive moat around it.
> But do we need e.g serial port or raw USB access straight from a random website
Being able to flash an IoT (e.g. ESP32) device from the browser is useful for a lot of people. For the "normies", there was also Stadia allowing you to flash their controller to be a generic Bluetooth/usb one on a website, using that webUSB. Without it Google would have had to release an app for multiple OSes, or more likely, would have just left the devices as paperweights. Also, you can use FIDO/U2F keys directly now, which is pretty good.
Browsers are the modern Excel, people complain that they do too much and you only need 20%. But it's a different 20% for everyone.
I'll flip that around on you: why oh why do we need browsers to carry these security holes in them? The Stadia flasher is a good example: how do I know that a website doesn't contain a device flasher that will turn one of my connected devices into a malicious actor that will attempt to take over whatever machine it's plugged into?
You know because there is an explicit permission box that pops out and asks if you want to give this website access to a device, and asks you to select that device.
Same as your camera/microphone/location.
But that still gives completely unvetted direct access to the device to a website! People have been pointing to Itch.io games that supposedly require direct USB access. How hard is it to hide a script in there that reprograms a controller into something malicious?
If you download an executable from a website and run it .. pretty much the same thing?
If you give USB access, it is not really a website anymore, rather an app delivered through the web. I don't see a fundamental difference in trust.
I rather am able to verify the web based version easier and I certainly won't give access to a random website, just like I don't download random exes from websites.
Performance is lower, yes, and well ... like I said, it is all a big mess. Just look at the global namespace in JS. I still use it because of that power feature called platform independence. What I release, people can (mostly) just use. I (mostly) don't care which OS the user has.
A file that lands on my hard drive is automatically scanned for malware. That same kind of protection isn't in place against malicious scripts downloaded by my browser via an opaque HTTPS connection and run in-process.
And we all know that non-technical users never just click Yes to make the annoying popup go away.
Itch.io games and controller support.
You have sites now that let you debug microcontrollers on your browser, super cool.
Same thing but with firmware updates in the browser. Cross platform, replaced a mess of ugly broken vendor tools.
While that's pretty convenient, I'm worried about what happens when the vendor shuts down the website. "Ugly broken vendor tools" can be run forever in a VM of an old system, but a website would be gone forever unless it's purely client-side and someone archived it.
Just because you can do something doesn't mean you should.
Your micro-controllers should use open standards for their debugging interface and not force people to use the vendor website.
WebRTC I use since many years and would miss it a lot. P2P is awesome.
WebUSB I don't use or would miss it right now, but .. the main potential use case is security and it sounds somewhat reasonable
"Use in multi-factor authentication
WebUSB in combination with special purpose devices and public identification registries can be used as key piece in an infrastructure scale solution to digital identity on the internet."
https://en.wikipedia.org/wiki/WebUSB
> But do we need e.g serial port or raw USB access straight from a random website?
But do we need audio, images, Canvas, WebGL, etc? The web could just be plain text and we’d get most of the “useful” content still, add images and you get a vast majority of it.
But the idea that the web is a rich environment that has all of these bells and whistles is a good thing imo. Yes there’s attack surface to consider, and it’s not negligible. However, the ability to connect so many different things opens up simple access to things that would otherwise require discrete apps and tooling.
One example that kind of blew my mind is that I wanted a controller overlay for my Twitch stream. After a short bit of looking, there isn’t even a plugin needed in OBS (streaming software). Instead, you add a Web View layer and point it to GamePad Viewer[1] and you’re done.
Serial and USB are possibly a boon for very specific users with very specific accessibility needs. Also, iirc some of the early iPhone jailbreaks worked via websites on a desktop with your iPhone plugged into usb. Sure these are niche, and could probably be served just as well or better with native apps, and web also makes the barrier to entry so much lower .
[1]: https://gamepadviewer.com/
> But do we need e.g serial port or raw USB access straight from a random website?
Yes. Regards, CIA, Mossad, FSB etc.
How else am I going to make a game in the browser that can be controlled with a controller?
Every decent host OS already has a dedicated driver stack to provide game controller input to applications in a useful manner. Why the heck would you ship a reimplementation of that in JS in a website?
So that you can take input from controllers that haven't been invented yet and won't fit the HID model.
If it hasn't been invented yet, you don't need driver software for it, do you? ;)
Anyway, in your scenario the controller would be essentially a one off and you'd be better off writing a native app to interface with it for the one computer this experiment will run on.
If it hasn't been invented yet we don't know the implications of giving a website access to it either.
And that's before realizing it's already a bad idea with existing devices because they were never designed for giving untrusted actors direct access.
That's why we have a privacy and security sandbox in browsers.
You don't, that's the point: not everything needs to be crammed into a browser.
Unlikely. The convenience incentives are far too high to leave features on the table.
Not unlike the programming language or the app (growing until it half-implements LISP or half-implements an email client), the browser will grow until it half-implements an operating system.
For everyone else, there's already w3m.
> Having all those APIs in a sandbox that mostly just work on billions of devices is pretty powerful, and a potential successor to HTML would have to beat that to be adopted.
I think the giant downside is that they've written a rootkit that runs on everything, and to try to make up for that they want to make it so only sites they allow can run.
It's not really very powerful at all if nobody can use it, at that point you are better off just not bothering with it at all.
The Internet may remain, but the Web may really be dead.
"It's not really very powerful at all if nobody can use it"
But people do use it, like both of us right now?
People also use maps, do online banking, play games, start complex interactive learning environments, collaborate in real time on documents etc.
All of that works right now.
> to try to make up for that they want to make it so only sites they allow can run
What do you mean, you can run whatever you want on localhost, and it's quite easy to host whatever you want for whoever you want too. Maybe the biggest modern added barrier to entry is that having TLS is strongly encouraged/even needed for some things, but this is an easily solved problem.
The blog post and several anecdotes in the comments prove otherwise
Not sure if it counts but I've been enjoying librewolf. I believe just a stripped down firefox.
>A part of the issue is IMO that browsers have become ridiculously bloated everything-programs.
I don't see how that solves the issue that PSL tries to fix. I was a script kiddy hosting neopets phishing pages on free cpanel servers from <random>.ripway.com back in 2007. Browsers were way less capable then.
PSL and the way cookies work is just part of the mess. A new approach could solve that in a different way, taking into account all the experience we have had with script kiddies and professional scammers and phishers since then. But I also don't really have an idea where and how to start.
And of course, if the new solution completely invalidates old sites, it just won't get picked up. People prefer slightly broken but accessible to better designed but inaccessible.
> People prefer slightly broken but accessible to better designed but inaccessible.
We live in world where whatever faang adopts is de facto a standard. Accessible these days means google/gmail/facebook/instagram/tiktok works. Everything else is usually forced to follow along.
People will adopt whatever gives them access to their daily dose of doomscrolling and then complain about rather crucial part of their lives like online banking not working.
> And of course, if the new solution completely invalidates old sites, it just won't get picked up.
Old sites don't matter, only high-traffic sites riddled with dark patterns matter. That's the reality, even if it is harsh.
> People prefer slightly broken but accessible to better designed but inaccessible.
It's not even broken as the edge cases are addressed by ad-hoc solutions.
OP is complaining about global infrastructure not having a pristine design. At best it's a complaint about a desirable trait. It's hardly a reason to pull the Jr developer card and mindlessly advocate for throwing everything out and starting over.
2007 you say and less capable you say?!
Try 90s! We had to fight off ActiveX Plugins left and right in the good olde Internet Explorer! Yarr! ;-)
Are you saying we should make a <Unix Equivalent Of A Browser?> A large set of really simple tools that each do one thing really really really pedantically well?
This might be what's needed to break out of the current local optimum.
Maybe it's time to revive something like the uzbl[1] project, or start something similar.
[1] https://www.uzbl.org/
I haven't thought of it that way, but that might be a solution.
There was an attempt in that direction.
https://www.uzbl.org/
You are right from a technical point, I think, but in reality - how would one begin to make that change?
I'm under the impression that CORS largely solves it?
which is still much too new to be able to shut down the PSL of course. but maybe in 2050.
Since this is being downvoted: no, I'm quite serious.
CORS lets sites define their own security boundaries between subdomains, with mutual validation. If you're hosting user content in a subdomain, just don't allow-origin it: that is a clear statement that it's not "the same site". PSL plays absolutely no part in that logic; it seems clear to me that CORS is at least in part intended to replace the PSL.
Do other sites (like google's safety checks) use CORS for this purpose? Dunno. Seems like they could though? Or am I missing something?
I think we lost the web somewhere between PageRank and JavaScript. Up to there it was just linked documents and it was mostly fine.
Why is it a centrally maintained list of domains, when there is a whole extensible system for attaching metadata to domain names?
I love the web. It's the corporate capitalistic ad fueled and govt censorship web that is the problem.
> God I hate the web. The engineering equivalent of a car made of duct tape.
Most of the complex thing I have seen being made (or contributed to) needed duct tape sooner or later. Engineering is the art of trade-offs, of adapting to changing requirements (that can appear due to uncontrollable events external to the project), technology and costs.
Related, this is how the first long distance automobile trip was done: https://en.wikipedia.org/wiki/Bertha_Benz#First_cross-countr... . Seems to me it had quite some duct tape.
Why would you compare the Web to that? A first fax message would be a more appropriate comparison.
The Web is not a new thing and hardly a technical experiment of a few people any more.
If you add the time that has passed since the concept of the Web was announced to that trip's date, you land in an era with a very decent, established industry already, with many sports and mass-production designs:
https://en.wikipedia.org/wiki/Category:Cars_introduced_in_19...
For me the web is something along the lines of the definition at: https://en.wikipedia.org/wiki/World_Wide_Web to sum up, "...universal linked information system...". I think the fax misses many aspects of the core definition to be a good comparison.
Not sure what your point is about "decent established industry" if we relate it to "duct tape". I see two possibilities:
a) you imply that the web does not have a decent established industry (but I would guess not).
b) you would claim that there was no "duct tape" in the 1924 car industry. I am no expert but I would refer you to the article describing the procedure to start a car at https://www.quora.com/How-do-people-start-their-cars-in-the-..., to quote:
> Typical cold-start routine (common 1930s workflow)
> 1. Set hand choke (pull knob).
> 2. Set throttle lever to slight fast‑idle.
> 3. Retard spark if manual advance present.
> 4. Engage starter (electric) or use hand crank.
> 5. Once running, push choke in gradually, advance spark, reduce throttle.
Not sure about your opinion, but compared to what a car's objective is (move from point A to point B), to me that sounds rather involved. Not sure if it qualifies as "duct tape", but it is definitely not a "nicely implemented system that just works".
To sum up my point: I think on average progress is slower and harder than people think. And that is mostly because people do not have exposure to the work people are doing to improve things until something becomes more "widely available".
That's the nature of decentralised control. It's not just DNS, phone numbers work in the same way.
All web encryption is backed by a static list of root certs that each browser maintains.
Idk any other way to solve it for the general public (ideally each user would probably pick what root certs they trust), but it does seem crazy.
We already have a solution to solve it: DNS-based Authentication of Named Entities (DANE)
This solution is even more obvious today where most certificates are just DNS lookups with extra steps.
What we need is a web made in a similar way to the wicker-bodied cars of yesteryear
I'm not sure I'm following: what inherent flaw are you suggesting browsers had, that the public suffix list originators knew they had?
Wait until you learn about the HSTS preload list.
I think it's somewhat tribal webdev knowledge that if you host user generated content you need to be on the PSL otherwise you'll eventually end up where Immich is now.
I'm not sure how people who haven't already hit this very issue are supposed to know about it beforehand though; it's one of those things that you don't really come across until you're hit by it.
This is the first time I hear about https://publicsuffix.org
You're in good company! From 12 days ago: https://news.ycombinator.com/item?id=45538760
I’ve been doing this for at least 15 years and it’s the first I heard of this.
Fun learning new things so often but I never once heard of the public suffix list.
That said, I do know the other best practices mentioned elsewhere
First rule of the public suffix list...
I think what gets me more is I don't see an easy way to add suffixes to the list. I'm sure if I dig I can figure it out, but you'd think, given how it's used, they'd have an obvious step-by-step guide on the website
Last link in the menu header: https://publicsuffix.org/submit/
Which then links to: https://github.com/publicsuffix/list/wiki/Guidelines#submitt...
Fairly obvious and typical webpage > documentation flow I think, doesn't seem too hard to find.
Ok so we need a GitHub (Microsoft) account to avoid needing a Google account in case some undocumented system decides to shut down a website we host. Great.
I agree, that's pretty dumb. But I wouldn't say "no easy way to add suffixes to the list" at the very least.
Besides user uploaded content it's pretty easy to accidentally destroy the reputation of your main domain with subdomains.
For example: you point a subdomain's A record at an IP address from some hosting provider, then later stop using that server but forget to remove the record.
At this point if someone else on that hosting provider gets that IP address assigned, your subdomain is now hosting their content.
I had this happen to me once with PDF books being served through a subdomain on my site. Of course it's my mistake for not removing the A record (I forgot) but I'll never make that mistake again.
10 years of my domain having a good history may have gotten tainted in an unrepairable way. I don't get warnings visiting my site but traffic has slowly gotten worse over time since around that time, despite me posting more and more content. The correlation isn't guaranteed, especially with AI taking away so much traffic but it's something I do think about.
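One cheap guard against that failure mode is to periodically check that every subdomain you publish still resolves to an address you control. A rough sketch, with placeholder hostnames and IPs:

    import socket

    # Placeholder inventory: subdomains you publish and the addresses you still control.
    EXPECTED = {
        "books.example.com":  {"203.0.113.10"},
        "photos.example.com": {"203.0.113.11"},
    }

    for host, allowed in EXPECTED.items():
        try:
            resolved = {info[4][0] for info in socket.getaddrinfo(host, None)}
        except socket.gaierror:
            print(f"{host}: does not resolve (record already removed?)")
            continue
        stray = resolved - allowed
        if stray:
            print(f"{host}: resolves to unexpected address(es) {stray}, possible dangling record")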
The Immich domains that are hit by this issue are -not- user generated content.
They clearly are? It seems like GitHub users submitting a PR could/can add a `preview` label, and that would lead to the application + their changes to be deployed to a public URL under "*.immich.cloud". So they're hosted content generated by users (built application based on user patches) on domains under their control.
I'm the guy that built the system, lol. Labels can only be added by maintainers, and the whole system only works for PRs from internal branches.
Ah, that's a different situation then; sorry for misunderstanding the context and thanks for clearing that up! I was under the impression that Immich accepted outside contributions, and those would also have those preview sites created for their pending contributions.
Clearly they are not reading HN enough. It hasn’t even been two weeks since this issue last hit the front page.
I wish this comment were top ranked so it would be clear immediately from the comments what the root issue was.
[flagged]
so its skill issue ??? or just google being bad????
I will go with Google being bad / evil for 500.
Google of the 90s to 2010 is nothing like Google in 2025. There is a reason they removed "Don't be evil" ... being evil and authoritarian makes more money.
Looking at you Manifest V2 ... pour one out for your homies.
Don't get me wrong, Google is bad/evil in many ways, but the public suffix list exists to solve a real risk to users. Google is flagging this for a legit reason in this particular case.
It's not a legit reason at all. A website isn't "unsafe" just because it looks similar to another one to Google's AI. At best such an automated flag should trigger a human review, not take the website offline.
Google needs to be held liable for the damages they do in cases like this or they will continue to implement the laziest solutions as long as they can externalize the costs.
Sympathy for the devil, people keep using Google's browser because the safe search guards catch more bad actors than they false positive good actors.
> the safe search guards catch more bad actors than they false positive good actors.
Well, if the legal system used the same "Guilty until proven innocent" model, we would definitely "catch more bad actors than false positive good actors".
That's a tricky one, isn't it.
You do not want malware protection to be running at the speed of the legal system.
A better analogy, unfortunately for all the reasons it's unfortunate, is police: acting on the partial knowledge in the field to try to make the not-worst decision.
> people keep using Google's browser because the safe search guards catch more bad actors than they false positive good actors.
This is the first thing I disable in Chrome, Firefox and Edge. The only safe thing they do is safely sending all my browsing history to Google or Microsoft.
That's a reasonable thing for you to do (especially if you have some other signal source you use for malware protection), but HN readers are rarely representative of average users.
This feature is there for my mother-in-law, who never saw a popup ad she didn't like. You might think I'm kidding; I am not. I periodically had to go into her Android device and dump twenty apps she had manually installed from the Play Store because they were in a ring of promoting each other.
This is not an honest argument. Most people don't even know this web censorship mechanism exists until they see something (usually legit) blocked.
Do they then switch browsers in response?
downvoted for saying truth
many Google employees are in here, so I don't expect them to agree with you
Looking through some of the links in this post, I think there are actually two separate issues here:
1. Immich hosts user content on their domain. And should thus be on the public suffix list.
2. When users host an open source self hosted project like immich, jellyfin, etc. on their own domain it gets flagged as phishing because it looks an awful lot like the publicly hosted version, but it's on a different domain, and possibly a domain that might look suspicious to someone unfamiliar with the project, because it includes the name of the software in the domain. Something like immich.example.com.
The first one is fairly straightforward to deal with, if you know about the public suffix list. I don't know of a good solution for the second though.
I don't think the Internet should be run by being on special lists (other than like, a globally run registry of domain names)...
I get that SPAM, etc., are an issue, but, like f* google-chrome, I want to browse the web, not some carefully curated list of sites some giant tech company has chosen.
A) you shouldn't be using google-chrome at all B) Firefox should definitely not be using that list either C) if you are going to have a "safe sites" list, that should definitely be a non-profit running that, not an automated robot working for a large probably-evil company...
> I don't think the Internet should be run by being on special lists
People are reacting as if this list is some kind of overbearing way of tracking what people do on the web - it's almost the opposite of that. It's worth clarifying this is just a suffix list for user-hosted content. It's neither a list of user-hosted domains nor a list of safe websites generally - it's just suffixes for a very small specific use-case: a company providing subdomains. You can think of this as a registry of domain sub-letters.
For instance:
- GitHub.io is on the list but GitHub.com is not; GitHub.com is still considered safe
- I self-host an immich instance on my own domain name - my immich instance isn't flagged & I don't need to add anything to the list because I fully own the domain.
The specific instance is just for Immich themselves who fully own "immich.cloud" but sublet subdomains under it to users.
> if you are going to have a "safe sites" list
This is not a safe sites list! This is not even a sites list at all - suffixes are not sites. This also isn't even a "safe" list - in fact it's really a "dangerous" list for browsers & various tooling to effectively segregate security & privacy contexts.
Google is flagging the Immich domain not because it's missing from the safe list but because it has legitimate dangers & it's missing from the dangerous list that informs web clients of said dangers so they can handle them appropriately.
Firefox and Safari also use the list. At least by default, I think you can turn it off in firefox. And on the whole, I think it is valuable to have _a_ list of known-unsafe sites. And note that Safe Browsing is a blocklist, not an allowlist.
The problem is that at least some of the people maintaining this list seem to be a little trigger happy. And I definitely think Google probably isn't the best custodian of such a list, as they have obvious conflicts of interest.
>I think it is valuable to have _a_ list of known-unsafe sites
And how, and by whom, should what counts as an unsafe site be defined?
Ideally there should be several/many and the user should be able to direct their browser as to which they would like to use (or none at all)
> I think it is valuable to have _a_ list of known-unsafe sites
But this is not that list because sites are added using opaque automated processes that are clearly not being reviewed by humans - even if those sites have been removed previously after manual review.
It always has been run on special lists.
I've coined the phrase "Postel decentralization" to refer to things where people expect there to be some distributed consensus mechanism but it turned out that the design of the internet was to email Jon Postel (https://en.wikipedia.org/wiki/Jon_Postel) to get your name on a list. e.g. how IANA was originally created.
Oh god, you reminded me of the horrors of hosting my own mailserver and all of the white/blacklist BS you have to worry about as a small operator (it's SUPER easy to end up on the blacklists, and SUPER hard to get onto whitelists)
There are other browsers if you want to browse the web with the blinders off.
It's browser beware when you do, but you can do it.
You can turn it off in Chrome settings if you want.
If you have such strong feelings, you could always use vanilla chromium.
> I don't know of a good solution for the second though.
I know the second issue can be a legitimate problem but I feel like the first issue is the primary problem here & the "solution" to the second issue is a remedy that's worse than the disease.
The public suffix list is a great system (despite getting serious backlash here in HN comments, mainly from people who have jumped to wildly exaggerated conclusions about what it is). Beyond that though, flagging domains for phishing for having duplicate content smells like an anti-self-host policy: sure there's phishers making clone sites, but the vast majority of sites flagged are going to be legit unless you employ a more targeted heuristic, but doing so isn't incentivised by Google's (or most company's) business model.
> When users host an open source self hosted project like immich, jellyfin, etc. on their own domain...
I was just deploying your_spotify and gave it your-spotify.<my services domain> and there was a warning in the logs that talked about this, linking the issue:
https://github.com/Yooooomi/your_spotify/issues/271
That means the Safe Browsing abuse could be weaponized against self-hosted services, oh my...
New directive from the Whitehouse. Block all non approved sites. If you don't do it we will block your merger etc...
Yeah it's only time until someone in power will realize there is already a mechanism for global web censorship that they can make use of.
The second is a real problem even with completely unique applications. If they have UI portions that have lookalikes, you will get flagged. At work, I created an application with a sign-in popup. Because it's for internal use only, the form in the popup is very basic, just username and password and a button. Safe Browsing continues to block this application to this day, despite multiple appeals.
Even the first one only works if there's no need to have site-wide user authentication on the domain, because you can't have a domain cookie accessible from subdomains anymore otherwise.
The issue isn't the user-hosted content - I'm running a release build of Immich on my own server and Google flagged my entire domain.
Is it on your own domain?
Yes, my own domain.
[dead]
Is the subdomain named immich or something more general?
The subdomain is "immich", which has crossed my mind as a potential flagging characteristic.
Thanks for the datapoint. I agree with sibling that it shouldn't be a problem, but am glad to discover from this thread that it may be.
Don't accept that rhetoric. Google shouldn't get to decide how you can design your own website.
They aren't hosting user content; it was their pull request preview domains that was triggering it.
This is very clearly just bad code from Google.
Or anticompetitive behavior.
I thought this story would be about some malicious PR that convinced their CI to build a page featuring phishing, malware, porn, etc. It looks like Google is simply flagging their legit, self-created Preview builds as being phishing, and banning the entire domain. Getting immich.cloud on the PSL is probably the right thing to do for other reasons, and may decrease the blast radius here.
The root cause is bad behaviour by google. This is merely a workaround.
[flagged]
Please point me to where GoDaddy or any other hosting site mentions public suffix, or where Apple or Google or Mozilla have a list of hosting best practices that includes avoiding false positives by Safe Browsing…
>GoDaddy or any other hosting site mentions public suffix
They don't need to mention it because they handle it on behalf of the client. Them recommending best practices like using separate domains makes as much sense as them recommending what TLS configs to use.
>or where Apple or Google or Mozilla have a listing hosting best practices that include avoiding false positives by Safe Browsing…
Since when were those sites the go-to place to learn how to host a site? Apple doesn't offer anything related to web hosting besides "a computer that can run nginx". Google might be the place to ask if you were your aunt and "google" means "internet" to her. Mozilla is the most plausible one because they host MDN, but hosting documentation on HTML/CSS/JS doesn't necessarily mean they offer hosting advice, any more than expecting docs.djangoproject.com to contain hosting advice.
The underlying question is how are people supposed to know about this before they have a big problem?
[flagged]
Nothing in this article indicates UGC is the problem. It's that Google thinks there's an "official" central immich and these instances are impersonating it.
What malicious UGC would you even deliver over this domain? An image with scam instructions? CSAM isn't even in scope for Safe Browsing, just phishing and malware.
It's not a "service" at all. It's Google maliciously inserting themselves into the browsing experience of users, including those that consciously choose a non-Google browser, in order to build a global web censorship system.
>You might not think it is, but internet is filled utterly dangerous, scammy, phisy, malwary websites
Google is happy to take their money and show scammy ads. Google ads are the most common vector for fake software support scams. Most people google something like "microsoft support" and end up there. Has Google ever banned their own ad domains?
Google is the last entity I would trust to be neutral here.
The argument would work better if Google wasn't the #1 distributor of scams and malware in the world with adsense. (Which strangely isn't flagged by safe browsing, maybe a coincidence)
[flagged]
> Imagine defending the most evil, trillion dollar corp
Hyperbole much?
Don't forget to get your worthless fiat pay check from Google adsense for a successful shilling campaign!
Not at all.
[flagged]
What is Safari getting by using Safe Browsing?
Is this a rhetorical question? Safari is just a middleman. G offers seemingly free services in exchange for your data and in order to get a market monopoly. Then they can sell you to their advertisers, squeeze out the competition and become the only Sheriff in town. How many free lunches have you gotten in your career?
”Competition is for losers.” -Peter Thiel
[flagged]
You should not be downvoted. Either HN has had an influx of ignorant normies or it's google bots attacking any negative comments
People working for famous adtech companies don't like it when people like op burst their bubble. I myself don't like it one bit - keep on changing the world you beautiful geniuses!
Exactly! Most of HN users work for "big tech" and are complete sell outs to their corporate overlords. Majority of them are to blame for the current bloated state of the web along with excessive mass surveillance and anti-privacy state we are in
HN is extremely tone-policed. Lines like "holy shit look in a mirror" are likely to attract downvotes because of their form, with no other factors being considered.
It's full of people described in this blog post [1]. As it concludes, GTFO! Flagging is the IRL equivalent of crying to your superior instead of actually having an argument which is pathetic
[1] - https://geohot.github.io/blog/jekyll/update/2025/10/15/pathe...
HN flagging is just shadow moderation.
I asked dang if I was shadowbanned from flagging. He said yes, if I flag something then it doesn't count because I flagged the wrong things in the past.
The conclusion is that flagging isn't really up to user choice, but is up to dang who decides which things should be flagged and which shouldn't. It's a bit like how on Reddit, the only comments you can see are the ones that agree with the moderators of that subreddit.
Is that actually relevant when only images are user content?
Normally I see the PSL in context of e.g. cookies or user-supplied forms.
> Is that actually relevant when only images are user content?
Yes. For instance in circumstances exactly as described in the thread you are commenting in now and the article it refers to.
Services like google's bad site warning system may use it to indicate that it shouldn't consider a whole domain harmful if it considers a small number of its subdomains to be so, where otherwise they would. It is no guarantee, of course.
Well, using the public suffix list _also_ isolates cookies and treats the subdomains as different sites, which may or may not be desirable.
For example, if users are supposed to log in on the base account in order to access content on the subdomains, then using the public suffix list would be problematic.
Cross domain identity management is a little extra work, but it's far from a difficult problem. I understand the objection to needing to do it when a shared cookie is so easy, but if you want subdomains to be protected from each other because they do not have shared responsibility for each other then it makes sense in terms of privacy & security that they don't automatically share identity tokens and other client-side data.
In another comment in this thread, it was confirmed that these PR host names are only generated from branches internal to Immich or labels applied by maintainers, and that this does not automatically happen for arbitrary PRs submitted by external parties. So this isn’t the use case for the public suffix list - it is in no way public or externally user-generated.
What would you recommend for this actual use case? Even splitting it off to a separate domain name as they’re planning merely reduces the blast radius of Google’s false positive, but does not eliminate it.
If these are dev subdomains that are actually for internal use only, then a very reliable fix is to put basic auth on them, and give internal staff the user/password. It does not have to be strong, in fact it can be super simple. But it will reliably keep out crawlers, including Google.
They didn't say that these are actually for internal use only. They said that they are generated either from maintainers applying labels (as a manual human decision) or from internal PR branches, but they could easily be publicly facing code reviews of internally developed versions, or manually internally approved deployments of externally developed but internally reviewed code.
None of these are the kind of automatic user-generated content that the warning is attempting to detect, I think. And requiring basic auth for everything is quite awkward, especially if the deployment includes API server functionality with bearer token auth combined with unauthenticated endpoints for things like built-in documentation.
How does the PSL make any sense? What stops an attacker from offering free static hosting and then making use of their own service?
I appreciate the issue it tries to solve but it doesn't seem like a sane solution to me.
PSL isn't a list of dangerous sites per se.
Browsers already do various levels of isolation based on domain / subdomains (e.g. cookies). PSL tells them to treat each subdomain as if it were a top-level domain, because they are operated by (leased out to) different individuals / entities. WRT blocking, it just means that if one subdomain is marked bad, it's less likely to contaminate the rest of the domain, since they know it's operated by different people.
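The mechanics are roughly this (a toy illustration with a two-entry stand-in for the real list, ignoring its wildcard and exception rules):

    # Treat listed suffixes like TLDs when deciding what the "site" is.
    PUBLIC_SUFFIXES = {"com", "github.io"}

    def registrable_domain(host):
        labels = host.split(".")
        for i in range(len(labels)):
            if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
                return ".".join(labels[max(i - 1, 0):])   # suffix plus one more label
        return host

    print(registrable_domain("a.github.io"))    # a.github.io  (isolated from b.github.io)
    print(registrable_domain("a.example.com"))  # example.com

Real code should lean on an actual PSL library, since the real list has wildcard and exception entries this toy skips.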
Marking for cookie isolation makes sense, but could be done more effectively via standardized metadata sent by the first party themselves rather than a centralized list maintained by a third party.
Informing decisions about blocking doesn't make much sense (IMO) because it's little more than a speed bump for an attacker. Certainly every little bit can potentially help but it also introduces a new central authority, presents an additional hurdle for legitimate operators, introduces a number of new failure modes, and in this case seems relatively trivial for a determined attacker to overcome.
This is not about user content, but about their own preview environments! Google decided their preview environments were impersonating... Something? And decided to block the entire domain.
I think this is only true if you host independent entities. If you simply construct deep names about yourself with a demonstrable chain of authority back, I don't think the PSL wants to know. Otherwise there is no hierarchy; the dots are just convenience strings and it's a flat namespace the size of the PSL's length.
Aw. I saw Jothan Frakes and briefly thought my favorite Starfleet first officer's actor had gotten into writing software later in life.
Does Google use this for Safe Browsing though?
Looks like it? https://developers.google.com/safe-browsing/reference/URLs.a...
Oh - of course this is where I find the answer why there's a giant domain list bloating my web bundles (tough-cookie/tldts).
There is no law appointing that organization as a worldwide authority on tainted/non-tainted sites.
The fact it's used by one or more browsers in that way is a lawsuit waiting to happen.
Because they, the browsers, are pointing a finger at someone else and accusing them of criminal behavior. That is what a normal user understands this warning as.
Turns out they are wrong. And in being wrong they may well have harmed the party they pointed at, in reputation and / or sales.
It's remarkable how short-sighted this is, given that the web is so international. It's not a defense to say some third party has a list, and you're not on it so you're dangerous
Incredible
I love all the theoretical objections to something that has been in use for nearly 20 years.
As far as I know there is currently no international alternative authority for this. So definitely not ideal, but better than not having the warnings.
Yes but that's not a legal argument.
Your honor, we hurt the plaintiff because it's better than nothing!
True, and agreed that lawsuits are likely. Disagree that it's short-sighted. The legal system hasn't caught up with internet technology and global platforms. Until it does, I think browsers are right to implement this despite legal issues they might face.
In what country hasn't the legal system caught up?
The point I raise is that the internet is international. There are N legal systems that are going to deal with this. And in 99% of them this isn't going to end well for Google if plaintiff can show there are damages to a reasonable degree.
It's bonkers in terms of risk management.
If you want to make this a workable system you have to make it very clear this isn't necessarily dangerous at all, or criminal. And that a third party list was used, in part, to flag it. And even then you're impeding visitors to a website with warnings without any evidence that there is in fact something wrong.
If this happens to a political party hosting blogs, it's hunting season.
I meant that there is no global authority for saying which websites are OK and which ones are not. So not really that the legal system in specific countries have not caught up.
Lacking a global authority, Google is right to implement a filter themselves. Most people are really, really dumb online, and if the warnings weren't as clearly "DO NOT ENTER" as they are now, I don't think they would work. I agree that from a legal standpoint it's super dangerous. Content moderation (which is basically what this is) is an insanely difficult problem for any platform.
The alternative is to not do this.