I really don't understand why a zip file isn't a good solution here. Just because it requires "special" zip software on the server?
> Just because it requires "special" zip software on the server?
Yes. A web browser can't just read a .zip file as a web page. (Even if a web browser decided to download it, decompress it, and open a GUI file browser, you'd still just get a list of files to click.) So, far from satisfying the trilemma, it simply doesn't work.
And even if you fix that, you still generally have to choose between staying single-file and being efficient. You can serve split-up HTML out of a single ZIP file with some server-side software, which gets you efficiency, but then it's no longer single-file; and vice versa, because if it's served as one opaque ZIP, how does the browser stop downloading and fetch only the parts it needs?
We're talking about servers here - the article specifically said that one of the requirements was no special _server_ software, and a web server almost certainly has zip (or tar) installed. These gwtar files apparently don't work without a server either.
I'm not following your point here. Yes, a web server (probably) has zip/tar utilities available, but so what? That doesn't automagically make a random .zip jump through hoops to achieve anything beyond 'download like a normal binary asset', because that's all a ZIP file does. Meanwhile, Gwtar works with any server out of the box: it's just an HTML file using pre-existing, standardized HTTP functionality, and it works even if the server declines to support range requests for some wacky reason like undocumented Cloudflare bugs and downgrades the Range request to a plain GET. (It just loses efficiency, but it still works, you know, in the way that a random .zip file doesn't work at all as a web page.) You can upload a Gwtar to any HTTP server or similar thing like an AWS bucket and it will at least work, with zero configuration, plugins, additional executables, or scripting.
Now, maybe you mean something like, 'a web server could additionally run some special CGI software or a plugin or do some fancy Lua scripting in order to munge a ZIP and split it up on the fly so as to do something like serve it to clients as a regular efficient multi-file HTML page'. Sure. I already cover that in the writeup, as we seriously considered this and got as far as writing a Lua nginx script to support special range requests. But then... it's not single-file. It's multi-file - whatever the additional special config file, script, plugin, or executable is.
Zip stores its central directory at the end of the file. To find what's inside and where each entry starts, you need to read the tail first. That rules out issuing a single Range request to grab one specific asset.
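To make that concrete, here's roughly what "read the tail first" costs a client. A minimal sketch, assuming the server honors suffix Range requests; `findCentralDirectory` is a hypothetical name, and error handling is omitted:

```typescript
// Sketch: why a zip needs the tail before anything else.
// The End of Central Directory (EOCD) record signature is 0x06054b50;
// it sits within the last 65557 bytes of the file (22-byte record plus
// up to 64 KiB of trailing comment).
async function findCentralDirectory(url: string): Promise<{ offset: number; size: number }> {
  // Request 1: grab the tail of the file with a suffix Range request.
  const tailRes = await fetch(url, { headers: { Range: "bytes=-65557" } });
  const tail = new Uint8Array(await tailRes.arrayBuffer());

  // Scan backwards for the EOCD signature ("PK\x05\x06").
  for (let i = tail.length - 22; i >= 0; i--) {
    if (tail[i] === 0x50 && tail[i + 1] === 0x4b && tail[i + 2] === 0x05 && tail[i + 3] === 0x06) {
      const view = new DataView(tail.buffer, tail.byteOffset + i);
      return {
        size: view.getUint32(12, true),   // central directory size
        offset: view.getUint32(16, true), // central directory offset
      };
    }
  }
  throw new Error("not a zip (no EOCD record found)");
}
// Only after a *second* Range request for the central directory itself
// do you learn each entry's local header offset -- multiple round trips
// before you can fetch a single asset.
```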
Tar is sequential. Each entry header sits right before its data. If the JSON manifest in the Gwtar preamble says an asset lives at byte offset N with size M, the browser fires one Range request and gets exactly those bytes.
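Concretely, that's a single round trip with no preliminary fetches. A sketch of the client side, assuming a manifest entry shaped like `{ offset, size }` (the real Gwtar manifest format may differ); it also shows the graceful downgrade when a server ignores Range:

```typescript
// Sketch: fetching one asset out of a tar-based archive in a single
// Range request, given its byte offset and size from the preamble's
// manifest. The manifest shape here is illustrative, not Gwtar's
// actual format.
interface ManifestEntry {
  offset: number; // absolute byte offset of the entry's data in the archive
  size: number;   // length of the entry's data in bytes
}

async function fetchAsset(archiveUrl: string, entry: ManifestEntry): Promise<Uint8Array> {
  const res = await fetch(archiveUrl, {
    headers: { Range: `bytes=${entry.offset}-${entry.offset + entry.size - 1}` },
  });
  // 206 = server honored the range; 200 = it downgraded to a full GET,
  // in which case we slice the bytes we wanted out of the whole body.
  const body = new Uint8Array(await res.arrayBuffer());
  return res.status === 206 ? body : body.slice(entry.offset, entry.offset + entry.size);
}
```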
The other problem is decompression. Zip entries are individually deflate-compressed, so you'd need a JS inflate library in the self-extracting header. Tar entries are raw bytes, so the header script just slices at known offsets. Not needing any decompression code keeps the preamble small.
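To show the asymmetry, a hedged sketch: the tar path is a plain slice, while the zip path needs an inflate step. (Modern browsers do ship `DecompressionStream("deflate-raw")`, which shrinks the cost, but anything older would need inflate code bundled into the preamble.)

```typescript
// Tar: the Range response *is* the asset; hand it straight to the page.
function tarEntryToBlob(bytes: Uint8Array, mimeType: string): Blob {
  return new Blob([bytes], { type: mimeType });
}

// Zip: a deflate-compressed entry must be inflated first. Modern
// browsers expose DecompressionStream("deflate-raw"); older ones
// need an inflate implementation shipped in the header itself.
async function zipEntryToBlob(compressed: Uint8Array, mimeType: string): Promise<Blob> {
  const stream = new Blob([compressed]).stream()
    .pipeThrough(new DecompressionStream("deflate-raw"));
  return new Blob([await new Response(stream).arrayBuffer()], { type: mimeType });
}
```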
You can also read a zip sequentially, like a tar file. Some information lives only in the central directory, but just for extracting file data you can walk the local file records front to back. There are caveats when files appear multiple times, but those caveats apply to processing tar streams too.
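For illustration, a sketch of that sequential walk over local file headers (my reading of the zip format, not anything from the article; `walkLocalHeaders` is a made-up name), including the caveat that makes it shakier than tar:

```typescript
// Sketch: walking zip local file headers sequentially, tar-style.
// Works for many zips, but breaks on streamed archives: if general
// purpose bit 3 is set, sizes live in a *trailing* data descriptor,
// so you can't know where the entry ends without scanning ahead.
function* walkLocalHeaders(zip: Uint8Array): Generator<{ name: string; dataOffset: number; compressedSize: number }> {
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  let pos = 0;
  // Local file header signature is 0x04034b50 ("PK\x03\x04"); the
  // fixed-size part of the header is 30 bytes.
  while (pos + 30 <= zip.length && view.getUint32(pos, true) === 0x04034b50) {
    const flags = view.getUint16(pos + 6, true);
    const compressedSize = view.getUint32(pos + 18, true);
    const nameLen = view.getUint16(pos + 26, true);
    const extraLen = view.getUint16(pos + 28, true);
    const name = new TextDecoder().decode(zip.subarray(pos + 30, pos + 30 + nameLen));
    const dataOffset = pos + 30 + nameLen + extraLen;
    if (flags & 0x8) throw new Error(`${name}: streamed entry, size unknown up front`);
    yield { name, dataOffset, compressedSize };
    pos = dataOffset + compressedSize;
  }
}
```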