Hacker News

nextaccountic 3 months ago

Is there anything more production-grade built around the same idea of HTTP range requests, like that SQLite thing? This has so much potential.

Humphrey 3 months ago

Yes, PMTiles is exactly that: a production-ready, single-file, static container for vector tiles built around HTTP range requests.

I’ve used it in production to self-host Australia-only maps on S3. We generated a single ~900 MB PMTiles file from OpenStreetMap (Australia only, up to Z14) and uploaded it to S3. Clients then fetch just the required byte ranges for each vector tile via HTTP range requests.

It’s fast, scales well, and bandwidth costs are negligible because clients only download the exact data they need.
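A minimal sketch of the two steps described above (one range request, then decoding the bytes it returns), assuming the PMTiles v3 layout: a fixed 127-byte header that starts with the magic `PMTiles`, a version byte, and then little-endian uint64 offsets and lengths. `fetch_range`, `parse_header` and the field names are hypothetical helpers for illustration, not the API of the official pmtiles libraries; check the field order against the spec before relying on it.

```python
import struct
import urllib.request

# Assumption: PMTiles v3 header layout. The full header is 127 bytes;
# only its first 96 bytes (magic, version, eleven uint64 fields) are decoded here.
HEADER_LEN = 127
_PREFIX = struct.Struct("<7sB11Q")  # 7-byte magic, version byte, 11 little-endian uint64s

FIELDS = (
    "root_dir_offset", "root_dir_length",
    "metadata_offset", "metadata_length",
    "leaf_dirs_offset", "leaf_dirs_length",
    "tile_data_offset", "tile_data_length",
    "addressed_tiles", "tile_entries", "tile_contents",
)


def fetch_range(url: str, start: int, length: int) -> bytes:
    """Fetch `length` bytes starting at `start` with an HTTP Range request."""
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{start + length - 1}"}
    )
    with urllib.request.urlopen(req) as resp:
        # A server that honours Range replies 206 Partial Content;
        # a 200 means it ignored the header and is sending the whole file.
        if resp.status != 206:
            raise RuntimeError(f"expected 206, got {resp.status}")
        return resp.read()


def parse_header(buf: bytes) -> dict:
    """Decode the leading fields of a PMTiles v3 header from raw bytes."""
    magic, version, *values = _PREFIX.unpack_from(buf, 0)
    if magic != b"PMTiles" or version != 3:
        raise ValueError("not a PMTiles v3 archive")
    return dict(zip(FIELDS, values))
```

With a hypothetical `url`, `parse_header(fetch_range(url, 0, HEADER_LEN))` yields the root directory's offset and length, and a second range request for exactly that span fetches the directory used to locate individual tiles. Every later read is likewise a small, precise byte range, which is why bandwidth stays low.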

https://docs.protomaps.com/pmtiles/

simonw 3 months ago

PMTiles is absurdly great software.

Humphrey 3 months ago

I know right! I'd never heard of HTTP Range requests until PMTiles - but gee it's an elegant solution.

keepamovin 3 months ago

Hadn't seen PMTiles before, but that matches the mental model exactly! I chose physical file sharding over range requests on a single DB because it felt safer for 'dumb' static hosts like CF: less risk of a single 22GB file getting stuck or cached weirdly. Maybe it would work.
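The sharding alternative can be sketched as a mapping from logical byte ranges (what a range-request reader would ask for) onto fixed-size shard files. The shard size, the file naming and `shard_reads` are all hypothetical; the commenter's actual layout isn't described.

```python
from typing import List, Tuple

# Hypothetical layout: the database is split into files db.000, db.001, ...
# each exactly SHARD_SIZE bytes except possibly the last one.
SHARD_SIZE = 64 * 1024 * 1024


def shard_reads(offset: int, length: int,
                shard_size: int = SHARD_SIZE) -> List[Tuple[int, int, int]]:
    """Split a logical byte range into (shard_index, start, end_exclusive) reads."""
    reads = []
    pos, end = offset, offset + length
    while pos < end:
        idx = pos // shard_size
        base = idx * shard_size
        # Read from pos to the end of this shard, or to the end of the request.
        reads.append((idx, pos - base, min(shard_size, end - base)))
        pos = base + shard_size
    return reads
```

Each shard is small enough to be fetched with a plain GET and cached whole by a CDN. The trade-off is that a read straddling a boundary, as in `shard_reads(60, 10, shard_size=64)`, turns into two requests.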

hyperbolablabla 3 months ago

My only gripe is that the tile metadata is stored as JSON, which I get is for compatibility with existing software, but for, e.g., a simple C program to implement the full spec, you need to ship a JSON parser on top of the PMTiles parser itself.
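To show where the JSON parse sits in the read path, here is a sketch of decoding the metadata blob once its bytes have been fetched (using the metadata offset and length from the binary header). The compression codes (1 = none, 2 = gzip) are my reading of the PMTiles v3 "internal compression" field and should be checked against the spec; brotli and zstd are deliberately left unhandled.

```python
import gzip
import json

# Assumed PMTiles v3 internal-compression codes (verify against the spec).
COMPRESSION_NONE = 1
COMPRESSION_GZIP = 2


def decode_metadata(raw: bytes, internal_compression: int) -> dict:
    """Decompress (if needed) and parse the JSON metadata blob."""
    if internal_compression == COMPRESSION_GZIP:
        raw = gzip.decompress(raw)
    elif internal_compression != COMPRESSION_NONE:
        # Other codecs (e.g. brotli, zstd) would need extra libraries.
        raise NotImplementedError(f"compression code {internal_compression}")
    # json.loads accepts UTF-8 bytes directly.
    return json.loads(raw)
```

Only this one step needs JSON. The header and directories are binary, so a C reader that only serves tiles could plausibly skip the metadata entirely.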

seg_lol 3 months ago

A JSON parser is less than a thousand lines of code.

Diti 3 months ago

And it's where most of the CPU time will be spent, if you care about profiling/improving responsiveness.

monerozcash 3 months ago

At that point you're just I/O-bound, no? I can easily parse JSON at 100+GB/s on commodity hardware, but I'm gonna have a much harder time actually delivering that much data to parse.

keepamovin 3 months ago

What's a better way?

keepamovin 3 months ago

How would you store it?

nextaccountic 3 months ago

That's neat, but.. is it just for cartographic data?

I want something like a db with indexes

jtbaker 3 months ago

Look into using duckdb with remote http/s3 parquet files. The parquet files are organized as columnar vectors, grouped into chunks of rows. Each row group stores metadata about the set it contains that can be used to prune out data that doesn’t need to be scanned by the query engine. https://duckdb.org/docs/stable/guides/performance/indexing
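The row-group pruning described above comes down to comparing a query predicate against per-group min/max statistics and fetching only the groups that can match. This is a simplified sketch with a hypothetical `RowGroupStats` record, not DuckDB's implementation or the real Parquet footer structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    # Hypothetical summary of one row group: where it lives in the file and
    # the min/max of a single column, similar in spirit to Parquet footer stats.
    offset: int
    length: int
    min_value: int
    max_value: int


def groups_to_scan(groups: List[RowGroupStats], lo: int, hi: int) -> List[RowGroupStats]:
    """Keep only row groups whose [min, max] overlaps the predicate lo <= x <= hi."""
    return [g for g in groups if g.max_value >= lo and g.min_value <= hi]
```

Only the surviving groups' `(offset, length)` spans need to be requested over HTTP. The others are skipped without downloading a byte, which is why sorted or clustered data prunes so well.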

LanceDB has a similar mechanism for operating on remote vector embeddings/text search.

It’s a fun time to be a dev in this space!

nextaccountic 3 months ago

> Look into using duckdb with remote http/s3 parquet files. The parquet files are organized as columnar vectors, grouped into chunks of rows. Each row group stores metadata about the set it contains that can be used to prune out data that doesn’t need to be scanned by the query engine. https://duckdb.org/docs/stable/guides/performance/indexing

But when using this on the frontend, are portions of files fetched specifically with HTTP range requests? I tried to search for it but couldn't find details.

jtbaker 3 months ago

Yes, you should be able to see the byte-range requests and 206 responses from an S3-compatible bucket or HTTP server that supports those access patterns.
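When checking this in the browser's network tab, each 206 response carries a `Content-Range` header such as `bytes 0-99/1234` (first byte, last byte, total size, with `*` allowed for an unknown total). A small hypothetical parser for that header:

```python
import re
from typing import Optional, Tuple

# Matches "bytes first-last/total" where total may be "*" (unknown length).
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a 206 response's Content-Range into (first, last, total)."""
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

If a server answers 200 with no `Content-Range`, it ignored the `Range` header and sent the whole file, which is the failure mode to watch for on simple static hosts.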

simonw 3 months ago

There was a UK government GitHub repo that did something interesting with this kind of trick against S3 but I checked just now and the repo is a 404. Here are my notes about what it did: https://simonwillison.net/2025/Feb/7/sqlite-s3vfs/

Looks like it's still on PyPI though: https://pypi.org/project/sqlite-s3vfs/

You can see inside it with my PyPI package explorer: https://tools.simonwillison.net/zip-wheel-explorer?package=s...

simonw 3 months ago

I recovered it from https://archive.softwareheritage.org/browse/origin/directory... and pushed a fresh copy to GitHub here:

https://github.com/simonw/sqlite-s3vfs

This comment was helpful in figuring out how to get a full Git clone out of the heritage archive: https://news.ycombinator.com/item?id=37516523#37517378

Here's a TIL I wrote up of the process: https://til.simonwillison.net/github/software-archive-recove...

QuantumNomad_ 3 months ago

I also have a locally cloned copy of that repo from when it was on GitHub. Same latest commit as your copy of it.

From what I see in GitHub in your copy of the repo, it looks like you don’t have the tags.

Do you have the tags locally?

If you don’t have the tags, I can push a copy of the repo to GitHub too and you can get the tags from my copy.

simonw 3 months ago

I don't have the tags! It would be awesome if you could push that.

QuantumNomad_ 3 months ago

Uploaded here:

https://github.com/Quantum-Nomad/sqlite-s3vfs

simonw 3 months ago

Thanks for that, though actually it turns out I had them after all - I needed to run:

  git push --tags origin

QuantumNomad_ 3 months ago

All the better :)

bspammer 3 months ago

Doing all this in an hour is such a good example of how absurdly efficient you can be with LLMs.

socialcommenter 3 months ago

From reading the TIL, it doesn't appear as if Simon used an LLM for a large portion of what he did; only for the initial suggestion to check the archive, and for the web tool to make his process reproducible. Also, if you read the script from his chat with Claude Code, the prompt really does the heavy lifting.

Sure, the LLM fills in all the boilerplate and makes an easy-to-use, reproducible tool with loads of documentation, and credit to it for that. But is it not more accurate to say that Simon is absurdly efficient, LLM or sans LLM? :)

AceJohnny2 3 months ago

Didn't you do something similar for Datasette, Simon?

simonw 3 months ago

Nothing smart with HTTP range requests yet - I have https://lite.datasette.io which runs the full Python server app in the browser via WebAssembly and Pyodide but it still works by fetching the entire SQLite file at once.

AceJohnny2 3 months ago

Oh! I must have been thinking of your TIL where you linked to an explainer of this technique:

https://simonwillison.net/2021/May/2/hosting-sqlite-database...

https://phiresky.github.io/blog/2021/hosting-sqlite-database...

https://news.ycombinator.com/item?id=27016630
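The trick in the linked posts boils down to turning SQLite's page reads into byte ranges. The page size is stored as a big-endian 16-bit value at offset 16 of the database header (a stored value of 1 means 65536), and page N (1-based) starts at `(N - 1) * page_size`. A minimal sketch with hypothetical helper names, not the code from those posts:

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite database file


def page_size(header: bytes) -> int:
    """Read the page size from the first 100 bytes of a SQLite file."""
    if header[:16] != SQLITE_MAGIC:
        raise ValueError("not a SQLite database")
    (size,) = struct.unpack_from(">H", header, 16)
    return 65536 if size == 1 else size


def page_range_header(page_no: int, size: int) -> str:
    """Range header value covering 1-based page `page_no`."""
    start = (page_no - 1) * size
    return f"bytes={start}-{start + size - 1}"
```

A B-tree lookup that touches a handful of pages then costs a handful of small requests instead of a download of the whole database.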

billywhizz 3 months ago

I played around with this a while back. You can see a demo here. It also lets you pull new WAL segments in and apply them to the current database. Never got much time to go any further with it than this.

https://just.billywhizz.io/sqlite/demo/#https://raw.githubus...

ericd 3 months ago

This is somewhat related to a large dataset browsing service a friend and I worked on a while back - we made index files, and the browser ran a lightweight query planner to fetch static chunks which could be served from S3/torrents/whatever. It worked pretty well, and I think there’s a lot of potential for this style of data serving infra.
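One simple form such a browser-side planner could take: the index file lists the first key of each sorted static chunk, and a binary search picks which chunks a key-range query must fetch. The index format and `chunks_for_range` are hypothetical; the commenter's actual design isn't described.

```python
import bisect
from typing import List


def chunks_for_range(first_keys: List[str], lo: str, hi: str) -> List[int]:
    """Given the sorted first key of each chunk, return the indexes of
    chunks that may contain keys in [lo, hi]."""
    # The chunk holding `lo` is the last one whose first key is <= lo.
    start = max(bisect.bisect_right(first_keys, lo) - 1, 0)
    # Chunks whose first key is > hi cannot contain matches.
    end = bisect.bisect_right(first_keys, hi)
    return list(range(start, end))
```

For `first_keys = ["a", "g", "n", "t"]`, a query for `"h"` to `"p"` touches only chunks 1 and 2, so only those two static files need to be downloaded from S3 or a torrent.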

__turbobrew__ 3 months ago

GDAL's /vsis3/ virtual file system dynamically fetches chunks of rasters from S3 using range requests. It is the underlying technology for several mapping systems.

There is also a file format optimized for this, Cloud Optimized GeoTIFF: https://cogeo.org/
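What makes this work for tiled TIFFs is that the header lists every tile's byte offset and length (the TileOffsets and TileByteCounts tags), with tiles ordered row by row. A reader can therefore compute one range request per tile. A sketch assuming a single full-resolution image with one plane (no overviews or separate bands); `tile_byte_range` is a hypothetical helper, not a GDAL API.

```python
from typing import List


def tile_byte_range(x: int, y: int, width: int, tile_w: int, tile_h: int,
                    tile_offsets: List[int], tile_byte_counts: List[int]) -> str:
    """Range header value for the tile containing pixel (x, y)."""
    tiles_across = (width + tile_w - 1) // tile_w  # ceiling division
    index = (y // tile_h) * tiles_across + (x // tile_w)  # row-major tile order
    start = tile_offsets[index]
    return f"bytes={start}-{start + tile_byte_counts[index] - 1}"
```

For a 1000-pixel-wide image with 256×256 tiles, pixel (300, 600) falls in tile row 2, column 1, i.e. tile index 9 of 4 tiles per row. COG additionally places these offset tables near the start of the file, so one initial request usually covers them.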

omneity 3 months ago

I tried to implement something similar to optimize sampling semi-random documents from (very) large datasets on Huggingface, unfortunately their API doesn't support range requests well.

mootothemax 3 months ago

This is pretty much what is so remarkable about Parquet files: not only do you get seekable data, you can fetch only the columns you want too.

I believe there are also indexing opportunities (not necessarily via, e.g., Hive partitioning), but frankly I'm kinda out of my depth on it.
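Fetching only some columns works because each column's data within a row group is a contiguous "column chunk" whose offset and length are recorded in the footer. A reader can request just those spans and merge adjacent or nearby ones to save round trips. The `chunks` dictionary below is a hypothetical, simplified stand-in for real footer metadata.

```python
from typing import Dict, List, Tuple


def column_ranges(chunks: Dict[str, Tuple[int, int]], wanted: List[str],
                  max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return merged (start, end_exclusive) byte ranges covering the wanted
    columns; `chunks` maps column name -> (offset, length) in one row group."""
    spans = sorted((chunks[c][0], chunks[c][0] + chunks[c][1]) for c in wanted)
    merged: List[Tuple[int, int]] = []
    for start, end in spans:
        # Coalesce with the previous span if the gap is small enough.
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Raising `max_gap` trades a few wasted bytes for fewer HTTP requests. That is usually worth it when per-request latency dominates, as it does against object storage.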

6510 3 months ago

I want to see a bittorrent version :P

nextaccountic 3 months ago

Maybe webtorrent-based?

tlarkworthy 3 months ago

Parquet/iceberg