
As I wrote in an earlier comment [1], you can take the sample in two parts. Only the first part needs a scan, and even then, if you index your weights (or store the data in a columnar format like Parquet), that scan can be very lightweight: it reads only the weights and primary keys and ignores everything else.
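
To make the two-part idea concrete, here is a rough Python sketch. Everything in it is my assumption for illustration, not something taken from [1]: the function names `sample_keys` and `fetch_rows` are hypothetical, the in-memory dict stands in for a real table, and the exponential-key trick (give each row the key -ln(u)/w and keep the k smallest) is just one standard way to do weighted sampling without replacement.

```python
import heapq
import math
import random


def sample_keys(pk_weight_pairs, k, rng=random):
    """Part 1: scan only (primary key, weight) pairs and choose k keys.

    Each row with positive weight w gets the score -ln(u) / w, where u
    is uniform on (0, 1]; the k smallest scores form the sample.
    """
    scored = (
        (-math.log(1.0 - rng.random()) / w, pk)
        for pk, w in pk_weight_pairs
        if w > 0  # zero-weight rows can never be selected
    )
    return [pk for _, pk in heapq.nsmallest(k, scored)]


def fetch_rows(table, pks):
    """Part 2: look up the full rows for the chosen keys only."""
    return [table[pk] for pk in pks]


# Hypothetical table: wide rows, but part 1 never touches "payload".
table = {
    pk: {"id": pk, "weight": float(pk % 5), "payload": "x" * 100}
    for pk in range(1000)
}
narrow_scan = ((pk, row["weight"]) for pk, row in table.items())
picked = sample_keys(narrow_scan, 10, random.Random(0))
rows = fetch_rows(table, picked)
```

In a real database the narrow scan would be an index-only or column-projected read, and part 2 would be a handful of primary-key lookups, so the wide columns are only touched for the rows that made it into the sample.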

[1] https://news.ycombinator.com/item?id=41906816