Hacker News

ai-inquisitor 3 hours ago

It's not doing that. If you look at the repository, it's adding a new commit with tiny parquet files every 5 minutes. The most recent one was only a 20.9 KB parquet file: https://huggingface.co/datasets/open-index/hacker-news/commi... and the ones before it had a median size of 5 KB: https://huggingface.co/datasets/open-index/hacker-news/tree/...

The bigger concern is how large the git history is going to get on the repository.
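To put rough numbers on that, here's a back-of-envelope sketch. It takes the 5-minute interval and ~5 KB median file size from the comment above and assumes (my assumption, not verified against the repo) one new parquet file per commit. It only counts the parquet payload and ignores git's own per-commit objects (commits, trees), so treat it as a lower bound on growth.

```python
# Rough estimate of how fast the repository history grows.
# Interval and median size come from the comment above; one file per
# commit is an assumption. Git's own commit/tree objects are not counted.

COMMIT_INTERVAL_MIN = 5   # a new commit every 5 minutes
MEDIAN_FILE_KB = 5        # median parquet file size of ~5 KB
FILES_PER_COMMIT = 1      # assumption: one parquet file per commit


def growth_estimate(days: float) -> tuple[int, float]:
    """Return (number of commits, approximate parquet data added in MB) over `days` days."""
    commits = int(days * 24 * 60 / COMMIT_INTERVAL_MIN)
    added_kb = commits * FILES_PER_COMMIT * MEDIAN_FILE_KB
    return commits, added_kb / 1024


for label, days in [("per day", 1), ("per year", 365)]:
    commits, mb = growth_estimate(days)
    print(f"{label}: {commits:,} commits, ~{mb:,.1f} MB of parquet data")
# per day: 288 commits, ~1.4 MB of parquet data
# per year: 105,120 commits, ~513.3 MB of parquet data
```

The data volume itself is modest; what adds up is the commit and file count, which grows linearly and never shrinks because every commit stays in the history.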

btown 2 hours ago

I recall that this became a big problem for the Homebrew project in terms of load on the repo, to the extent that GitHub asked them not to recommend or default-enable shallow clones for their users: https://github.com/Homebrew/brew/issues/15497#issuecomment-1...

This is likely to be lower traffic, and the history should (?) scale only linearly with new data, so likely not the worst thing. But it's something to be cognizant of when using SCM software in unexpected ways!

roncesvalles 2 hours ago

How would shallow clone be more stressful for GitHub than a regular clone?

enchilada 2 hours ago

Shallow clones (and the resulting lack of shared history data) break many assumptions that packfile optimisations rely on.

vovavili 2 hours ago

This makes more sense. I still wonder if the author isn't just effectively recreating Apache Iceberg manually here.

tomrod 2 hours ago

Are they paying for the repo space, I wonder?

cyanydeez 2 hours ago

Someone's paying to keep name-dropping Iceberg(tm).