> complete New York Times (pre-1930)

https://archive.org/search?query=title%3ANew+York+Times&sort...

> as a full PDF download set

I imagine it's possible to achieve this through torrents from Anna's, but you'd have to search for and compile the list of all the individual PDFs yourself.

> something new with an AI-powered search

With enough time and willingness, someone could run all the old NYT issues through optical character recognition, convert them to text, and then expose that text to large language models for semantic search of some kind. Ideally, public cultural funds would support the effort as academic research.
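The retrieval half of that pipeline is the easy part. As a minimal sketch (assuming the issues have already been OCR'd to plain text, and using crude bag-of-words cosine similarity as a stand-in for a real embedding model):

```python
import math
from collections import Counter

# Hypothetical stand-in for a real semantic search stack: word-count
# vectors plus cosine similarity. A production system would OCR each
# scanned page (e.g. with Tesseract) and index dense embeddings instead.

def vectorize(text):
    """Lowercased word counts as a crude document vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, docs):
    """Rank OCR'd articles by similarity to the query."""
    q = vectorize(query)
    ranked = sorted(docs.items(),
                    key=lambda kv: cosine(q, vectorize(kv[1])),
                    reverse=True)
    return [issue for issue, _ in ranked]

# Toy corpus standing in for OCR'd pre-1930 issues
corpus = {
    "1912-04-16": "Titanic sinks four hours after hitting iceberg",
    "1927-05-21": "Lindbergh does it to Paris in 33 and a half hours",
}
print(search("ship iceberg disaster", corpus)[0])  # → 1912-04-16
```

The hard part isn't the code; it's the OCR quality on century-old newsprint and the institutional will to do it at scale.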

It just feels like the complete public domain New York Times should be a big deal. Why is it only available as individual issues on the Internet Archive? Why hasn't every single story been clipped out individually and fully OCR'd, so that it shows up as a top hit on Google? And why not do that for every public domain newspaper around the country, too?