> If you feed it the entire dataset before 1905, LLMs aren't going to come up with general relativity.
Link?
You don't need a source for that; an LLM trained on so little data would barely be able to form proper sentences.
> an LLM with such little data
There is a mountain of data pre-1905. Certainly enough to train a decent 30B parameter model.
Now, digitizing & OCRing all of that data... THAT is a challenge.
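Rough sketch of what that digitization step looks like in practice, assuming you already have scanned page images on disk. The directory names and the use of Tesseract (via pytesseract) are just illustrative assumptions, not a claim about how any particular archive actually does it:

```python
from pathlib import Path

from PIL import Image       # pip install pillow
import pytesseract          # pip install pytesseract (requires the tesseract binary)

def ocr_scanned_pages(scan_dir: str, out_path: str) -> None:
    """OCR every page image in scan_dir and append the text to one corpus file."""
    pages = sorted(Path(scan_dir).glob("*.png"))
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:
            # Old typefaces (e.g. Fraktur in 19th-century German journals) need the
            # matching traineddata installed, e.g. lang="deu_frak" instead of "eng".
            text = pytesseract.image_to_string(Image.open(page), lang="eng")
            out.write(text + "\n")

if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    ocr_scanned_pages("scans/pre1905_journal", "corpus_pre1905.txt")
```

The hard part isn't the loop above, it's getting clean scans of millions of pages and cleaning up the OCR errors afterwards.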