There's a similar but unreleased project here: https://github.com/DGoettlich/history-llms

I've been waiting for them to publish the 4B model for a while so I'm glad to have something similar to play with. I think I trust the Ranke-4B process a bit more, but that's partly because there aren't a lot of details in this report. And actually releasing a model counts for a whole lot.

One thing that I think will be a challenge for these models is maintaining any sort of definite temporal setting. Unless the conversation establishes a clear timeframe, the model may end up picking a more or less arbitrary context, or worse, averaging over many different time periods. I think this problem is mostly handled by post-training in modern LLMs (plus the fact that most of their training data comes from a much narrower time range), but that's probably harder to accomplish while also trying to avoid introducing bias in the SFT and RL process.

I wonder if it would be possible to do something simple like prepending a sentinel token with the year. Or, since they're training a model from scratch anyway, tweak the architecture to condition on a temporal embedding. That opens the door to cool stuff like "generate a response as of 2050."
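To make the sentinel-token idea concrete, here's a minimal sketch of what the data prep could look like. The token format, the per-year vocabulary, and the function names are all my own assumptions, not anything from the project:

```python
# Hypothetical sketch of year-sentinel conditioning for pretraining data.
# The token format "<|year:NNNN|>" is an assumption, not the project's scheme.

def add_year_sentinel(text: str, year: int) -> str:
    """Prefix a document with a special year token so the model can
    condition on when the text was (or pretends to be) written."""
    return f"<|year:{year}|> {text}"

def year_token_vocab(start: int, end: int) -> list[str]:
    """One sentinel per year keeps the vocabulary addition small:
    a few hundred tokens covers several centuries."""
    return [f"<|year:{y}|>" for y in range(start, end + 1)]

doc = add_year_sentinel("The revolutions spread across Europe.", 1848)
# doc == "<|year:1848|> The revolutions spread across Europe."
```

At inference time you'd prepend whatever year you want the response grounded in, including one the training data never saw, which is where the "respond from 2050" trick would come from.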