> https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Given the training notes, it seems like you can't reproduce the performance shown in their examples?

I'm not sure about the exact details, but there is some kind of targeted distillation of GPT-5 involved to try to get more conversational text and better performance, which seems a bit iffy to me.

Thanks for the comment. Could you elaborate on what you find iffy about our approach? I'm sure we can improve!

Well, it would be nice to see examples (or weights, to be completely open) for the baseline model, without any GPT-5 influence whatsoever. Basically, let people see what the "raw" output from historical texts is like, and for that matter actively demonstrate why the extra tweaks and layers are needed to make a useful model. Show, don't tell, really.