If anyone was wondering ... it's racist

Unsurprisingly the texts written up until that time were dominated by such individuals which is tragic for LLM training if you think about it.

The voiceless groups or fringe opinions which we take as normative today do not appear.

Does this encourage us to write in the present such that we influence the models in perpetuity?

Voiceless groups do not appear in the training data? How could they, they are voiceless. You think the voiceless people are represented in todays training data? They cannot they are voiceless.

Nothing tragic about using data from a time period.

Common words used in 1900s are labeled racist now. I doubt anyone was wondering if they filtered those words for modern safe wordx.

I'd be more worried if words from that era were fully aligned with present day notions of morality. Wouldn't that indicate a certain stagnation & lack of progress?

Let us hope, 100 years from now, there will be people who look back unkindly on us.

[deleted]

one day we'll have SOTA models trained like this one and there's nothing you can do about it :^)

[flagged]