Hacker News

This is like a broadband (white-noise) electronic-warfare jammer: flood the frequency range (the token space) with random white noise (a broad range of random tokens) so the receiver can no longer pick out the signal (i.e. the information).

Cool, but it's also worrying that such a small sample in the corpus can "poison" tokens in the model. Maybe ingestion tools need either a) a noise-reduction filter, or b) a filter that drops sources (or parts of sources) with high entropy.
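
To make option b) concrete, here is a rough sketch of what an entropy filter might look like. Everything in it is my own assumption, not something an existing ingestion tool is known to do: the names `char_entropy` and `drop_high_entropy` are hypothetical, it measures character-level Shannon entropy rather than entropy over model tokens, and the 4.5 bits/char cutoff is an illustrative guess that would need tuning on real data.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character
    frequencies observed in `text`. Empty input counts as 0 bits."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def drop_high_entropy(chunks, max_bits=4.5):
    """Keep only chunks whose character entropy is at or below `max_bits`.

    `max_bits` is a hypothetical threshold: a chunk made of few distinct
    characters scores low, a chunk spread evenly over many characters
    scores high.
    """
    return [chunk for chunk in chunks if char_entropy(chunk) <= max_bits]


if __name__ == "__main__":
    import string

    prose = "the cat sat on the mat and the dog sat on the log"
    noise = string.ascii_letters + string.digits  # 62 distinct characters
    for chunk in (prose, noise):
        print(f"{char_entropy(chunk):.2f} bits/char: {chunk[:20]!r}")
    print(drop_high_entropy([prose, noise]))
```

Caveats: the entropy of a chunk can never exceed log2 of its length, so very short chunks always look "clean", and legitimately dense content (base64 blobs, hashes, minified code) would be dropped along with the noise. A real filter would probably score fixed-size windows and compare them against a baseline for the language in question.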