Tangentially related question. Has anyone analyzed if the content that is being converted could break the model.

So let's say you have a super dull pdf ( or even a scan ) that has the same line over and over again, could this get the model into one of those loops that just keep spewing nonsense.

And thinking that further, could someone prompt inject a model with a handwritten note that only gets "activated" once it's in the context?