Although this isn’t directly related to the idea in the article, I’m reminded that one of the most effective hacks I’ve found for working with ChatGPT has been to attach screenshots of files rather than the files themselves. I’ve noticed the model will almost always pay attention to an image and pull relevant data out of it, but it takes a lot of detailed prompting to get it to reliably pay attention to text and PDF attachments instead of just hallucinating their contents.
Hmm. Yesterday I stuck a >100-page PDF into a Claude Project and asked Claude to reference a table in the middle of it (I gave page numbers) to generate machine-readable text. I watched with some bafflement as Claude convinced itself that the PDF wasn’t a PDF, but then it managed to recover all on its own and generated 100% correct output. (Well, 100% correct in terms of reading the PDF - it did get a bit confused a few times following my instructions.)
PDF is a trash format
This is probably because your provider is generating embeddings over the document to save money, and then simply running a vector search across it instead of fitting it all in context.
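For what it's worth, the difference between the two pipelines looks roughly like this. This is just a minimal sketch of the retrieval-based approach vs. stuffing the whole document into context; `embed` is a stand-in for whatever embedding model the provider actually calls, and `pdf_text` / `question` are placeholders, not anything from a real API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call -- stands in for whatever
    embedding model the provider uses behind the scenes."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def chunk(document: str, size: int = 1000) -> list[str]:
    """Naive fixed-size chunking of the document text."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Vector search: return the k chunks whose embeddings are
    closest to the question's embedding (cosine similarity)."""
    q = embed(question)
    q /= np.linalg.norm(q)
    scored = []
    for c in chunks:
        v = embed(c)
        v /= np.linalg.norm(v)
        scored.append((float(q @ v), c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

# Retrieval-based flow: the model only ever sees the top-k chunks,
# so a table on page 60 that doesn't score well against the question
# simply never reaches it.
# prompt = question + "\n\n" + "\n\n".join(retrieve(question, chunk(pdf_text)))

# Full-context flow: the entire document goes into the prompt,
# which costs far more tokens but lets the model actually read everything.
# prompt = question + "\n\n" + pdf_text
```

That would explain why the model seems to "ignore" attachments: it never sees the parts of the document that the vector search didn't surface.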