Hacker News

Big fan of reader mode. For me, a direction better than llms.txt would be to encourage sites to improve their markup (think semantic web era) so agents could get the text version from that the way reader mode does. Would achieve the same thing - save tokens.

This isn't difficult and I think the reason it hasn't been done is that publishers want clicks and ad views. Which begs the question: why would they start doing it for agents?

0-_-0 12 hours ago

Agents don't buy stuff they see in an ad

Retr0id 11 hours ago

So why serve them at all?

Gigachad 11 hours ago

If your website itself is advertising a product or service you sell, you would still want LLMs to see and fetch it. If you are a news site, blog, or any other website that doesn’t exist to sell something, you are only harmed by AI agents.

Retr0id 11 hours ago

In those situations you wouldn't have ads on the human version of the site either, surely?

mcmcmc 7 hours ago

Sure, if it’s paywalled. Web hosting isn’t free

fullstackchris 9 hours ago

Modern agents already do this via content negotiation and will attempt to retrieve the markdown version of a given site.

https://www.sanity.io/learn/course/markdown-routes-with-next...
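
For anyone unfamiliar with the content negotiation being described, here is a rough sketch of what the agent side might look like. It only builds the request and checks a response header, it doesn't hit the network. The agent name, the q-values, and the helper names are made up for illustration; this isn't taken from the sanity.io course or from any particular agent.

```python
from urllib.request import Request


def build_markdown_request(url: str) -> Request:
    """Build a GET request that prefers Markdown but accepts HTML as a fallback."""
    # q-values express preference order; the server is free to ignore them.
    return Request(url, headers={
        "Accept": "text/markdown, text/html;q=0.8",
        "User-Agent": "example-agent/0.1",  # hypothetical agent name
    })


def is_markdown(content_type: str) -> bool:
    """True if a response Content-Type header value indicates Markdown."""
    return content_type.split(";")[0].strip().lower() == "text/markdown"


# example.com is a placeholder URL; nothing is actually fetched here.
req = build_markdown_request("https://example.com/post")
```

A server that doesn't do negotiation will simply return HTML, so the agent would still need a fallback path for that case.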

k1m 8 hours ago

But that isn't that different from requesting the llms.txt version. Why not just make it so the useful content you want the LLM to focus on is easily retrievable from the same HTML the user's browser gets?

The sanity.io page writes:

> serving agents a bunch of HTML might just bloat their context window.

That's only true if you assume the agent can't extract the useful text before it goes into the model as tokens. Your browser's reader mode uses heuristics to identify what the actual content is in a large HTML response and strips away the rest.

To me this is a far better approach than worrying about llms.txt files or looking at HTTP headers to see if markdown is preferred. Such efforts could easily be directed at ensuring the useful content on your site carries the appropriate markup for an agent or any other tool to extract it. And it would require less work to implement for the publisher of the content.
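
To make that concrete, here is a minimal sketch of the extraction side using only the Python standard library. It assumes the publisher wraps the real content in `<article>` or `<main>` and uses `<nav>`, `<aside>`, and `<footer>` for the rest. The class name, the tag lists, and the sample page are my own illustration; real reader modes use much more elaborate heuristics than this.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect text inside <article> or <main>, skipping non-content elements."""

    CONTENT = {"article", "main"}  # assumed content containers
    SKIP = {"script", "style", "nav", "aside", "footer", "form"}  # assumed noise

    def __init__(self):
        super().__init__()
        self.content_depth = 0  # > 0 while inside a content container
        self.skip_depth = 0     # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTENT:
            self.content_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CONTENT and self.content_depth:
            self.content_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside content and not inside a skipped element.
        if self.content_depth and not self.skip_depth:
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def extract_article(html: str) -> str:
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<html><body>
<nav>Home | About</nav>
<div class="ad">Buy now!</div>
<article><h1>Title</h1><p>Useful <b>content</b> here.</p>
<aside>Related links</aside></article>
<footer>Copyright</footer>
</body></html>"""

text = extract_article(page)
print(text)  # prints "Title Useful content here."
```

The nav, ad, aside, and footer never reach the model, and the publisher didn't have to maintain a second version of the page.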