Which of these have you used and how are they useful to you?
Do you think this is relevant at earlier stages of a project or only once you have tons and tons of docs?
My instinct is that many LLMs.txt become less relevant over time as AI tokens become cheaper and context becomes longer.
Keep in mind that I am a dashboard copy-and-paste workflow user, so the following may not hold for Cursor or Claude Code users.
> Which of these have you used and how are they useful to you?
llms-full.txt is generally not useful to me because these files are usually far too big and consume too many tokens. For example, Next.js has an llms-full.txt[0] which is, IIRC, around 800K tokens; I don't know how that was intended to be used. I think llms-full.txt should look more like Astro's /_llms-txt/api-reference.txt (more on that later).
[0]: https://nextjs.org/docs/llms-full.txt
Regarding llms.txt, there is some ambiguity because these files vary in my experience, but the most common ones look like this[1] (i.e., a list of URLs), and I consider them moderately useful. My LLM cannot read URLs, so what I do is look through the llms.txt for files relevant to what I'm working on and just `curl -LO` them into a dedicated folder in my project (this kind of llms.txt usually lists LLM-friendly .md files). Those downloaded files are then included in the context.
[1]: https://bun.sh/llms.txt
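In practice I just use curl, but here is a minimal TypeScript sketch of that step (Node 18+ for the built-in fetch; the `docs/llm` target folder is just my own convention, and it assumes the llms.txt lists plain .md URLs):

```ts
// fetch-llms-docs.ts — sketch: download the .md files listed in an llms.txt
// into a local folder so they can be pasted into the prompt later.
import { mkdir, writeFile } from "node:fs/promises";
import path from "node:path";

const LLMS_TXT = "https://bun.sh/llms.txt"; // any llms.txt that lists .md files
const OUT_DIR = "docs/llm";                 // dedicated folder in the project

async function main() {
  const listing = await (await fetch(LLMS_TXT)).text();

  // Grab every .md URL out of the listing (bare URLs or markdown links alike).
  const urls = [...listing.matchAll(/https?:\/\/\S+?\.md/g)].map((m) => m[0]);

  await mkdir(OUT_DIR, { recursive: true });
  for (const url of urls) {
    const name = path.basename(new URL(url).pathname);
    await writeFile(path.join(OUT_DIR, name), await (await fetch(url)).text());
    console.log(`saved ${name}`);
  }
}

main();
```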
Now, what really impressed me is Astro's llms-small.txt. To be honest, it still looks a little too big and appears to contain some irrelevant stuff like "Editor setup," but I think it is already small enough to be included directly in the prompt without any additional preprocessing. I haven't seen anyone else do this (llms-small.txt) before, even though I think it is pretty low-hanging fruit.
But Astro actually has something that is, in my opinion, even better: /_llms-txt/api-reference.txt[2]. This appears to be just the API reference without any unnecessary data, and it even includes a list of common errors (something I have to maintain myself for other tools, so that the LLM doesn't keep making the same mistakes over and over again). This looks perfect for my dashboard copy-and-paste workflow, though I haven't actually tested it yet (because I just found it).
[2]: https://docs.astro.build/_llms-txt/api-reference.txt
> Do you think this is relevant at earlier stages of a project or only once you have tons and tons of docs?
I think this is definitely relevant at early stages, and for as long as LLMs don't have your APIs in their training data (check the "knowledge cut-off" date in the model descriptions). I would go so far as to say it is very important: if you don't provide this and the LLMs don't already know your APIs, it will be a pain to use your library/SDK/whatever when coding with LLMs.
Tips:
- Maintain an LLM-friendly list of errors that LLMs commonly make when using your thing. For example, in Next.js the `headers` function recently changed to return a Promise (it used to return the headers directly), so you have to `await` it, and it is extremely common for LLMs to omit the `await`, which breaks your app and wastes your time on fixes. It would be really good if Next.js provided an LLM-friendly list of common errors like this one; there are many others.
- Maintain an LLM-friendly list of guidelines/best practices. This can be used, for example, to discourage LLMs from using deprecated APIs that new apps should not use. Example: in Angular you can inject dependencies into your components by declaring constructor parameters, but that is apparently now the legacy approach; they want you to use the `inject` function instead. So on their website they publish LLM prompts[3] that list guidelines/best practices, including using the `inject` function. Both of these tips are sketched in code right after this list.
[3]: https://angular.dev/ai/develop-with-ai
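To make the first tip concrete, here is the shape of entry I'd want in such a list; the example is the real `headers()` pitfall described above, shown as a Next.js App Router route handler (the route path is just for illustration):

```ts
// app/api/whoami/route.ts
// Common LLM error: calling Next.js headers() synchronously.
// In current Next.js versions headers() returns a Promise, so it must be awaited.
import { headers } from "next/headers";

export async function GET() {
  // Wrong (what LLMs keep generating): const ua = headers().get("user-agent");
  const h = await headers();                      // correct: await first
  const ua = h.get("user-agent") ?? "unknown";
  return Response.json({ userAgent: ua });
}
```

And for the second tip, the Angular guideline (prefer `inject()` over constructor parameters) boils down to something like this:

```ts
// Guideline: prefer inject() over constructor parameter injection in new code.
import { Component, inject } from "@angular/core";
import { HttpClient } from "@angular/common/http";

@Component({
  selector: "app-users",
  template: `<p>users</p>`,
})
export class UsersComponent {
  // Legacy style (discouraged for new code): constructor(private http: HttpClient) {}
  private readonly http = inject(HttpClient);     // recommended style
}
```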
> My instinct is that many LLMs.txt become less relevant over time as AI tokens become cheaper and context becomes longer.
I don't think llms.txt will become less relevant any time in the near future. I think that, as LLM capabilities increase, you will just be able to put more llms.txt content into your context. But as of right now, in my experience, if your prompt is longer than ~200K tokens, LLM performance degrades significantly. Keep in mind (though this is just my mental model, and I am not an AI scientist) that just because a model's description says, for example, up to 1M tokens of context, that doesn't necessarily mean its "attention" spans all 1M tokens; even though you can feed 1M tokens into, say, Gemini 2.5 Pro, it doesn't work well.
With Claude Code I've had great success maintaining a references folder of useful docs, cloned repos, downloaded HTML, etc. Claude Code is able to use its filesystem traversal tools to explore the library, and it works very well. It's amazing to be able to say something like "Figure out how to construct $OBSCURE_TYPE. This is a research task; use the library" and have it nail it.
I'm curious – how are you organizing the folder/instructing Claude Code on its layout? I'm trying to get an LLM-aided dev environment set up for an ancient application dev framework, and I'm resigned to the fact that I'm going to have to curate my own "AI source material" for it.
That's not efficient, though, when you do real work. I prefer to manage the context myself and just copy and paste everything that's needed into the dashboard, as opposed to waiting for Claude to read everything it needs, which is slower and more expensive.
> Maintain an LLM-friendly list of errors that LLMs commonly make when using your thing. For example, in Next.js the `headers` function recently changed to return a Promise (it used to return the headers directly), so you have to `await` it, and it is extremely common for LLMs to omit the `await`, which breaks your app and wastes your time on fixes. It would be really good if Next.js provided an LLM-friendly list of common errors like this one; there are many others.
My team is actually working on a total revamp of our errors for AI-driven debugging. I'll write a blog post once it's landed more thoroughly, but what we're currently working on is the following (a rough sketch of (1) and (2) follows the list):
1. Replace our error codes on individual routes with a global list of errors, so every error on every route maps to a global ID of the form this-route-this-error. When you receive one, you get not only the ID but also a link to (2).
2. Generate docs pages for every single one of these errors that note where they come from and their metadata. Over time we also want to add notes to these pages about what the errors are, how people have triggered them in the past, and how people have solved them.
3. Create what we are currently calling a "DevOps MCP" for our cloud. This MCP is not a full-control MCP; it's almost completely read-only, and it's for pulling logs and seeing what's going on. The use case is primarily: the AI knows an API request failed, looks at recent API requests, finds the failed one, pulls the error code, pulls the docs, and solves the problem.
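Roughly, for (1) and (2) together, something like this (all names and URLs below are placeholders, not our real API):

```ts
// Sketch of (1) + (2): a global error registry where every route-level error
// maps to one global ID plus a link to its generated docs page.

type GlobalErrorId = `${string}-${string}`; // "this-route-this-error"

interface ErrorEntry {
  id: GlobalErrorId;
  route: string;       // where the error comes from
  docsUrl: string;      // the generated docs page for this exact error
  summary: string;      // short note on what it is / how people have fixed it
}

const ERRORS: Record<GlobalErrorId, ErrorEntry> = {
  "create-deployment-quota-exceeded": {
    id: "create-deployment-quota-exceeded",
    route: "POST /v1/deployments",
    docsUrl: "https://docs.example.com/errors/create-deployment-quota-exceeded",
    summary: "Account hit its deployment quota; raise the limit or delete old deployments.",
  },
};

// What a failing route would return: the global ID plus the link to (2).
function errorResponse(id: GlobalErrorId) {
  const entry = ERRORS[id];
  return Response.json(
    { error: { id: entry.id, docs: entry.docsUrl, summary: entry.summary } },
    { status: 422 },
  );
}
```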
We're not sure exactly how this system will/should look, and at the moment we're working on #1 of these three.
I'm bearish on the concept of an llm-errors-full.txt for these, as I worry it would create the same effect as a `cat-facts.txt` in practice: by being hyper-aware of all the errors, the model would fixate on them and write code designed to solve errors it never actually hit. I think it would work much better as an MCP, or as a list of codes that it debugs when they come up.
> I think this is definitely relevant at early stages, and for as long as LLMs don't have your APIs in their training data (check the "knowledge cut-off" date in the model descriptions). I would go so far as to say it is very important: if you don't provide this and the LLMs don't already know your APIs, it will be a pain to use your library/SDK/whatever when coding with LLMs.
I've been thinking a lot about the future of SDKs recently, and I'm not sure where they go from here. Notably, if AI thrives on consistency, then a lot of the object models that my favorite SDKs expose don't make sense anymore. Rather, a more standardized interface for everything makes more sense, because the AI could understand it better.
As noted in the post, I've tried (and failed) repeatedly to generate good SDK docs, and I'm not clear what the right solution for the future of SDKs is. I believe it needs more code generation, as my dream SDK docs for AI would include flattened type definitions for all the inputs with every single method, and I'm not sure exactly how to achieve that today.
Thoughts?
> 1. Replace our error codes on individual routes with a global list of errors, so every error on every route maps to a global ID of the form this-route-this-error. When you receive one, you get not only the ID but also a link to (2).
> 2. Generate docs pages for every single one of these errors that note where they come from and their metadata. Over time we also want to add notes to these pages about what the errors are, how people have triggered them in the past, and how people have solved them.
I don't like having LLMs follow links, because that adds cost and latency. Imagine your conversation is already a few hundred thousand tokens long and some error occurs, so you paste the error, with an ID and a link to the generated docs, into the chat. First, the LLM may decide not to open the link at all; it may try to fix the error on its own, so it may need to be encouraged to open your links. Suppose it does decide to open the link and makes the corresponding tool call. At that point you have already spent a few hundred thousand tokens and a few seconds just for the LLM to decide to read the docs, because the whole context gets reprocessed for that one decision. I think this can be optimized by inlining the docs into the response, wrapped in XML tags (a sketch below).
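To make the inlining concrete: instead of returning only an ID and a link, the response (or whatever gets pasted into the chat) carries the docs text itself inside tags, so no follow-up tool call is needed. Everything below is hypothetical and only illustrates the shape:

```ts
// Sketch: inline the error docs into the error payload itself, wrapped in
// XML-style tags, so the model reads them without an extra tool call.
// fetchErrorDoc() and the URLs are made up for illustration.

async function fetchErrorDoc(id: string): Promise<string> {
  // In reality this would read the pre-generated docs page for this error.
  return `Error ${id}: the deployment quota was exceeded. Fix: raise the limit or delete old deployments.`;
}

async function buildLlmFriendlyError(id: string, docsUrl: string): Promise<string> {
  const doc = await fetchErrorDoc(id);
  return [
    "<error>",
    `  <id>${id}</id>`,
    `  <docs_url>${docsUrl}</docs_url>`,
    "  <docs>",
    `    ${doc}`,
    "  </docs>",
    "</error>",
  ].join("\n");
}

// Usage: paste the result straight into the chat (or return it from an MCP tool).
buildLlmFriendlyError(
  "create-deployment-quota-exceeded",
  "https://docs.example.com/errors/create-deployment-quota-exceeded",
).then(console.log);
```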
> 3. Create what we are currently calling a "DevOps MCP" for our cloud. This MCP is not a full-control MCP; it's almost completely read-only, and it's for pulling logs and seeing what's going on. The use case is primarily: the AI knows an API request failed, looks at recent API requests, finds the failed one, pulls the error code, pulls the docs, and solves the problem.
I love this idea. Any time something goes wrong, the user can, without digging through logs, just ask the LLM to use the DevOps MCP server to find the error. I imagine this would greatly reduce the mental burden of encountering errors. But once again, I think there should be as much inlining as possible, because indirections are expensive.
> As noted in the post, I've tried (and failed) repeatedly to generate good SDK docs, and I'm not clear what the right solution for the future of SDKs is. I believe it needs more code generation, as my dream SDK docs for AI would include flattened type definitions for all the inputs with every single method, and I'm not sure exactly how to achieve that today.
I ran into the same problem, and yeah, I haven't seen anyone make documentation generators aimed at LLMs yet, so one way to achieve this is to write one yourself, and perhaps open-source it. I would look for an existing parser implementation; e.g., if your language has a formatter, I would try to steal from there. If it's TypeScript, I would use swc's crates. Once you have that, collect API function signatures, type definitions, etc., and format them as LLM-friendly Markdown files.
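As a starting point, here is a minimal sketch of that idea using the TypeScript compiler API rather than swc (same principle, different parser). It only handles exported top-level function declarations and ignores classes, overloads, and doc comments:

```ts
// gen-llm-docs.ts — sketch: emit exported function signatures as LLM-friendly Markdown.
import * as ts from "typescript";

function generate(fileName: string): string {
  const program = ts.createProgram([fileName], { target: ts.ScriptTarget.ES2022 });
  const checker = program.getTypeChecker();
  const source = program.getSourceFile(fileName);
  if (!source) throw new Error(`cannot read ${fileName}`);

  const lines: string[] = [`# API reference: ${fileName}`, ""];

  ts.forEachChild(source, (node) => {
    // Only exported top-level function declarations in this sketch.
    if (!ts.isFunctionDeclaration(node) || !node.name) return;
    const exported = node.modifiers?.some((m) => m.kind === ts.SyntaxKind.ExportKeyword);
    if (!exported) return;

    const signature = checker.getSignatureFromDeclaration(node);
    if (!signature) return;

    lines.push(`## ${node.name.text}`);
    lines.push(`Signature: \`function ${node.name.text}${checker.signatureToString(signature)}\``);
    lines.push("");
  });

  return lines.join("\n");
}

console.log(generate(process.argv[2] ?? "src/index.ts"));
```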
llms.txt is a failure because it's designed for crawlers that want to collect bigger datasets instead of being designed for RAG.
What's actually needed is something like javadoc JARs stored in a central repository, but in a more structured format than an HTML export.
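For what it's worth, "more structured than an HTML export" could be as simple as a JSON artifact shipped alongside the package; the schema below is purely a sketch, not any existing format:

```ts
// Hypothetical schema for a machine-readable docs artifact shipped with a package
// (the javadoc-JAR idea, but structured for retrieval rather than rendered as HTML).
interface ApiDocArtifact {
  package: string;            // e.g. "com.example.http-client" (made-up name)
  version: string;
  symbols: SymbolDoc[];
}

interface SymbolDoc {
  kind: "class" | "function" | "type" | "constant";
  name: string;               // fully qualified name
  signature: string;          // flattened signature with resolved types
  summary: string;            // short prose description
  commonErrors?: string[];    // known pitfalls, as discussed upthread
  examples?: string[];        // short, self-contained usage snippets
}
```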