Is your RAG manpages thing on GitHub somewhere? I was thinking about doing something like that (it's high on my to-do list, but I haven't actually done anything with LLMs yet).

I'll get it up soon, probably should. This little snippet will help you though:

   $ man --html="$(which markitdown)" <man page>
That goes man -> HTML -> Markdown: --html renders the page to HTML and hands it to whatever "browser" you name, which here is markitdown. The result is token-efficient, and LLMs are pretty good at building hierarchies from Markdown.

I bet you could do the same thing with pandoc and skip serializing to HTML entirely.
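
Untested sketch of what that might look like, assuming man-db's `man -w` (prints the path of the page source), gzip-compressed pages, and a pandoc build that has the `man` input format. The `man2md` name is just made up for illustration:

```shell
# Hypothetical helper: convert a man page to Markdown via pandoc's man reader.
# Assumes `man -w` resolves to a single roff source file and that the file is
# either gzip-compressed or plain text.
man2md() {
    local src
    src="$(man -w "$@")" || return 1
    # gzip -dcf decompresses gzip files and passes plain files through as-is.
    gzip -dcf "$src" | pandoc -f man -t gfm
}
```

Then `man2md ls > ls.md` would be the equivalent of the markitdown one-liner. Pages that are only a `.so` redirect to another file, or that are compressed with something other than gzip, probably need extra handling.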