Capcat is a Python-based CLI/TUI FOSS utility for ethical archiving of a given website or RSS source.

The GitHub repo: https://github.com/stayukasabov/capcat

It is generated with NLP, context-engineering, spec-driven development and LLMs.

It is fully functional; usage instructions and documentation are at http://capcat.org.

The project started from my personal need for simple, structured archiving and grew into a product design/MVP exercise.

I am a longtime HN user, and the most value I have gotten in years of reading has always been deep in the comments section.

For HN, Capcat uses the official API with rate limits, identifies itself honestly with a clear user agent, and skips paywalled content. All usernames are anonymized, with a link to the user profile.
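To make the pattern concrete, here is a minimal Python sketch of polite fetching from the official HN API. It is an illustration only, not Capcat's actual code: the `RateLimiter` class, the user agent string, the one-request-per-interval policy, and the hashed label in `anonymize_author` are all assumptions. Only the item endpoint and the profile URL format are real.

```python
import hashlib
import json
import time
import urllib.request

# Real endpoint of the official HN API; everything else here is illustrative.
HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
# Hypothetical user agent string, not the one Capcat sends.
USER_AGENT = "example-archiver/0.1 (+https://example.org/contact)"


class RateLimiter:
    """Enforce a minimum interval (in seconds) between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(item_id):
    """Build a request that identifies the client through its User-Agent."""
    return urllib.request.Request(
        HN_ITEM_URL.format(item_id=item_id),
        headers={"User-Agent": USER_AGENT},
    )


def fetch_item(item_id, limiter):
    """Fetch one story or comment as a dict, respecting the rate limit."""
    limiter.wait()
    with urllib.request.urlopen(build_request(item_id), timeout=10) as resp:
        return json.load(resp)


def anonymize_author(username):
    """Replace the visible name with a short hash, keeping the profile link."""
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()[:8]
    return f"[user-{digest}](https://news.ycombinator.com/user?id={username})"
```

A caller would create one limiter, for example `RateLimiter(1.0)`, and pass it to every `fetch_item` call so that all requests share the same budget.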

The content is delivered in Markdown (Obsidian-ready, with frontmatter) and, optionally, HTML with dark/light themes. Every source has its own YAML config file for separate control, including a PDF size limiter. In the output folder, users can change the HTML theme through a minimal CSS design system.
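For readers unfamiliar with frontmatter, here is a small Python sketch of what writing an Obsidian-style Markdown file with a YAML header can look like. The field names (`title`, `source`, `date`, `tags`) and the `render_markdown` helper are assumptions for illustration, not Capcat's actual schema. String values are quoted with `json.dumps`, which works because JSON strings are valid YAML double-quoted scalars.

```python
import json
from datetime import date


def render_markdown(title, source_url, fetched, tags, body):
    """Return a Markdown document with a YAML frontmatter header."""
    lines = [
        "---",
        f"title: {json.dumps(title)}",       # quoted so ':' in titles is safe
        f"source: {json.dumps(source_url)}",
        f"date: {fetched.isoformat()}",
        "tags:",
    ]
    lines += [f"  - {json.dumps(tag)}" for tag in tags]
    lines += ["---", "", f"# {title}", "", body.strip(), ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown(
        "Show HN: Example",
        "https://example.org/post",
        date(2024, 1, 2),
        ["hn", "archive"],
        "Body text.",
    ))
```

Quoting every string costs a little readability but avoids surprises with titles that contain colons, hashes, or leading dashes.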

Please keep in mind that my focus as a product designer is on UX. I have a general grounding in software development principles, but the code has not been validated, and my build decisions may have limitations.

Feedback is welcome. Thanks in advance.

This looks like an interesting way to scrape a website for AI to use as reference.

Yes, I created this for my personal archiving needs, and for developers, design engineers, and ML engineers.

Markdown output has frontmatter for categorising data. HTML for browser consumption is optional.

The goal was to have a simple TUI for quick fetching, but the CLI can be scripted and extended. All of this comes with a focus on ethical scraping.

Commands follow the guidelines from clig.dev.

Thanks for your response.