It seems this repo only saves one web page?
What I'm implementing here is mirroring a whole website, with all its subpages, so you can browse it all offline. For example, all essays from paulgraham.com.
Oh, I see. In that case, feature-wise, it is actually a modern alternative to HTTrack.
I think the misunderstanding stems from the browser's "Save As" reference in the description. It is misleading. You use "Save As" to save a single page, not an entire website.
Also, the description lacks a clear explanation of the project's purpose. It would be helpful to include a sentence explaining that the program downloads an entire website, not just a single page.
SingleFile supports scoped recursive crawls too: https://github.com/gildas-lormeau/single-file-cli#:~:text=an...
I highly recommend reading the SingleFile source or https://archiveweb.page/ to see how they handle closed shadow DOMs, cross-origin iframes, WebSockets, media URLs, deduping large assets, etc.
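To make "scoped" concrete, here is a rough sketch of the link-filtering step such a crawl needs: collect the links on a page, resolve them against the page URL, and keep only those under a chosen prefix. This is not how SingleFile or archiveweb.page do it; the names `LinkCollector`, `in_scope_links`, and `scope_prefix` are made up for illustration, and it only looks at plain `<a href>` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope_links(page_url, html, scope_prefix):
    """Return absolute, fragment-free links that start with scope_prefix.

    Order of first appearance is kept and duplicates are dropped.
    """
    parser = LinkCollector()
    parser.feed(html)
    result = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then strip "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(scope_prefix) and absolute not in result:
            result.append(absolute)
    return result
```

A crawler would call this on every fetched page and queue whatever comes back that it has not visited yet. Everything the recommendation above lists (shadow DOMs, iframes, media, dedup) is the hard part this leaves out.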
> For example, all essays from paulgraham.com
Not the same thing, but I made a clone of pg’s website which can be used for exactly that: https://github.com/shawwn/pg
https://shawwn.github.io/pg/
If you want to read all essays, just clone the repo and open any of the .html files. Or any of the .page files which generated them.
[flagged]
Um. Whose website are you on right now?
I don't come here to laugh, but it's always great when it happens anyway.