Yes, it's a fair amount of data:
pdfs/ 12.5 GiB
pages/ 91.96 GiB (Each page as a .png)
text/ 365.03 MiB (Each page as text)
byte_files/ 55.98 GiB (The 1024x1024 tiles as .jpeg)
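
For a sense of scale, here is a quick sketch that adds up the figures above (about 160.8 GiB in total). The only assumption is that the sizes are binary units, so 1 GiB = 1024 MiB; the variable names are just for illustration.

```python
# Sizes as listed above, converted to GiB (assuming 1 GiB = 1024 MiB).
MIB_PER_GIB = 1024

sizes_gib = {
    "pdfs/": 12.5,
    "pages/": 91.96,
    "text/": 365.03 / MIB_PER_GIB,  # listed in MiB, converted here
    "byte_files/": 55.98,
}

total_gib = sum(sizes_gib.values())

# Print each directory with its share of the total, then the total itself.
for name, size in sizes_gib.items():
    print(f"{name:<12} {size:8.2f} GiB  ({size / total_gib:5.1%} of total)")
print(f"{'total':<12} {total_gib:8.2f} GiB")
```

The page images dominate, so that is where a different codec would make the biggest difference.
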
I had not heard of https://github.com/lovasoa/dezoomify-rs before; that's really cool!
I wonder how it would do with the DjVu codec, which has mostly been used for archiving documents. I suppose it is best applied at the source, if the physical material is at hand.
It might still be worth trying as an experiment, since this codec separates text, background and images into different layers, even when converting from another format.