What's the motivation for doing this in the browser? It seems like intentionally choosing a more difficult path to create an inferior result.
A native MacOS or Windows application could use the OCR facilities of the operating system and, in my experience, both produce results that are far better than Tesseract.
Generate the OCR on the fly, in the browser, when you do not have the proper OCR info. As someone that works on public web libraries, I see it useful (but wasteful)