This is neat. Most eBook → audio tools I’ve tried either butcher formatting or feel clunky, so a “just throw it in and listen” approach is refreshing. Curious how well it handles footnotes/sidebars in practice — do you skip them entirely or try to inline them?
thank you! yes, footnotes, page numbers, weird characters and other extra bits that interrupt the tts are one of my biggest gripes about ebook readers with audio. fixing that was my biggest motivation to build this.
those elements are basically removed from the text, because i wanted to keep the reading experience very simple, but i should maybe add an option to keep them in the displayed text and only skip them in the tts.
though, i have to say lue is still not 100% perfect with this. epubs were easier because many of them are full of standardized xhtml/html tags. but there is still a lot of manual regexing in my code, which i spent a lot of time on, testing filters against 100 or so books to find the most general ones that clean up most books.
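to give an idea of what "tags plus regex" can look like, here is a rough sketch in python using only the standard library. this is not lue's actual code: the tag set, the `epub:type` values, the regexes and the names `ReadableText` and `clean_xhtml` are all assumptions picked for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed choices for this sketch: which markup counts as "extra bits".
SKIP_TAGS = {"sup"}                                   # superscript note markers
SKIP_EPUB_TYPES = {"noteref", "footnote", "pagebreak"}
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}  # never have an end tag


class ReadableText(HTMLParser):
    """Collects text, dropping anything nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        epub_type = dict(attrs).get("epub:type") or ""
        is_extra = tag in SKIP_TAGS or set(epub_type.split()) & SKIP_EPUB_TYPES
        if self.skip_depth or is_extra:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def clean_xhtml(markup):
    parser = ReadableText()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"\[\d+\]", "", text)           # leftover markers like [12]
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # stray space before punctuation
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace


sample = (
    '<p>It was late.<sup><a href="#fn1">1</a></sup> He left [2] quietly.'
    '<span epub:type="pagebreak" id="p12"/></p>'
    '<aside epub:type="footnote"><p>1. A note.</p></aside>'
)
print(clean_xhtml(sample))
```

the tag pass handles whatever the book marks up properly, and the regex pass catches what it doesn't, which is why the regexes need testing against many books. a sketch like this also assumes balanced markup, and real epubs often aren't that tidy.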
pdfs are a whole other beast because they are basically blocks of text without any tags, so it's much harder to format and clean them up. the way i solved this was to use positional filters, which are of course not bulletproof because people scan pdfs in different ways and pages are laid out differently. so i tried to find the best average placement of these elements (the top 10% and bottom 10% of the page), and there's always the option to not enable this filtering or to change the values. i'm still looking for a better way to solve this.
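the positional idea can be sketched like this. again, this is only an illustration and not lue's code: `TextBlock`, `filter_margins` and the `enabled` flag are made-up names, and it assumes each block already has vertical coordinates measured down from the top of the page (how you get those depends on the pdf library).

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    y0: float  # top edge, measured down from the top of the page
    y1: float  # bottom edge, same coordinate system


def filter_margins(blocks, page_height, top=0.10, bottom=0.10, enabled=True):
    """Keep only blocks lying fully outside the top and bottom margin bands."""
    if not enabled:
        return list(blocks)
    top_limit = page_height * top              # end of the header band
    bottom_limit = page_height * (1.0 - bottom)  # start of the footer band
    return [b for b in blocks if b.y0 >= top_limit and b.y1 <= bottom_limit]


page = [
    TextBlock("Chapter 3", 20, 40),            # running header
    TextBlock("It was a quiet morning.", 300, 500),
    TextBlock("57", 960, 980),                 # page number
]
kept = filter_margins(page, page_height=1000)
print([b.text for b in kept])
```

the weak spot is the one mentioned above: a scan with a tall header or body text that starts unusually high will get cut in the wrong place, so the percentages have to stay adjustable.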