Does it turn it into spoken word or an audiobook? Because good audiobooks often have voice actors who read the characters with different emphasis and dialects. I imagine tools like ChatGPT could do this for a few sentences, but what about an 8-20 hour audiobook?
I think there are still basic hurdles to clear before we can go from EPUB to audiobook at a quality that can compete with the current state of the art.
Or am I missing something?
Elevenlabs has a feature for a "full cast"-type generation, where different characters will get different voices. It's certainly not automatically sensitive to dialect though.
It's probably doable with current systems, though. I believe there are TTS systems that can use context/prompting to change emphasis and other speech qualities, though I'm not sure how reliably.
I’m sure it’s doable. I think you’d want to break it into a few discrete steps for the best quality. First, process the book and identify key info like genre, tone, etc. Use that to determine the best voice(s) and reading style, and assign actors for multiple characters/subjects. Maybe output some examples to spot-check for approval. Tweak based on that, then generate the audio. There are probably a couple of other steps in there, and maybe a bit of custom work to optimize in key areas. If someone wants to do this as a side project I can help scope the architecture and process, but I don’t want to code it. :p
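For anyone who does want to code it, here is a rough Python sketch of the steps described above: analyze, cast, spot-check, then generate. Everything in it is an assumption made for illustration: the `Plan` fields, the keyword heuristics standing in for real analysis, the round-robin casting, and the `synthesize`/`approve` callables, which you would have to back with an actual TTS engine and a human review step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plan:
    genre: str
    tone: str
    cast: dict[str, str]  # speaker -> voice id (ids are placeholders)


def analyze(book_text: str) -> tuple[str, str]:
    # Stage 1 stand-in: a real version would probably use an LLM or classifier.
    lowered = book_text.lower()
    genre = "fantasy" if "dragon" in lowered else "general fiction"
    tone = "tense" if "!" in book_text else "calm"
    return genre, tone


def make_plan(book_text: str, speakers: list[str], voices: list[str]) -> Plan:
    # Stage 2: derive a reading style and give each speaker a voice, round-robin.
    genre, tone = analyze(book_text)
    cast = {name: voices[i % len(voices)] for i, name in enumerate(speakers)}
    return Plan(genre, tone, cast)


def run_pipeline(
    chapters: list[str],
    speakers: list[str],
    voices: list[str],
    synthesize: Callable[[str, Plan], bytes],
    approve: Callable[[bytes], bool],
    sample_chars: int = 200,
) -> tuple[Plan, list[bytes]]:
    if not chapters or not voices:
        raise ValueError("need at least one chapter and one voice")
    plan = make_plan("\n".join(chapters), speakers, voices)
    # Stage 3: render a short sample and stop unless someone signs off on it.
    sample = synthesize(chapters[0][:sample_chars], plan)
    if not approve(sample):
        return plan, []
    # Stage 4: render the whole book, one chapter at a time.
    return plan, [synthesize(chapter, plan) for chapter in chapters]
```

The approval gate is the part that matters most for an 8-20 hour book: you only pay for the full render once the sample sounds right. If the sample is rejected you get the plan back with no audio, tweak it, and run again.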
Have you heard results from it? How does it know, for example, which voice to use when there is a romantic scene in the book?
It's definitely an excited voice, but is it read out as in a battle or as in a romantic scene?
I don't think they do it automatically, though. I think you need to piece apart the transcript in their tool to decide which voice to use where.
Is it open source?
I don't see a link to Elevenlabs. So I'll post one: https://elevenlabs.io/ . It doesn't look open source to me.
Thanks for the link.
Not sure why my question got downvoted. We were talking about Abogen which is FOSS.
There are a few character voices that can also be mixed using the mixer to achieve different nuances. You can then write your own code to use different voices for different characters.
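A minimal sketch of what that "own code" could look like, in Python. To be clear, this does not use Abogen's actual API: the voice ids (`voice_a` etc.), the weighted-blend dictionaries, and the naive "said/asked/replied Name" attribution rule are all made up for illustration, and real books need much better speaker detection.

```python
import re

QUOTE = re.compile(r'"([^"]+)"')
# Naive attribution: a quote followed directly by e.g. ' said Anna'.
ATTRIB = re.compile(r'\s*(?:said|asked|replied)\s+([A-Z][a-z]+)')


def split_by_speaker(text: str) -> list[tuple[str, str]]:
    """Split text into (speaker, text) pieces in reading order."""
    pieces, pos = [], 0
    for quote in QUOTE.finditer(text):
        narration = text[pos:quote.start()].strip()
        if narration:
            pieces.append(("narrator", narration))
        tag = ATTRIB.match(text, quote.end())
        # Unattributed quotes fall back to the narrator.
        speaker = tag.group(1) if tag else "narrator"
        pieces.append((speaker, quote.group(1)))
        pos = quote.end()
    tail = text[pos:].strip()
    if tail:
        pieces.append(("narrator", tail))
    return pieces


def normalize(mix: dict[str, float]) -> dict[str, float]:
    """Scale blend weights so they sum to 1."""
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("voice mix needs a positive total weight")
    return {voice: weight / total for voice, weight in mix.items()}


def assign_voices(pieces, mixes):
    """Pair each piece with its speaker's blend; unknown speakers get the narrator's."""
    default = normalize(mixes["narrator"])
    return [
        (normalize(mixes[speaker]) if speaker in mixes else default, text)
        for speaker, text in pieces
    ]
```

You would feed each `(blend, text)` pair to whatever TTS call you are using and concatenate the audio. If you prefer a single narrator who only changes tone, as some people further down do, pass a `mixes` dictionary with just the `narrator` entry.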
> Because good audiobooks often have voice actors that read the characters with different emphasis and dialects.
I actually hate this. I like quotes to be read with the tone and inflection implied by the context but I don't like the different voices.
I'm with you. It's as if a book decided to use a different font for each character's speech. It's distracting, not helpful.