ElevenLabs has a feature for "full cast"-type generation, where different characters get different voices. It's certainly not automatically sensitive to dialect, though.

It's probably doable with current systems, though. I believe there are TTS systems that can use context/prompting to change emphasis and other speech qualities, though I'm not sure how reliably.

I’m sure it’s doable. I think you’d want to break it into a few discrete steps for the best quality. First, process the book and identify key info like genre and tone. Use that to determine the best voice(s) and reading style, and assign actors for the different characters/subjects. Maybe output some samples to spot-check for approval, tweak based on that, then generate the audio. There are probably a couple of other steps in there, and maybe a bit of custom work to optimize key areas. If someone wants to do this as a side project, I can help scope the architecture and process, but I don’t want to code it. :p
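
For anyone who does want to code it, here is a rough Python sketch of those steps (analyze, cast, sample, generate). Everything in it is an assumption for illustration: the `BookProfile` fields, the "Name said" heuristic, the voice IDs, and the `synthesize` stub, which only returns a label where a real TTS call would go. It isn't tied to any particular TTS product.

```python
import re
from dataclasses import dataclass


@dataclass
class BookProfile:
    genre: str
    tone: str
    characters: list[str]


def analyze_book(text: str) -> BookProfile:
    """Step 1: identify key info. Placeholder logic only; a real version
    would likely use an LLM or a classifier for genre and tone."""
    # Naive heuristic (assumption): a capitalized word before "said" is a speaker.
    names = sorted(set(re.findall(r"\b([A-Z][a-z]+) said\b", text)))
    return BookProfile(genre="unknown", tone="neutral", characters=names)


def assign_voices(profile: BookProfile, voice_pool: list[str]) -> dict[str, str]:
    """Step 2: give the narrator the first voice and cycle the rest over characters."""
    cast = {"narrator": voice_pool[0]}
    others = voice_pool[1:] or voice_pool
    for i, name in enumerate(profile.characters):
        cast[name] = others[i % len(others)]
    return cast


def pick_samples(text: str, n: int = 3) -> list[str]:
    """Step 3: pick a few paragraphs to render first for a human spot check."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    step = max(1, len(paragraphs) // n)
    return paragraphs[::step][:n]


def synthesize(paragraph: str, voice_id: str) -> str:
    """Step 4: hypothetical stub standing in for a TTS request."""
    return f"[{voice_id}] {paragraph}"
```

You'd run `pick_samples` output through `synthesize`, listen, adjust the cast, and only then render the whole book.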

Have you heard results from it? How does it know, for example, which voice to use when there is a romantic scene in the book?

It's definitely an excited voice, but is it read out as in a battle or as in a romantic scene?

I don't think they do it automatically, though. I think you need to piece apart the transcript in their tool to decide which voice to use where.
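
If you wanted to script part of that piecing-apart yourself, a minimal sketch could split each paragraph into narration and quoted dialogue, so each segment can be given a voice afterwards. This is not how their tool works, just an illustration: `split_segments` is a made-up helper, and it assumes straight double quotes with no nesting.

```python
import re

# Assumption: dialogue is wrapped in straight double quotes, never nested.
QUOTE = re.compile(r'"([^"]*)"')


def split_segments(paragraph: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, where kind is 'narration' or 'dialogue'."""
    segments = []
    pos = 0
    for match in QUOTE.finditer(paragraph):
        # Text between the previous quote and this one is narration.
        before = paragraph[pos:match.start()].strip()
        if before:
            segments.append(("narration", before))
        segments.append(("dialogue", match.group(1)))
        pos = match.end()
    # Anything left after the last quote is narration too.
    tail = paragraph[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments
```

Deciding which character owns each dialogue segment is the hard part and still needs a human or a model.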

Is it open source?

I don't see a link to ElevenLabs, so I'll post one: https://elevenlabs.io/. It doesn't look open source to me.

Thanks for the link.

Not sure why my question got downvoted. We were talking about Abogen, which is FOSS.