ElevenLabs has a product where the same voice can speak multiple languages with the correct intonation and accent of a native speaker in each language. Are there any good journal or arXiv articles on how something like this can be done and, perhaps more importantly, how to approach the training, since it’s nearly impossible to find a dataset of one speaker with a native accent in multiple languages?