Are there any articles on creating a TTS model where one voice can speak multiple languages?

ElevenLabs has a product where the same voice can speak multiple languages with the intonation and accent of a native speaker in each one. Are there any good journal or arXiv articles on how something like this can be done? Perhaps more importantly, how should one approach the training, given that it's nearly impossible to find a dataset of a single speaker with a native accent in multiple languages?

submitted by /u/ziapelta