| Hey everyone, I wanted to share a project I've been working on: a complete speech-to-speech translation (S2ST) pipeline that translates a source video (English) into a target language (Telugu) while preserving the speaker's voice and syncing the lips. Telugu output with voice preservation and lip sync. Full Article/Write-up: Medium. The Tech Stack:
In my write-up, I go deep into the journey, including my failed attempt at a direct speech-to-speech model inspired by Translatotron and the limitations I found with traditional voice cloning. I'm a final-year student actively seeking research or ML engineering roles. I'd appreciate any technical feedback on my approach, suggestions for improvement, or connections to opportunities in the field. Open to collaborations as well! Thanks for checking it out. |