Is there a way for a Speech-to-Text model to differentiate between speakers?

For instance, if I'm recording an interview between two people and using something like Whisper to transcribe the discussion, can it attribute each line of dialogue to the right speaker? This seems like it would be a fairly common feature, but I'm not sure if it exists.

Doesn't have to be Whisper per se, but is there a known S2T model or solution for this?
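(For context: this task is usually called speaker diarization. Whisper itself only transcribes and doesn't label speakers, but a common approach is to run a separate diarization model, e.g. pyannote.audio, and merge its speaker turns with Whisper's timestamped segments. A minimal sketch of that merging step, using hardcoded hypothetical data in place of real model output:)

```python
# Merge transcript segments with diarization speaker turns.
# The two lists below are hypothetical stand-ins: Whisper's
# result["segments"] provides (start, end, text) tuples, and a
# diarization model such as pyannote.audio provides
# (start, end, speaker) turns.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]))
        labeled.append((best[2], text))
    return labeled

# Hypothetical example data (not from a real recording):
segments = [(0.0, 3.2, "So, how did you get started?"),
            (3.5, 9.8, "Well, it began back in 2015...")]
turns = [(0.0, 3.4, "SPEAKER_00"), (3.4, 10.0, "SPEAKER_01")]

for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```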

submitted by /u/jrstelle