<span class="vcard">/u/jrstelle</span>

Is there a way for a Speech-to-Text model to differentiate between speakers?

For instance, if I'm recording an interview between two people and have something like Whisper transcribing the discussion, can it split the dialogue up by speaker? Seems like this would be a fairly simple feature, but I'm not sure i…