Is there a way for a Speech-to-Text model to differentiate between speakers?
For instance, if I'm recording an interview between two people and have something like Whisper transcribing the discussion, can it split the dialogue up by speaker? It seems like this would be a fairly simple feature, but I'm not sure i…