Big picture: I am trying to identify proxy fraud (an impersonator standing in for the real candidate) in video interviews.
I have video clips of interviews, with two or more interviews per person. As a first step, I am extracting the audio from each interview and trying to match the clips to determine whether the audio comes from the same person.
I used the Python library librosa to load the audio files and generate MFCC and chroma_cqt features, and then built a cross-similarity matrix from them. I want to convert this similarity matrix into a score between 0 and 100, where 100 is a perfect match and 0 is totally different, after which I can pick a threshold and label the audio files.
Code:
import librosa

hop_length = 1024

# Load both recordings (librosa resamples to 22050 Hz by default)
y_ref, sr1 = librosa.load(r"audio1.wav")
y_comp, sr2 = librosa.load(r"audio2.wav")

# Chroma and MFCC features for each clip
chroma_ref = librosa.feature.chroma_cqt(y=y_ref, sr=sr1, hop_length=hop_length)
chroma_comp = librosa.feature.chroma_cqt(y=y_comp, sr=sr2, hop_length=hop_length)
mfcc1 = librosa.feature.mfcc(y=y_ref, sr=sr1, n_mfcc=13)  # keyword args required in librosa >= 0.10
mfcc2 = librosa.feature.mfcc(y=y_comp, sr=sr2, n_mfcc=13)

# Use time-delay embedding to get a cleaner recurrence matrix
x_ref = librosa.feature.stack_memory(chroma_ref, n_steps=10, delay=3)
x_comp = librosa.feature.stack_memory(chroma_comp, n_steps=10, delay=3)

# Frame-by-frame cross-similarity between the two clips
sim = librosa.segment.cross_similarity(x_comp, x_ref, metric='cosine')
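One way to collapse the cross-similarity matrix into the 0-100 score described above is to request affinity values (the default mode='connectivity' returns a binary matrix, so the cosine values are lost) and average them. A minimal sketch, with the caveat that mean affinity is a crude aggregate I am assuming here, not an established verification score:

import numpy as np

# mode='affinity' returns values in (0, 1] instead of binary connectivity
sim_aff = librosa.segment.cross_similarity(x_comp, x_ref, metric='cosine', mode='affinity')

# Collapse the matrix to a single number and rescale to 0-100;
# a percentile or an alignment-based cost may separate speakers better
score = float(np.mean(sim_aff)) * 100
print(f"similarity score: {score:.1f} / 100")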
Answer
The task of identifying who is talking is called Speaker Identification. Checking whether two audio clips contain the same speaker is called Speaker Verification. If there are multiple speakers in a dialog, it may also be relevant to do Speaker Diarization, i.e. finding out who talks when. That would let you focus on the interview subject rather than the interviewer.
Speaker recognition tasks like these are best solved with a deep neural network, because separating who is speaking from what is being said is quite difficult. Such models generally output a speaker embedding: a vector representation in which clips from the same person land close together. One can then apply a simple similarity metric to these embeddings, such as cosine distance.
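For instance, once each clip has been reduced to an embedding vector, scoring takes a few lines of NumPy. A minimal sketch, assuming emb1 and emb2 are hypothetical embedding vectors produced by such a model:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity in [-1, 1]; higher means the voices are more alike
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def to_score(a: np.ndarray, b: np.ndarray) -> float:
    # Rescale to the 0-100 range asked for in the question
    return (cosine_similarity(a, b) + 1) / 2 * 100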
Pretrained models are available for this, for example in pyannote-audio and in SpeechBrain.
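As a concrete starting point, SpeechBrain ships a pretrained ECAPA-TDNN speaker verification model. A minimal sketch based on its documented interface (module paths and model names may shift between releases, so treat this as an assumption to check against the current docs):

from speechbrain.pretrained import SpeakerRecognition

verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Returns a cosine score and a boolean same-speaker prediction
score, prediction = verification.verify_files("audio1.wav", "audio2.wav")
print(f"score={float(score):.3f}, same_speaker={bool(prediction)}")

The score it returns is a cosine similarity, so it can be rescaled to 0-100 in the same way as above before picking a decision threshold.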