Big picture: I am trying to identify proxy fraud in video interviews. I have video clips of interviews, and each person has two or more interviews. As a first step, I am extracting the audio from the interviews and trying to match the recordings to determine whether the audio comes from the same person. I used the Python library librosa t…
Tag: librosa
How to display audio to the right of a matplotlib figure
The following code displays the image and the audio in a top-bottom layout: Here is the test code: Is it possible to change the “top-bottom” layout to a “left-right” layout, so that the audio is displayed to the right of the plt figure? Answer You can use a GridspecLayout which is similar to …
Why doesn’t multiplying an audio signal’s amplitude by a coefficient change how it sounds?
Suppose you have the following float32 audio representation loaded from any wav file using the librosa package: If you then try to play this audio using, for example, a Jupyter notebook, the following snippets sound the same: Why does changing the audio amplitude (if I correctly underst…
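One likely explanation, stated here as an assumption because the excerpt is cut off before any answer: many players, including `IPython.display.Audio` with its default `normalize=True`, rescale the waveform to full scale before playback, so a constant gain cancels out. The sketch below reproduces that effect with NumPy only. `peak_normalize` is a hypothetical helper, not a librosa or IPython function, and the 440 Hz tone stands in for the loaded file.

```python
import numpy as np


def peak_normalize(y):
    """Scale a signal so its largest absolute sample is 1.0 (hypothetical helper)."""
    peak = np.max(np.abs(y))
    return y if peak == 0 else y / peak


sr = 22050                                   # assumed sampling rate
t = np.arange(sr) / sr                       # one second of time stamps
y = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
quiet = 0.1 * y                              # same signal, ten times smaller

# After peak normalization the two signals are numerically the same,
# which is why they would sound identical in a normalizing player.
same_after_normalizing = np.allclose(
    peak_normalize(y), peak_normalize(quiet), atol=1e-5
)
print(same_after_normalizing)
```

If the player is IPython's, passing `normalize=False` to `Audio` (available in recent IPython versions, with samples kept inside [-1, 1]) should make the gain change audible.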
Python Tensorflow Shape Mismatch (WaveNet)
I was trying to run a WaveNet, which is specified in https://github.com/mjpyeon/wavenet-classifier/blob/master/WaveNetClassifier.py. Part of my code is as follows: Here, self.input_shape=X_train.shape and self.output_shape=(11,) It successfully printed out the model’s summary, but was outputting the fol…
Is my output of librosa MFCC correct? I think I get the wrong number of frames when using librosa MFCC
The signal is 1 second long with a sampling rate of 16000, and I compute 13 MFCCs with a hop length of 400. The output dimensions are (13,41). Why do I get 41 frames? Isn’t it supposed to be (time*sr/hop_length)=40? Answer TL;DR answer Yes, it is correct. Long answer You are using a time-series as input (signal), w…
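The extra frame can be checked with a little arithmetic. The sketch below assumes librosa's default centered framing (`center=True`), where the signal is padded by `n_fft // 2` on each side so that frame `k` is centered on sample `k * hop_length`. The function names are illustrative, not part of librosa.

```python
def centered_frame_count(n_samples, hop_length):
    """Frames produced when the signal is padded so frames are centered.

    Frame centers fall at 0, hop, 2*hop, ... up to n_samples, which gives
    one more frame than n_samples / hop_length.
    """
    return 1 + n_samples // hop_length


def uncentered_frame_count(n_samples, n_fft, hop_length):
    """Frames produced without padding: only full windows are counted."""
    return 1 + (n_samples - n_fft) // hop_length


n_samples = 1 * 16000                        # 1 second at 16 kHz
frames = centered_frame_count(n_samples, 400)
print(frames)                                # 41, matching the (13, 41) output
print(uncentered_frame_count(n_samples, 2048, 400))  # 35 with n_fft=2048
```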
Librosa – Audio Spectrogram/Frequency Bins to Spectrum
I’ve read around for several days but haven’t been able to find a solution… I’m able to build Librosa spectrograms and extract amplitude/frequency data using the following: However, I cannot turn the data in D and freq_bins back into a spectrum. Once I am able to do this I can convert the n…
Sound feature attributeError: ‘rmse’
When using librosa.feature.rmse for sound feature extraction, I have the following: It gives me: What’s the right way to get it? Sample file: https://www2.cs.uic.edu/~i101/SoundFiles/CantinaBand3.wav Answer I am guessing you are running one of the latest librosa versions. If you check the changelog for 0.7, yo…
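To my knowledge, `librosa.feature.rmse` was renamed to `librosa.feature.rms` and the old name was removed in the 0.7 series, which would explain the AttributeError. A minimal sketch of code that tolerates both names follows. `resolve_rms` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for old and new librosa versions so the example runs without librosa installed.

```python
from types import SimpleNamespace


def resolve_rms(feature_module):
    """Return the RMS feature function under its new or old name."""
    func = getattr(feature_module, "rms", None)       # newer name
    if func is None:
        func = getattr(feature_module, "rmse", None)  # older name
    if func is None:
        raise AttributeError("neither 'rms' nor 'rmse' is available")
    return func


# Stand-ins for librosa.feature in a new and an old release (assumptions).
new_style = SimpleNamespace(rms=lambda y: "rms called")
old_style = SimpleNamespace(rmse=lambda y: "rmse called")

print(resolve_rms(new_style)(y=None))
print(resolve_rms(old_style)(y=None))
```

With librosa installed, the same helper would be used as `rms = resolve_rms(librosa.feature)` followed by `rms(y=y)`. On a current release, calling `librosa.feature.rms(y=y)` directly is simpler.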
Why does a spectrogram from the librosa library show twice the time duration of the actual audio track?
I am using the following code to obtain a Mel spectrogram from a recorded audio signal of about 30 s: Obtained spectrogram: Mel spectrogram Can you please explain to me why the time axis shows twice the actual duration (it should be 30 s)? What is going wrong with the code? Answer You need to pass the sampling rat…
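The factor of two follows from how frame indices are converted to seconds: `time = frame * hop_length / sr`. `librosa.display.specshow` assumes `sr=22050` unless told otherwise, so a recording sampled at 44100 Hz (an assumption here, since the excerpt does not state the rate) appears twice as long. The sketch below uses plain arithmetic, and `axis_duration` is an illustrative name rather than a librosa function.

```python
def axis_duration(n_frames, hop_length, sr):
    """Seconds spanned by n_frames spectrogram columns at a given rate."""
    return n_frames * hop_length / sr


true_sr = 44100                    # assumed actual recording rate
hop_length = 512                   # librosa's default hop length
n_frames = 1 + (30 * true_sr) // hop_length   # frames for a 30 s clip

correct = axis_duration(n_frames, hop_length, true_sr)  # about 30 s
wrong = axis_duration(n_frames, hop_length, 22050)      # default sr assumed

print(round(correct, 2), round(wrong, 2))
```

Passing the same `sr` (and `hop_length`, if it was changed) to both the spectrogram computation and `specshow` keeps the axis consistent with the real duration.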