The signal is 1 second long with a sampling rate of 16000 Hz, and I compute 13 MFCCs with a hop length of 400. The output dimensions are (13, 41). Why do I get 41 frames? Isn't it supposed to be (time * sr / hop_length) = 40?

Answer

TL;DR answer: Yes, 41 frames is correct.

Long answer: You are using a time series as input (signal), which means that librosa first computes a
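The frame count in the question can be checked with a little arithmetic. The sketch below assumes librosa's default `center=True` behaviour, where the signal is padded by `n_fft // 2` samples on each side so that frames are centred on multiples of `hop_length`. `frame_count` is a hypothetical helper written for illustration, not a librosa function, and `n_fft=2048` is assumed as the default window size.

```python
def frame_count(n_samples, hop_length, n_fft=2048, center=True):
    """Number of STFT frames for a signal of n_samples samples.

    Hypothetical helper, assuming librosa-style framing:
    - center=True: the signal is padded by n_fft // 2 on both sides,
      so there is one frame at sample 0 plus one per full hop.
    - center=False: only windows that fit entirely in the signal count.
    """
    if center:
        return 1 + n_samples // hop_length
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    return 1 + (n_samples - n_fft) // hop_length


n_samples = 1 * 16000  # 1 second at 16 kHz, as in the question

print(frame_count(n_samples, hop_length=400))                # 41
print(frame_count(n_samples, hop_length=400, center=False))  # 35
```

With centring, the frames sit at samples 0, 400, ..., 16000, which is 41 positions rather than 40. The extra frame comes from counting both endpoints.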
Tag: audio-processing
Extracting F0, jitter and shimmer from an audio file using Python
I was recently given the task of extracting features such as F0 (fundamental frequency), jitter, and shimmer from a series of short audio files (around 5-10 seconds each, a voice singing on one note). Unfortunately, I have no background in audio signal processing. Are there any Python libraries that would help me do this quickly and easily? Thank you in advance! Answer You
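
As a minimal sketch of what these three measures mean, the code below assumes that the cycle periods (in seconds) and the per-cycle peak amplitudes have already been estimated by some pitch tracker; both input lists are hypothetical. It uses the common "local" definition of jitter and shimmer (mean absolute difference between consecutive cycles, divided by the mean), which is not necessarily what any particular library reports.

```python
def mean_f0(periods):
    """Mean F0 in Hz: number of cycles divided by their total duration."""
    return len(periods) / sum(periods)


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    relative to the mean value (dimensionless ratio)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical measurements for four consecutive glottal cycles.
periods = [0.0100, 0.0101, 0.0099, 0.0100]   # seconds per cycle
amplitudes = [0.80, 0.82, 0.79, 0.81]        # peak amplitude per cycle

f0 = mean_f0(periods)                     # fundamental frequency in Hz
jitter = local_perturbation(periods)      # cycle-to-cycle period variation
shimmer = local_perturbation(amplitudes)  # cycle-to-cycle amplitude variation

print(f"F0 = {f0:.1f} Hz, jitter = {jitter:.2%}, shimmer = {shimmer:.2%}")
```

The hard part in practice is the cycle detection that produces `periods` and `amplitudes`, which this sketch deliberately leaves out.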