I want to create a model that can predict who is speaking, even when the speakers say different words.
For this I am trying to use the following features:
MFCC, mel spectrogram, tempo, chroma STFT, spectral centroid, spectral bandwidth
For training I am using RandomForestRegressor.
Is it possible to create a model like that?
Answer
For the sound processing and feature extraction part, librosa will provide everything you need.
For the machine learning part however, speaker identification (also called “voice recognition”) is a relatively complex task. You will probably have more success using deep learning techniques. You can certainly try random forests if you like, but you will probably get lower accuracy and will have to spend more time on feature engineering. Note that identifying a speaker is a classification problem, so the scikit-learn estimator to use would be RandomForestClassifier rather than RandomForestRegressor. In fact, comparing the results you get with the various techniques would be a good exercise.
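If you want to try the random forest route first, a minimal sketch with scikit-learn could look like the following. The feature matrix here is random placeholder data standing in for per-clip feature vectors (for example, time-averaged librosa features); the number of speakers, clips, and features are arbitrary assumptions, and the accuracy printed on this synthetic data says nothing about what you would get on real recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_speakers, clips_per_speaker, n_features = 3, 40, 20

# Placeholder data: one well-separated cluster per "speaker".
# Replace X and y with your real feature vectors and speaker labels.
centers = rng.normal(scale=5.0, size=(n_speakers, n_features))
X = np.vstack(
    [c + rng.normal(size=(clips_per_speaker, n_features)) for c in centers]
)
y = np.repeat(np.arange(n_speakers), clips_per_speaker)

# Hold out a stratified test set so every speaker appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Speaker identification is classification, so use the classifier variant.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Since the goal is to recognise speakers saying different words, it is worth making sure the test clips contain words that do not appear in the training clips; otherwise the score may partly reflect the model recognising the words rather than the voices.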
For an example tutorial on speaker identification using Keras, see e.g. this article.