I am trying to create a program that records audio for a machine learning project, and I want to use Google Colab so that people don't have to install or run anything on their own systems. I found this example online that records and plays audio:
cell 1 contains the js code to record audio and the python code to turn it into a bytes object:
# all imports
from io import BytesIO
from base64 import b64decode
from google.colab import output
from IPython.display import Javascript

RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

def record(sec=3):
  print("")
  print("Speak Now...")
  display(Javascript(RECORD))
  sec += 1
  s = output.eval_js('record(%d)' % (sec*1000))
  print("Done Recording !")
  b = b64decode(s.split(',')[1])
  return b #byte stream
cell 2 runs the recording functions:
audio = record(2)
cell 3 creates a display item so you can play the recording:
import IPython.display as ipd
ipd.display(ipd.Audio(audio))
In the end, users will speak a word for 1 second. The issue I am running into is a delay between when the user is told to speak and when the recording actually starts: if I speak right away, the beginning of my speech is not in the audio file. Is there a way to line up more precisely when the prompt to speak appears and when the recording starts?
Answer
I think the discrepancy comes from the time needed to set things up, in particular the time it takes to run the following code before we get to recorder.start():
stream = await navigator.mediaDevices.getUserMedia({ audio: true })
recorder = new MediaRecorder(stream)
chunks = []
recorder.ondataavailable = e => chunks.push(e.data)
Also, once print("Speak Now...") is executed, it should be followed by recorder.start() as quickly as possible.
So I think we can reduce the delay by setting everything up in advance and then doing only this at recording time: print("Speak Now..."); recorder.start()
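Here is a rough, untested sketch of what that split could look like. The names prepare, capture, prepare_recorder, record_prepared, decode_data_url and PREPARED_RECORD_JS are my own and not part of the original example. I am also assuming that both Python calls run in the same cell, since the JavaScript lives in that cell's output frame. Some delay will probably remain, because the prompt and the start command still travel from Python to the browser separately.

```python
from base64 import b64decode

# JavaScript in two phases (hypothetical names): prepare() does the slow
# setup, capture() only starts the recorder that is already waiting.
PREPARED_RECORD_JS = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = () => resolve(reader.result)
  reader.readAsDataURL(blob)
})
var recorder, chunks
// Slow part: microphone permission prompt and recorder construction.
var prepare = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  recorder.ondataavailable = e => chunks.push(e.data)
  return true
}
// Fast part: start right away, stop after `time` ms, resolve to a data URL.
var capture = time => new Promise(resolve => {
  chunks = []
  recorder.onstop = async () => resolve(await b2text(new Blob(chunks)))
  recorder.start()
  sleep(time).then(() => recorder.stop())
})
"""


def decode_data_url(data_url):
    """Return the raw bytes behind a 'data:...;base64,XXXX' URL."""
    return b64decode(data_url.split(",")[1])


def prepare_recorder():
    """Load the JavaScript and wait until the microphone is ready."""
    # Imported here because these modules only exist inside a Colab runtime.
    from google.colab import output
    from IPython.display import Javascript, display

    display(Javascript(PREPARED_RECORD_JS))
    output.eval_js("prepare()")  # returns once getUserMedia has resolved


def record_prepared(sec=1):
    """Prompt the user, then start the recorder that was set up earlier."""
    from google.colab import output

    print("Speak Now...")
    data_url = output.eval_js("capture(%d)" % int(sec * 1000))
    print("Done Recording !")
    return decode_data_url(data_url)
```

In a single cell you would call prepare_recorder() first and then audio = record_prepared(1). The permission prompt and the MediaRecorder construction are finished before "Speak Now..." is printed, so only the recorder.start() call sits between the prompt and the recording.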