Google Colab audio recording: how to tell users more precisely when to start speaking into the mic

I am trying to create a program that records audio for a machine learning project, and I want to use Google Colab so that people don't have to install or run anything on their own system. I found this example online that records and plays audio:

Cell 1 contains the JS code to record audio and the Python code to turn it into a bytes object:

# all imports
from io import BytesIO
from base64 import b64decode
from google.colab import output
from IPython.display import Javascript

RECORD = """
const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

def record(sec=3):
  print("")
  print("Speak Now...")
  display(Javascript(RECORD))
  sec += 1
  s = output.eval_js('record(%d)' % (sec*1000))
  print("Done Recording !")
  b = b64decode(s.split(',')[1])
  return b #byte stream

Cell 2 runs the recording function:

audio = record(2)

Cell 3 creates a display item so you can play the recording:

import IPython.display as ipd

ipd.display(ipd.Audio(audio))

In the end I will have users speak a single word for 1 second. The issue I am running into is a discrepancy between when the user is told to speak and when the recording actually starts: if I speak right away, the beginning of my speech is missing from the audio file. Is there a way to line up the "Speak Now..." prompt more precisely with the actual start of the recording?

Answer

I think the discrepancy comes from the time needed to set things up. In particular, the following code has to run before we get to recorder.start():

stream = await navigator.mediaDevices.getUserMedia({ audio: true })
recorder = new MediaRecorder(stream)
chunks = []
recorder.ondataavailable = e => chunks.push(e.data)

Ideally, recorder.start() should run immediately after print("Speak Now...") is executed.

So I think we can reduce the delay by setting things up in advance, so that all that remains at recording time is print("Speak Now...") followed by recorder.start().
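A minimal sketch of that idea, untested against every browser: the JavaScript is split into a slow step that asks for the microphone and builds the MediaRecorder, and a fast step that only starts it. The names RECORD_JS, prepareRecorder, recordNow, prepare_recorder, record_now and decode_data_url are made up for this example and are not part of Colab. The google.colab and IPython imports sit inside the functions only so the cell can be defined anywhere. As far as I understand, each cell's output has its own JavaScript context, so call both functions from the same cell.

```python
from base64 import b64decode

# JavaScript split into a slow setup step and a fast start step.
RECORD_JS = """
(() => {
  const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  const blobToDataURL = blob => new Promise(resolve => {
    const reader = new FileReader()
    reader.onloadend = () => resolve(reader.result)
    reader.readAsDataURL(blob)
  })
  // Slow step: permission prompt and recorder construction happen here.
  window.prepareRecorder = async () => {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
    window.preparedRecorder = new MediaRecorder(stream)
    return true
  }
  // Fast step: the recorder already exists, so recording begins right away.
  window.recordNow = ms => new Promise(async resolve => {
    const recorder = window.preparedRecorder
    const chunks = []
    recorder.ondataavailable = e => chunks.push(e.data)
    recorder.onstop = async () => resolve(await blobToDataURL(new Blob(chunks)))
    recorder.start()
    await sleep(ms)
    recorder.stop()
  })
})()
"""


def decode_data_url(data_url):
    """Return the raw bytes from a 'data:...;base64,<payload>' string."""
    return b64decode(data_url.split(",", 1)[1])


def prepare_recorder():
    """Do the slow setup before the user is asked to speak."""
    from google.colab import output  # only available inside Colab
    from IPython.display import Javascript, display
    display(Javascript(RECORD_JS))
    output.eval_js("prepareRecorder()")


def record_now(sec=1):
    """Prompt the user, then start the already prepared recorder."""
    from google.colab import output  # only available inside Colab
    print("Speak Now...")
    data_url = output.eval_js("recordNow(%d)" % int(sec * 1000))
    print("Done Recording!")
    return decode_data_url(data_url)
```

In a single cell this would be used as prepare_recorder() followed by audio = record_now(1). The prompt is still printed from Python, so the round trip from the kernel to the browser probably remains as a small delay; keeping the extra second of padding from the original record() may still be worthwhile.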
