Processing audio from a Twilio media stream using Python

I am streaming call audio to my local server using Twilio Streams. For reference, I used the official guide from the Twilio team.

Decoding the audio and saving it to a .wav file works, but on playback the audio sounds somewhat distorted ("slow-motion" with compression artifacts). You can listen to it on SoundCloud here. Compared to the recording that Twilio provides in the Twilio console for the same call, there is a notable difference. Ideally, I want the audio from the stream to sound just as good, as I need to feed it into a custom ML model.

When comparing the above audio files using this code:

import pywav
wave_read = pywav.WavRead("filename.wav")
print(wave_read.getparams())

We get:

Twilio Audio: {'numchannels': 1, 'samplerate': 8000, 'byterate': 16000, 'bytespersample': 2, 'bitspersample': 16, 'samplelength': 82998, 'audioformat': 'PCM (without compression)'}

Stream Audio: {'numchannels': 1, 'samplerate': 8000, 'byterate': 16000, 'bytespersample': 2, 'bitspersample': 16, 'samplelength': 69120, 'audioformat': 'PCMU (with mu-law compression)'}

I am fairly certain that the problem lies in how I save the decoded bytes from the stream to a file.

    if data['event'] == "media":
        if not has_seen_media:
            recorded.append(base64.b64decode(data['media']['payload']))
            app.logger.info("Media message: {}".format(message))
            payload = data['media']['payload']
            app.logger.info("Payload is: {}".format(payload))
            chunk = base64.b64decode(payload)
            recorded.append(chunk)
            app.logger.info("That's {} bytes".format(len(chunk)))
            app.logger.info("Additional media messages from WebSocket are being suppressed....")
            has_seen_media = False
    if data['event'] == "closed":
        app.logger.info("Closed Message received: {}".format(message))
        break
    message_count += 1


app.logger.info("Connection closed. Received a total of {} messages".format(message_count))
data_bytes = b''.join(recorded)
wave_write = pywav.WavWrite("Recording.wav", 1,8000,8,7)  # 1 stands for mono channel, 8000 sample rate, 8 bit, 7 stands for MULAW encoding
wave_write.write(data_bytes)
wave_write.close()

Note: I have changed the bits from 8 to 16 in the WavWrite call, with no difference in audio quality.

I tried implementing code snippets from this previous post on Stack Overflow, but without success.

How would you improve the quality of the saved audio file (ideally in Python)?

Answer

I think I know what the issue is. In your loop, you add the decoded audio to the array of payloads twice:

recorded.append(base64.b64decode(data['media']['payload'])) # <<< Adding the payload for the first time
app.logger.info("Media message: {}".format(message))
payload = data['media']['payload']
app.logger.info("Payload is: {}".format(payload))
chunk = base64.b64decode(payload)
recorded.append(chunk) # <<< Adding the payload for the second time

The audio sounds slow because each chunk appears to be repeated. If you remove one of the two append calls above, I think the audio will sound correct.
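
As a minimal sketch of that fix, the loop below decodes and appends each media payload exactly once. The function names `collect_audio` and `duration_seconds` are hypothetical helpers introduced here for illustration, and the messages are passed in as a plain iterable of JSON strings rather than read from the WebSocket, so the logic can be checked in isolation. The event names and the 8000 Hz, 8-bit mu-law format are taken from the code in the question, not verified against Twilio's documentation.

```python
import base64
import json

# Assumption from the question's WavWrite call: 8000 Hz, 8-bit mu-law,
# so one byte of payload corresponds to one audio sample.
SAMPLE_RATE = 8000


def collect_audio(messages):
    """Return the raw audio bytes from an iterable of JSON message strings."""
    recorded = []
    for message in messages:
        data = json.loads(message)
        if data["event"] == "media":
            # Decode and append each payload exactly once.
            chunk = base64.b64decode(data["media"]["payload"])
            recorded.append(chunk)
        elif data["event"] == "closed":
            # Same stop condition as in the question's loop.
            break
    return b"".join(recorded)


def duration_seconds(audio_bytes, sample_rate=SAMPLE_RATE):
    """Playback length, assuming one byte per sample."""
    return len(audio_bytes) / sample_rate


if __name__ == "__main__":
    # Fake stream: 50 media messages of 160 bytes each, then a close event.
    payload = base64.b64encode(b"\xff" * 160).decode("ascii")
    media = json.dumps({"event": "media", "media": {"payload": payload}})
    fake_stream = [media] * 50 + [json.dumps({"event": "closed"})]

    audio = collect_audio(fake_stream)
    print(len(audio), duration_seconds(audio))  # 8000 1.0
```

A quick sanity check is to compare `duration_seconds(...)` against the real length of the call: if the file comes out about twice as long as the call, the payloads are still being appended twice. The resulting bytes can then be passed to the same `pywav.WavWrite(...).write(...)` call used in the question.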
