I can read the audio, but I get an error when I pass it to the VAD (Voice Activity Detector). I think the error occurs because the frames are in bytes. When calling vad.is_speech(frame, sample_rate), should frame be in bytes? Here is the code:
    frame_duration_ms = 10
    duration_in_ms = (frame_duration_ms / 1000)  # duration in 10ms
    frame_size = int(sample_rate * duration_in_ms)  # frame size of 160
    frame_bytes = frame_size * 2

    def frame_generator(buffer, frame_bytes):
        # repeatedly store 320 length array to the frame_stored when the
        # frame_bytes is less than the size of the buffer
        while offset+frame_bytes < len(buffer):
            frame_stored = buffer[offset : offset+frame_bytes]
            offset = offset + frame_bytes
        return frame_stored

    num_padding_frames = int(padding_duration_ms / frame_duration_ms)
    # use deque for the sliding window
    ring_buffer = deque(maxlen=num_padding_frames)
    # we have two states TRIGGERED and NOTTRIGGERED state
    triggered = True  # NOTTRIGGERED state

    frames = frame_generator(buffer, frame_bytes)
    speech_frame = []
    for frame in frames:
        is_speech = vad.is_speech(frame, sample_rate)
Here is the error message:
    TypeError                                 Traceback (most recent call last)
    in
         16 speech_frame = []
         17 for frame in frames:
    ---> 18     is_speech = vad.is_speech(frame, sample_rate)
         19     #print(frames)

    C:\Program Files\Python38\lib\site-packages\webrtcvad.py in is_speech(self, buf, sample_rate, length)
         20
         21     def is_speech(self, buf, sample_rate, length=None):
    ---> 22         length = length or int(len(buf) / 2)
         23         if length * 2 > len(buf):
         24             raise IndexError(

    TypeError: object of type 'int' has no len()
Answer
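I have solved it. vad.is_speech(frame, sample_rate) takes the buffer passed as buf and calculates its length, but an integer value does not support len() in Python. In my original code, frame_generator returned a single bytes object, and iterating over a bytes object yields integers, so each frame passed to the VAD was an int rather than a bytes slice.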
For example, this raises a TypeError:

    num = 1
    print(len(num))

This works, because a list has a length:

    data = [1,2,3,4]
    print(len(data))
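The same effect can be seen directly on a bytes object, which is what the audio buffer is. The following is a minimal sketch; the all-zero `buffer` is a hypothetical stand-in for real PCM data, not the audio from the question.

```python
# Hypothetical stand-in for raw 16-bit PCM audio: 4 frames of 320 bytes each.
buffer = bytes(1280)
frame_bytes = 320

# Iterating over a bytes object yields integers, one per byte.
first_item = next(iter(buffer[0:frame_bytes]))
print(type(first_item))  # <class 'int'>

# Slicing a bytes object yields bytes, so each slice has a length.
frames = [buffer[i:i + frame_bytes] for i in range(0, len(buffer), frame_bytes)]
print(type(frames[0]), len(frames[0]))  # <class 'bytes'> 320
```

So looping over one returned frame hands the VAD single integers, while looping over a list of slices hands it 320-byte frames.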
Here is the corrected code:

    from collections import deque

    # sample_rate, padding_duration_ms, buffer and vad are defined earlier

    frame_duration_ms = 10
    duration_in_ms = (frame_duration_ms / 1000)  # 10 ms expressed in seconds
    frame_size = int(sample_rate * duration_in_ms)  # frame size of 160
    frame_bytes = frame_size * 2

    def frame_generator(buffer, frame_bytes):
        # collect consecutive 320-byte slices of the buffer while
        # another frame still fits
        values = []
        offset = 0
        while offset + frame_bytes < len(buffer):
            frame_stored = buffer[offset : offset + frame_bytes]
            offset = offset + frame_bytes
            values.append(frame_stored)
        return values

    num_padding_frames = int(padding_duration_ms / frame_duration_ms)
    # use deque for the sliding window
    ring_buffer = deque(maxlen=num_padding_frames)
    # we have two states TRIGGERED and NOTTRIGGERED state
    triggered = True  # NOTTRIGGERED state

    frames = frame_generator(buffer, frame_bytes)
    for frame in frames:
        is_speech = vad.is_speech(frame, sample_rate)
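As an alternative, the function can be written as a real generator with `yield`, which avoids building the whole list up front. This is only a sketch: the 16000 Hz `sample_rate` is an assumption inferred from the 160-sample frame size above, `audio` is hypothetical placeholder data, and the loop uses `<=` (unlike the `<` above) so that a final frame that fits exactly is kept.

```python
def frame_generator(buffer, frame_bytes):
    """Yield consecutive frame_bytes-sized slices of buffer, skipping a short tail."""
    offset = 0
    while offset + frame_bytes <= len(buffer):
        yield buffer[offset:offset + frame_bytes]
        offset += frame_bytes


sample_rate = 16000      # assumed; gives the 160-sample frame mentioned above
frame_duration_ms = 10
frame_bytes = int(sample_rate * frame_duration_ms / 1000) * 2  # 2 bytes per 16-bit sample

audio = bytes(1000)      # hypothetical stand-in for real PCM data
frames = list(frame_generator(audio, frame_bytes))
print(len(frames), len(frames[0]))  # 3 320
```

Each yielded frame is a bytes object and can be passed to vad.is_speech(frame, sample_rate) as in the loop above. This also answers the original question: the frame should be bytes. webrtcvad expects 16-bit mono PCM sampled at 8000, 16000, 32000 or 48000 Hz, in frames of 10, 20 or 30 ms.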