
FFMPEG concat leaves audio gaps between clips

I’m writing a Python script that invokes FFMPEG through subprocess rather than through pyffmpeg.

My script generates a variable number of MP4 files using the AAC audio codec, and concatenates them together using FFMPEG. Here is how I’m constructing each clip:

ffmpeg -loop 1 -i image.jpg -i recording.mp3 -tune stillimage -c:a aac -b:a 256k -shortest clip.mp4
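Because the script calls FFMPEG through subprocess, one way to build the per-clip command is as an argument list, which avoids shell quoting problems. This is a minimal sketch that reproduces the flags above exactly; the function names build_clip_cmd and make_clip are hypothetical, not part of the original script.

```python
import subprocess
from typing import List


def build_clip_cmd(image: str, audio: str, output: str) -> List[str]:
    """Return the per-clip ffmpeg command from the question as a list.

    Each flag and each value is a separate element, so no shell quoting
    is needed when it is passed to subprocess.run.
    """
    return [
        "ffmpeg",
        "-loop", "1",           # repeat the still image indefinitely
        "-i", image,
        "-i", audio,
        "-tune", "stillimage",  # video encoder tuning for a static picture
        "-c:a", "aac",
        "-b:a", "256k",
        "-shortest",            # stop when the shortest input (the audio) ends
        output,
    ]


def make_clip(image: str, audio: str, output: str) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_clip_cmd(image, audio, output), check=True)
```

Calling make_clip("image.jpg", "recording.mp3", "clip.mp4") runs the same command as the one-liner above.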

The command I’m using to concatenate them is:

ffmpeg -f concat -i clip_names.txt -c copy video_raw.mp4
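The concat demuxer reads a text file with one line per clip in the form file 'path'. A sketch of how the script might generate clip_names.txt and the matching command follows; the helper names are hypothetical, and it assumes the clip paths are relative (absolute paths would additionally need the demuxer's -safe 0 option).

```python
from pathlib import Path
from typing import List, Sequence


def concat_list_text(clip_paths: Sequence[str]) -> str:
    """Build the contents of a concat-demuxer list file.

    Each entry becomes: file 'path'. A single quote inside a path is
    written as '\\'' (close the quote, escaped quote, reopen the quote).
    """
    lines = []
    for p in clip_paths:
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def write_concat_list(clip_paths: Sequence[str], list_file: str) -> None:
    Path(list_file).write_text(concat_list_text(clip_paths))


def build_concat_cmd(list_file: str, output: str) -> List[str]:
    # Same as the question: stream copy, so nothing is re-encoded here.
    return ["ffmpeg", "-f", "concat", "-i", list_file, "-c", "copy", output]
```

Since -c copy does not re-encode, any timing mismatch already present inside each clip is carried straight into video_raw.mp4.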

I then take that resulting video, mix a looping audio track over it, and adjust the volume:

ffmpeg -i video_raw.mp4 -filter_complex \
  "amovie=Tracks/Breaktime.mp3:loop=0,volume=0.1,asetpts=N/SR/TB[aud];[0:a][aud]amix[a]" \
  -map 0:v -map "[a]" -b:a 256k -shortest final_video.mp4
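Building the filter graph from named pieces in Python keeps the long -filter_complex argument readable without shell line continuations. This is a sketch of the same mixing command; build_mix_cmd is a hypothetical name, and it assumes the music path contains no characters that are special in filter graphs (such as : , ; or [ ]), which would otherwise need escaping.

```python
from typing import List


def build_mix_cmd(video: str, music: str, output: str, volume: float = 0.1) -> List[str]:
    """Return the background-music mixing command from the question."""
    # amovie loads the music inside the graph; loop=0 repeats it forever.
    # asetpts=N/SR/TB rebuilds timestamps from the sample count so the
    # looped audio has continuous timestamps.
    music_chain = ",".join([
        f"amovie={music}:loop=0",
        f"volume={volume}",
        "asetpts=N/SR/TB[aud]",
    ])
    # Mix the video's own audio ([0:a]) with the looped music ([aud]).
    graph = f"{music_chain};[0:a][aud]amix[a]"
    return [
        "ffmpeg", "-i", video,
        "-filter_complex", graph,
        "-map", "0:v", "-map", "[a]",
        "-b:a", "256k",
        "-shortest",  # the looped music never ends, so stop with the video
        output,
    ]
```

Passed as a list, "[a]" needs no quoting, unlike on a shell command line where the brackets may be interpreted as a glob.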

These commands seem to work as I intend them to. When I play the resulting MP4 from my local machine, everything plays without issue.

However, I uploaded the video to YouTube, and ran into issues. When the video is played from YouTube, there is about a second of silence at every timestamp where two clips were concatenated, before the next clip begins. I’ve tried this from Chrome, IE, and Firefox, all with the same issues.

Based on what I’ve looked into so far, I think it could be an issue with how the priming samples of each individual clip are handled. I’m not obligated to keep using MP4 or AAC, so if using a different audio/video codec would work better, feel free to suggest!

Is there some type of manipulation I can do in FFMPEG to get rid of the priming samples, or somehow process them differently? In the end, I’m looking for each clip to play back to back without the delay that the concat operation seems to insert. Thank you!


Answer

It’s not due to priming samples. -shortest does not guarantee that all streams end up the same length: when the muxer receives the signal to stop output, there may still be packets buffered in the muxing queue, and those get written too. For a 25 fps video stream, about 1 second of overflow seems about right. There are ways to reduce this overflow, but I would recommend that you fetch the duration of the audio, pass it as -t X, and drop -shortest.
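A sketch of that approach: read the audio duration with ffprobe, then pass it to -t instead of using -shortest. The helper names are hypothetical; the ffprobe flags (-show_entries format=duration with -of default=noprint_wrappers=1:nokey=1) print only the bare duration in seconds. This reads the container-level duration of the MP3, which is assumed here to be accurate enough.

```python
import subprocess
from typing import List


def parse_duration(ffprobe_output: str) -> float:
    # ffprobe prints just the value, e.g. "12.345000\n".
    return float(ffprobe_output.strip())


def audio_duration(path: str) -> float:
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1",
         path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_duration(out)


def build_clip_cmd_fixed_length(image: str, audio: str, output: str,
                                duration: float) -> List[str]:
    # Same clip command as before, but -t sets the output length
    # explicitly instead of relying on -shortest.
    return [
        "ffmpeg", "-loop", "1", "-i", image, "-i", audio,
        "-tune", "stillimage", "-c:a", "aac", "-b:a", "256k",
        "-t", f"{duration:.6f}",
        output,
    ]
```

Usage would look like build_clip_cmd_fixed_length("image.jpg", "recording.mp3", "clip.mp4", audio_duration("recording.mp3")), passed to subprocess.run as before.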

Also, save to MOV with uncompressed PCM audio (-c:a pcm_s16le). PCM has no encoder delay, so you’ll avoid the priming sample offsets.
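Applied to the clip step, that suggestion might look like the sketch below: the same image-plus-audio command, written to a .mov file with 16-bit PCM audio and an explicit -t duration. The function name and the use of a caller-supplied duration are assumptions for illustration; the clip list and concat output would then use .mov names as well.

```python
from typing import List


def build_mov_clip_cmd(image: str, audio: str, output: str,
                       duration: float) -> List[str]:
    """Per-clip command using a MOV container and PCM audio."""
    return [
        "ffmpeg", "-loop", "1", "-i", image, "-i", audio,
        "-tune", "stillimage",
        "-c:a", "pcm_s16le",   # uncompressed 16-bit little-endian PCM
        "-t", f"{duration:.6f}",
        output,                # e.g. clip.mov
    ]


def build_mov_concat_cmd(list_file: str, output: str) -> List[str]:
    # Stream-copy concat, as in the question, but into a MOV file.
    return ["ffmpeg", "-f", "concat", "-i", list_file, "-c", "copy", output]
```

The final mixing step re-encodes the audio anyway, so its output codec can still be chosen separately for the upload.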

User contributions licensed under: CC BY-SA