This program uses Python's csv module to process a stream containing a CR/LF-delimited list of comma-separated values (CSV). Instead of getting a list of strings, with each string representing the text that appears between the delimiters (the commas), I'm getting a list of characters. The program uses subprocess.run() to return a stream containing rows of data separated by commas and newlines (CSV). The returned stream is printed, and this output appears as expected (i.e. formatted as CSV). The program:
    import os
    import subprocess
    import csv

    for file in os.listdir("/Temp/Video"):
        if file.endswith(".mkv"):
            print(os.path.join("/Temp/Video", file))
            ps = subprocess.run(["ffprobe", "-show_streams", "-print_format", "csv", "-i", "/Temp/Video/" + file], capture_output = True, text = True)
            print("------------------------------------------")
            print(ps.stdout)
            print("------------------------------------------")
            reader = csv.reader(ps.stdout)
            for row in reader:
                print(row)
    exit(0)
The output from the print(ps.stdout) statement:
stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10,High,video,[0][0][0][0],0x0000,1920,1080,1920,1080,0,0,2,1:1,16:9,yuv420p,40,unknown,unknown,unknown,unknown,left,progressive,1,true,4,N/A,24000/1001,24000/1001,1/1000,0,0.000000,N/A,N/A,N/A,N/A,8,N/A,N/A,N/A,46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,17936509,01:20:18.271791666,115523,10802870592,001011,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libx264,00:01:30.010000000
stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0],0x0000,fltp,48000,3,3.0,0,0,N/A,0/0,0/0,1/1000,0,0.000000,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,3314,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,Surround 3.0,2422660,01:20:18.272000000,451713,1459129736,001100,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libvorbis,00:01:30.003000000
And some of the output from the for loop:
['s'] ['t'] ['r'] ['e'] ['a'] ['m'] ['', ''] ['0'] ['', ''] ['h'] ['2'] ['6'] ['4'] ['', ''] ['H'] ['.'] ['2'] ['6'] ['4'] [' '] ['/'] [' '] ['A'] ['V'] ['C'] [' '] ['/'] [' '] ['M'] ['P'] ['E'] ['G'] ['-'] ['4'] [' ']
What I was expecting was this:
[stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10. ...] [stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0] ...]
Why is row a list of characters and not a list of strings?
Answer
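Because the program passes text=True, ps.stdout is a single string, not a file object. csv.reader iterates over whatever it is given and treats each item as one line; iterating over a string yields one character at a time, so every character becomes its own row. Since the output is already a str, no decode() is needed: split it on newlines, then loop over the resulting lines.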
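To make the difference concrete, here is a minimal sketch that feeds csv.reader first a bare string and then a list of lines. The sample text is made up for illustration and is not real ffprobe output.

```python
import csv

# Made-up sample standing in for ps.stdout (an assumption, not ffprobe output).
text = "a,b"

# Passing the str itself: csv.reader iterates it character by character,
# so each character is parsed as if it were a whole line.
per_char = list(csv.reader(text))

# Passing a list of lines: each line is parsed into one row of fields.
per_line = list(csv.reader(["a,b", "1,2"]))

print(per_char)
print(per_line)
```

The lone comma is parsed as a line with two empty fields, which matches the ['', ''] entries in the question's output.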
    # ps.stdout is already a str because of text=True, so no decode() is needed
    lines = ps.stdout.splitlines()
    for line in lines:
        cols = line.split(',')
        print(cols[0])  # prints "stream"
The list of lines can also be passed to csv.reader, which accepts any iterable of strings. For example:
    reader = csv.reader(ps.stdout.splitlines())
    for row in reader:
        print(row)
You could also wrap the subprocess stdout in an in-memory file-like object like so:
    import csv
    from io import StringIO

    # ps.stdout is a str (text=True), so it can be wrapped directly
    s = StringIO(ps.stdout)
    reader = csv.reader(s, skipinitialspace=True)
    for row in reader:
        print(row)
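As a rough sketch of the in-memory file approach, the helper below (parse_csv_text is a hypothetical name, and the sample string is invented) wraps CSV text in io.StringIO and returns the parsed rows. Passing newline="" leaves the CR/LF line endings untouched so csv.reader handles them itself, and quoted fields containing commas stay intact.

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV held in a str into a list of rows (hypothetical helper)."""
    # newline="" disables newline translation; csv.reader deals with \r\n itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Invented sample with CR/LF line endings and one quoted field.
sample = 'stream,0,h264,"H.264, High"\r\nstream,1,vorbis,Vorbis\r\n'
rows = parse_csv_text(sample)

for row in rows:
    print(row)
```

Compared with splitting on newlines by hand, this lets the csv module decide where records end, which matters if a quoted field ever contains a line break.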