This program uses Python’s CSV module to process a stream containing a CR/LF delimited list of comma separated values (CSV).  Instead of getting a list of strings, with each string representing the text that appears between the delimiters (the commas), I’m getting a list of characters.  The program uses subprocess.run() to return a stream containing rows of data separated by commas and newlines (CSV).  The returned stream is printed and this output appears as expected (i.e. formatted as CSV).  The program:
import os
import subprocess
import csv
for file in os.listdir("/Temp/Video"):
    if file.endswith(".mkv"):
        print(os.path.join("/Temp/Video", file))
        ps = subprocess.run(["ffprobe", "-show_streams", "-print_format", "csv",  "-i", "/Temp/Video/" + file], capture_output = True, text = True)
        
        print("------------------------------------------")
        print(ps.stdout)
        print("------------------------------------------")
        reader = csv.reader(ps.stdout)
        for row in reader:
            print(row)
        exit(0)
The output from the print(ps.stdout) statement:
stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10,High,video,[0][0][0][0],0x0000,1920,1080,1920,1080,0,0,2,1:1,16:9,yuv420p,40,unknown,unknown,unknown,unknown,left,progressive,1,true,4,N/A,24000/1001,24000/1001,1/1000,0,0.000000,N/A,N/A,N/A,N/A,8,N/A,N/A,N/A,46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,17936509,01:20:18.271791666,115523,10802870592,001011,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libx264,00:01:30.010000000
stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0],0x0000,fltp,48000,3,3.0,0,0,N/A,0/0,0/0,1/1000,0,0.000000,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,3314,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,Surround 3.0,2422660,01:20:18.272000000,451713,1459129736,001100,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libvorbis,00:01:30.003000000
And some of the output from the for loop:
['s'] ['t'] ['r'] ['e'] ['a'] ['m'] ['', ''] ['0'] ['', ''] ['h'] ['2'] ['6'] ['4'] ['', ''] ['H'] ['.'] ['2'] ['6'] ['4'] [' '] ['/'] [' '] ['A'] ['V'] ['C'] [' '] ['/'] [' '] ['M'] ['P'] ['E'] ['G'] ['-'] ['4'] [' ']
What I was expecting was this:
[stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10. ...] [stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0] ...]
Why is row a list of characters and not a list of strings?
Answer
Because you passed text = True, ps.stdout is already a str (it would only be bytes, and need .decode(), without that option). csv.reader iterates over whatever you give it and treats each item as one line of CSV. Iterating over a str yields one character at a time, so every character comes back as its own row, and each comma becomes ['', '']. Instead, split the output into lines and loop over those.
lines = ps.stdout.splitlines()
for line in lines:
  cols = line.split(',')
  print(cols[0])  # prints "stream"
The list of lines can also be passed to csv.reader. For example:
reader = csv.reader(ps.stdout.splitlines())
for row in reader:
  print(row)
You could also wrap the subprocess's stdout in an in-memory file-like object like so:
import csv
from io import StringIO
s = StringIO(ps.stdout)
reader = csv.reader(s, skipinitialspace=True)
for row in reader:
  print(row)
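To see the difference without running ffprobe, here is a minimal sketch that feeds csv.reader a short hard-coded string. The sample variable is a hypothetical, shortened stand-in for ps.stdout and not real ffprobe output; the names per_char, from_lines and from_buffer are only for illustration.

```python
import csv
from io import StringIO

# Hypothetical, shortened stand-in for ps.stdout (not real ffprobe output).
sample = "stream,0,h264,video\nstream,1,vorbis,audio\n"

# Passing the str directly: csv.reader iterates it one character at a time,
# so each character is parsed as its own "line".
per_char = list(csv.reader(sample))

# Passing a list of lines, or a file-like object, yields one row per line.
from_lines = list(csv.reader(sample.splitlines()))
from_buffer = list(csv.reader(StringIO(sample)))

print(per_char[:7])   # the first seven "rows" are single characters
print(from_lines)
print(from_buffer)
```

Both the splitlines() and StringIO forms should give the same rows here. Compared with a plain line.split(','), csv.reader also copes with quoted fields that contain commas.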