Python CSV returning individual characters, expecting strings

This program uses Python’s csv module to process a stream of comma-separated values (CSV), with one record per CR/LF-terminated line. Instead of getting a list of strings, one for each field between the commas, I’m getting a list of single characters. The stream comes from subprocess.run(), which captures the output of ffprobe. Printing the captured output shows it formatted as CSV, as expected. The program:

import os
import subprocess
import csv

for file in os.listdir("/Temp/Video"):
    if file.endswith(".mkv"):
        print(os.path.join("/Temp/Video", file))
        ps = subprocess.run(["ffprobe", "-show_streams", "-print_format", "csv",  "-i", "/Temp/Video/" + file], capture_output = True, text = True)
        
        print("------------------------------------------")
        print(ps.stdout)
        print("------------------------------------------")

        reader = csv.reader(ps.stdout)

        for row in reader:
            print(row)

        exit(0)

The output from the print(ps.stdout) statement:

stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10,High,video,[0][0][0][0],0x0000,1920,1080,1920,1080,0,0,2,1:1,16:9,yuv420p,40,unknown,unknown,unknown,unknown,left,progressive,1,true,4,N/A,24000/1001,24000/1001,1/1000,0,0.000000,N/A,N/A,N/A,N/A,8,N/A,N/A,N/A,46,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,17936509,01:20:18.271791666,115523,10802870592,001011,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libx264,00:01:30.010000000
stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0],0x0000,fltp,48000,3,3.0,0,0,N/A,0/0,0/0,1/1000,0,0.000000,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,3314,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,eng,Surround 3.0,2422660,01:20:18.272000000,451713,1459129736,001100,MakeMKV v1.16.4 win(x64-release),2021-08-20 19:09:26,BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES SOURCE_ID,Lavc59.7.102 libvorbis,00:01:30.003000000

And some of the output from the for loop:

['s']
['t']
['r']
['e']
['a']
['m']
['', '']
['0']
['', '']
['h']
['2']
['6']
['4']
['', '']
['H']
['.']
['2']
['6']
['4']
[' ']
['/']
[' ']
['A']
['V']
['C']
[' ']
['/']
[' ']
['M']
['P']
['E']
['G']
['-']
['4']
[' ']

What I was expecting was this:

[stream,0,h264,H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10. ...]
[stream,1,vorbis,Vorbis,unknown,audio,[0][0][0][0] ...]

Why is row a list of characters and not a list of strings?

Answer

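Because of text=True, ps.stdout is a single str, not a file. csv.reader expects an iterable that yields one line at a time, but iterating over a str yields one character at a time, so each character is parsed as its own row. Instead, split the output into lines and loop over those. (Without text=True, stdout would be bytes and would need .decode() first.)

lines = ps.stdout.splitlines()  # one string per line of output
for line in lines:
  cols = line.split(',')
  print(cols[0])  # prints "stream"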

The list of lines can also be passed to csv.reader, which accepts any iterable of strings. For example:

reader = csv.reader(ps.stdout.splitlines())
for row in reader:
  print(row)
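
To see the difference in isolation, here is a minimal sketch that feeds the same text to csv.reader both ways. The sample string is a made-up stand-in for ps.stdout, not real ffprobe output.

```python
import csv

# Hypothetical two-line sample standing in for ps.stdout (a str when text=True).
sample = "stream,0,h264\nstream,1,vorbis\n"

# Iterating over a str yields single characters, so csv.reader treats
# every character as its own "line".
by_char = list(csv.reader(sample))

# splitlines() yields whole lines, so each row is a list of fields.
by_line = list(csv.reader(sample.splitlines()))

print(by_char[:7])  # the first seven one-character "rows"
print(by_line)      # [['stream', '0', 'h264'], ['stream', '1', 'vorbis']]
```

The comma character on its own parses as a row of two empty fields, which is where the ['', ''] entries in the question's output come from.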

You could also wrap the subprocess stdout in an in-memory file-like object like so:

import csv
from io import StringIO

s = StringIO(ps.stdout)  # already a str because of text=True, so no decode()
reader = csv.reader(s, skipinitialspace=True)
for row in reader:
  print(row)
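
If the parsing is needed in more than one place, the in-memory approach could be wrapped in a small helper. This is only a sketch: parse_csv_text is a hypothetical name, and the sample string is invented to show that quoted commas and CR/LF line endings are handled by the csv module rather than by manual splitting.

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV text held in a str into a list of rows (lists of strings)."""
    # newline="" leaves line endings untouched so the csv module handles them.
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer))


# Hypothetical sample standing in for ps.stdout.
sample = 'stream,0,h264,"H.264, quoted"\r\nstream,1,vorbis,Vorbis\r\n'
rows = parse_csv_text(sample)
for row in rows:
    print(row[2])  # third field of each row
```

In the original program this would be called as parse_csv_text(ps.stdout). Unlike line.split(','), it keeps a quoted field containing a comma as a single value.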

Answer