Why is junk data appearing in my Python subprocess’s stdout?

I’m writing a Python app that runs a command in a remote Docker container on AWS and saves the output to a file. The command being run remotely generates binary data (a database dump).

The app works great if I start the download and don’t touch anything. The problem is that if I start the download and then hit Enter, or scroll my mouse wheel in the terminal window while it’s running, my output file ends up with a ^M or other stray characters in it.

Sample Code:

#!/usr/bin/env python3

import npyscreen
import curses
import curses.ascii  # curses.ascii.NL is used below; import the submodule explicitly
import subprocess

MY_REGION=...
MY_CLUSTER=...
MY_TASK=...
MY_CONTAINER=...

class ProgressForm(npyscreen.Popup):
    def create(self):
        self.progress = self.add(
            npyscreen.TitleSliderPercent, step=1, out_of=100, name="Progress"
        )

    def activate(self):
        cmd = subprocess.Popen(
            [
                "aws",
                "--region",
                MY_REGION,
                "ecs",
                "execute-command",
                "--cluster",
                MY_CLUSTER,
                "--task",
                MY_TASK,
                "--container",
                MY_CONTAINER,
                "--command",
                "python -c 'for i in range(500_000): print(i)'",
                "--interactive",
            ],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            bufsize=0,
        )

        total_size = 3889129
        downloaded = 0
        with open("out.log", "wb") as f:
            while True:
                chunk = cmd.stdout.read(1024)

                if not chunk:
                    break

                f.write(chunk)

                downloaded += len(chunk)

                self.progress.set_value(min(downloaded/total_size*100, 100))
                self.progress.display()

        self.parentApp.switchForm(None)

class MAIN(npyscreen.FormBaseNew):
    def create(self):
        self.items = self.add(
            npyscreen.GridColTitles,
            col_titles=["Column"],
            select_whole_line=True,
        )
        self.items.add_handlers({curses.ascii.NL: self.item_chosen})

    def activate(self):
        for i in range(4):
            self.items.values = [
                ["Row Data"]
            ]

        self.edit()

    def item_chosen(self, inpt):
        self.parentApp.switchForm("progressForm")

class App(npyscreen.NPSAppManaged):
    def onStart(self):
        self.addForm("MAIN", MAIN, name="My App")
        self.addForm("progressForm", ProgressForm)

if __name__ == "__main__":
    app = App().run()

Hitting Enter during the download, or scrolling the mouse wheel, results in this:

...
10667

10668
10669
...

and this:

...
17451
17452
17453
^[[<65;121;31M17454
17455
17456
17457
...

Why is my subprocess’s stdout being littered with junk data?

Edit: The full output can be found here

Answer

When you don’t specify what subprocess should do with stdin, the child inherits it from the parent process, so the child sees your Enter keypresses, scroll-wheel events, and anything else the terminal sends.

A typical noninteractive process won’t do “local echo” of input back to output; but you’re using --interactive here, so the behavior is not surprising.

Set stdin=subprocess.DEVNULL to explicitly route stdin from nowhere. A stdin connected to /dev/null shows up as an immediate EOF on the first attempted read; most programs that aren’t written to require input will handle this correctly.
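As a rough illustration of the DEVNULL behavior described above, here is a minimal sketch. The child is a hypothetical stand-in (a local Python one-liner, not the aws command from the question) that reports how many bytes it managed to read from stdin before emitting its real output. Whether aws ecs execute-command itself is happy with a null stdin is an assumption you would need to check against your own setup.

```python
import subprocess
import sys

# Hypothetical stand-in for the remote command: it reads all of stdin,
# prints how many bytes it got, then prints its "real" output.
CHILD = (
    "import sys; data = sys.stdin.buffer.read(); "
    "print(len(data)); print('payload')"
)


def run_with_devnull_stdin():
    """Run the child with stdin routed from the null device."""
    cmd = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,  # child cannot see the parent's terminal input
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    out, _ = cmd.communicate()
    return out


output = run_with_devnull_stdin()
print(output)
```

The child reads zero bytes regardless of what is typed in the parent’s terminal, so nothing from the keyboard or mouse can be echoed into the captured output.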

If the program requires there to be a stdin stream that isn’t immediately closed, you might instead use stdin=subprocess.PIPE, and then leave cmd.stdin alone until it’s time for the remote program to exit (at which point a cmd.stdin.close(), while not strictly mandatory, would not be remiss).
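The PIPE variant might look roughly like the following sketch. Again the child is a hypothetical local stand-in, this time one that emits output and then blocks on stdin until the parent closes it, which mimics a program that needs stdin to stay open for its whole lifetime.

```python
import subprocess
import sys

# Hypothetical stand-in: prints its output, then waits for stdin to be
# closed before announcing that it is finished.
CHILD = (
    "import sys\n"
    "print('payload', flush=True)\n"
    "sys.stdin.buffer.read()\n"  # blocks until the parent closes the pipe
    "print('done')\n"
)


def run_with_piped_stdin():
    """Give the child a private stdin pipe and close it only at the end."""
    cmd = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,  # a pipe only this script can write to
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    first = cmd.stdout.readline()  # stdin is still open at this point
    cmd.stdin.close()              # signal EOF so the child can exit
    rest = cmd.stdout.read()       # drain whatever the child prints last
    returncode = cmd.wait()
    cmd.stdout.close()
    return first, rest, returncode


first, rest, returncode = run_with_piped_stdin()
print(first, rest, returncode)
```

Because the script never writes to cmd.stdin, the child receives no input at all until the pipe is closed, and terminal keypresses stay with the parent process.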

User contributions licensed under: CC BY-SA