Sunday, July 1, 2007

PyMOTW: subprocess

Module: subprocess
Purpose: Spawn and communicate with additional processes.
Python Version: New in 2.4

An updated version of this article can be found on the main PyMOTW site.

Description:

The subprocess module provides a consistent interface for creating and working with additional processes. It offers a higher-level interface than some of the other available modules, and is intended to replace functions such as os.system, os.spawn*, os.popen*, popen2.* and commands.*. To make it easier to compare subprocess with those other modules, this week I will re-create earlier examples using the functions being replaced.

The subprocess module defines one class, Popen() and a few wrapper functions which use that class. Popen() takes several arguments to make it easier to set up the new process, and then communicate with it via pipes. I will concentrate on example code here; for a complete description of the arguments, refer to section 17.1.1 of the library documentation.

A Note About Portability

The API is roughly the same, but the underlying implementation is slightly different between Unix and Windows. All of the examples shown here were tested on Mac OS X. Your mileage on a non-Unix OS will vary.

Running an External Command

To run an external command without interacting with it, as one would with os.system(), use the call() function.

import subprocess

# Simple command
subprocess.call('ls -l', shell=True)


$ python replace_os_system.py
total 16
-rw-r--r-- 1 dhellman dhellman 0 Jul 1 11:29 __init__.py
-rw-r--r-- 1 dhellman dhellman 1316 Jul 1 11:32 replace_os_system.py
-rw-r--r-- 1 dhellman dhellman 1167 Jul 1 11:31 replace_os_system.py~


And since we set shell=True, shell variables in the command string are expanded:

# Command with shell expansion
subprocess.call('ls -l $HOME', shell=True)


total 40
drwx------ 10 dhellman dhellman 340 Jun 30 18:45 Desktop
drwxr-xr-x 15 dhellman dhellman 510 Jun 19 07:08 Devel
drwx------ 29 dhellman dhellman 986 Jun 29 07:44 Documents
drwxr-xr-x 44 dhellman dhellman 1496 Jun 29 09:51 DownloadedApps
drwx------ 55 dhellman dhellman 1870 May 22 14:53 Library
drwx------ 8 dhellman dhellman 272 Mar 4 2006 Movies
drwx------ 11 dhellman dhellman 374 Jun 21 07:04 Music
drwx------ 12 dhellman dhellman 408 Jul 1 01:00 Pictures
drwxr-xr-x 5 dhellman dhellman 170 Oct 1 2006 Public
drwxr-xr-x 15 dhellman dhellman 510 May 12 15:19 Sites
drwxr-xr-x 5 dhellman dhellman 170 Oct 5 2005 cfx
drwxr-xr-x 4 dhellman dhellman 136 Jan 23 2006 iPod
-rw-r--r-- 1 dhellman dhellman 204 Jun 18 17:07 pgadmin.log
drwxr-xr-x 3 dhellman dhellman 102 Apr 29 16:32 tmp
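
For comparison, here is a minimal sketch of the same idea written in Python 3 syntax (an assumption; the examples in this post target Python 2). It passes the command as a list, so no shell is involved, and it checks the exit status that call() returns. The helper name run_quietly and the use of sys.executable as a stand-in for an external command are illustrative choices, not part of the original sample code.

```python
import subprocess
import sys


def run_quietly(args):
    """Run a command without a shell and return its exit status."""
    # With a list of arguments and no shell=True, nothing in the
    # arguments is expanded or interpreted by a shell.
    return subprocess.call(args)


# Use the running interpreter as a portable stand-in for an external command.
ok_status = run_quietly([sys.executable, '-c', 'pass'])
bad_status = run_quietly([sys.executable, '-c', 'import sys; sys.exit(3)'])
print('exit statuses:', ok_status, bad_status)
```

Skipping the shell means variables such as $HOME are not expanded, which is the trade-off against the shell=True examples above.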


Reading Output of Another Command

By passing different arguments for stdin, stdout, and stderr it is possible to mimic the variations of os.popen().

Reading from the output of a pipe:

print '\nread:'
proc = subprocess.Popen('echo "to stdout"',
shell=True,
stdout=subprocess.PIPE,
)
stdout_value = proc.communicate()[0]
print '\tstdout:', repr(stdout_value)


Writing to the input of a pipe:

print '\nwrite:'
proc = subprocess.Popen('cat -',
shell=True,
stdin=subprocess.PIPE,
)
proc.communicate('\tstdin: to stdin\n')


Reading and writing, as with popen2:

print '\npopen2:'

proc = subprocess.Popen('cat -',
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', repr(stdout_value)


Separate streams for stdout and stderr, as with popen3:

print '\npopen3:'
proc = subprocess.Popen('cat -; echo ";to stderr" 1>&2',
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
stdout_value, stderr_value = proc.communicate('through stdin to stdout')
print '\tpass through:', repr(stdout_value)
print '\tstderr:', repr(stderr_value)


Merged stdout and stderr, as with popen4:

print '\npopen4:'
proc = subprocess.Popen('cat -; echo ";to stderr" 1>&2',
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
)
stdout_value, stderr_value = proc.communicate('through stdin to stdout\n')
print '\tcombined output:', repr(stdout_value)


Sample output:

read:
stdout: 'to stdout\n'

write:
stdin: to stdin

popen2:
pass through: 'through stdin to stdout'

popen3:
pass through: 'through stdin to stdout'
stderr: ';to stderr\n'

popen4:
combined output: 'through stdin to stdout\n;to stderr\n'
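
The popen3 and popen4 variations differ only in the stderr argument. As a rough sketch in Python 3 syntax (an assumption, since the examples above are Python 2), the function below runs the same child twice, once with a separate stderr pipe and once with stderr merged into stdout. The child script held in CHILD and the name run_child are hypothetical; note that in Python 3 the pipes carry bytes unless text mode is requested.

```python
import subprocess
import sys

# Hypothetical child: copy stdin to stdout, then write one line to stderr.
CHILD = (
    "import sys; "
    "sys.stdout.write(sys.stdin.read()); sys.stdout.flush(); "
    "sys.stderr.write('to stderr\\n')"
)


def run_child(merge_stderr):
    """Send data through the child and return (stdout, stderr) as bytes."""
    proc = subprocess.Popen(
        [sys.executable, '-c', CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        # STDOUT merges the streams (popen4 style); PIPE keeps them apart.
        stderr=subprocess.STDOUT if merge_stderr else subprocess.PIPE,
    )
    return proc.communicate(b'through stdin to stdout\n')


separate_out, separate_err = run_child(merge_stderr=False)
merged_out, merged_err = run_child(merge_stderr=True)
print(repr(separate_out), repr(separate_err))
print(repr(merged_out), repr(merged_err))
```

When stderr is merged, the second element returned by communicate() is None because no separate pipe exists for it.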


All of the above examples assume a limited amount of interaction. The communicate() method reads all of the output and waits for the child process to exit before returning. It is also possible to write to and read from the individual pipe handles used by the Popen instance. To illustrate this, I will use this simple echo program which reads its standard input and writes it back to standard output:

import sys

sys.stderr.write('repeater.py: starting\n')

while True:
    next_line = sys.stdin.readline()
    if not next_line:
        break
    sys.stdout.write(next_line)
    sys.stdout.flush()

sys.stderr.write('repeater.py: exiting\n')


Note that repeater.py writes to stderr when it starts and stops. We can use that to show the lifetime of the subprocess in the next example. The following interaction example uses the stdin and stdout file handles owned by the Popen instance in different ways. In the first example, a sequence of 10 numbers is written to stdin of the process, and after each write the next line of output is read back. In the second example, the same 10 numbers are written but the output is read all at once using communicate().

import subprocess

print 'One line at a time:'
proc = subprocess.Popen('repeater.py',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
for i in range(10):
    proc.stdin.write('%d\n' % i)
    output = proc.stdout.readline()
    print output.rstrip()
proc.communicate()

print
print 'All output at once:'
proc = subprocess.Popen('repeater.py',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
for i in range(10):
    proc.stdin.write('%d\n' % i)

output = proc.communicate()[0]
print output


Notice where the "repeater.py: exiting" lines fall in the output for each loop:

$ python interaction.py
One line at a time:
repeater.py: starting
0
1
2
3
4
5
6
7
8
9
repeater.py: exiting

All output at once:
repeater.py: starting
repeater.py: exiting
0
1
2
3
4
5
6
7
8
9
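
The line-at-a-time pattern depends on both sides flushing after every write. Here is a small self-contained sketch in Python 3 syntax (an assumption; the script above is Python 2). The inline REPEATER source is a hypothetical stand-in for repeater.py without the stderr messages, and echo_lines is an illustrative name.

```python
import subprocess
import sys

# Hypothetical stand-in for repeater.py: echo each input line, flushing
# after every write so the parent never waits on buffered output.
REPEATER = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def echo_lines(values):
    """Write one line at a time and read each echoed line back."""
    proc = subprocess.Popen(
        [sys.executable, '-c', REPEATER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,  # exchange str instead of bytes (Python 3.7+)
    )
    echoed = []
    for value in values:
        proc.stdin.write('%d\n' % value)
        proc.stdin.flush()  # push the line to the child before reading
        echoed.append(proc.stdout.readline().rstrip())
    proc.communicate()  # closes stdin so the child sees EOF and exits
    return echoed, proc.returncode


lines, status = echo_lines(range(3))
print(lines, status)
```

If either flush is removed, the parent can block in readline() while the data it is waiting for sits in a buffer on one side of the pipe.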


Signaling Between Processes

In part 4 of the series on the os module I included an example of signaling between processes using os.fork() and os.kill(). Since each Popen instance provides a pid attribute with the process id of the child process, it is possible to do something similar with subprocess. For this example, I will again set up a separate script for the child process to be executed by the parent process.

import os
import signal
import time

def signal_usr1(signum, frame):
    "Callback invoked when a signal is received"
    pid = os.getpid()
    print 'Received USR1 in process %s' % pid

print 'CHILD: Setting up signal handler'
signal.signal(signal.SIGUSR1, signal_usr1)
print 'CHILD: Pausing to wait for signal'
time.sleep(5)


And now the parent process:

import os
import signal
import subprocess
import time

proc = subprocess.Popen('signal_child.py')
print 'PARENT: Pausing before sending signal...'
time.sleep(1)
print 'PARENT: Signaling %s' % proc.pid
os.kill(proc.pid, signal.SIGUSR1)


And the output should look something like this:

$ python signal_parent.py
CHILD: Setting up signal handler
CHILD: Pausing to wait for signal
PARENT: Pausing before sending signal...
PARENT: Signaling 4124
Received USR1 in process 4124


Conclusions

As you can see, subprocess can be much easier to work with than fork, exec, and pipes on their own. It provides all of the functionality of the other modules and functions it replaces, and more. The API is consistent for all uses, and many of the extra steps otherwise needed (such as closing extra file descriptors and ensuring the pipes are closed) are built in instead of being handled separately by your code.

References:

Python Module of the Week
Sample code
PyMOTW: os (Part 2)
PyMOTW: os (Part 4)

Updated 9/5/2007 with minor formatting changes.
Updated 3/16/2009 with reference to maintained version of this article.



38 comments:

Greg said...

Excellent job, thanks. This will be useful to lots of folks. Could I use this to pipe in whole programs into the Python interpreter? I.e., launch another instance of Python and send arbitrary commands to it?

Doug said...

Hi, Greg,

You could use the '-' option to the interpreter to send python statements into the second process.

Something like:

proc = subprocess.Popen('python -', shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
proc.stdin.write('print "in second process"\n')
print proc.stdout.readline()

That opens a big security hole, though, so you have to be careful about how you build the instructions you send to the second process.

Doug

Greg said...

I've never heard of the '-' option (and I can't think of a way to search for it :-)

Where can I learn more?

Doug said...

Using '-' to indicate that data should be read from standard input is a fairly common Unix-y convention. I'm not sure if it is actually a standard or not.

I found that the python interpreter supports it by running "python -h".

Gregory said...

Hmm, that doesn't work how I was expecting. I was hoping the process would stay open/connected and I could send code to it indefinitely, but it seems to close after the first command.

BTW your example seems to hang on:
print proc.stdout.readline()

I only got it to work using code like this:
>>> proc = subprocess.Popen('python -',shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
>>> proc.communicate('print "hi"')
('hi\n', None)

Doug said...

That's what I get for posting code I didn't actually test.

It looks like the interpreter reads all of its standard input until reaching EOF, then starts processing the text. I didn't find any way to start an interpreter and issue commands to the same process repeatedly, without having to write some code on your own. You could write a small python script that used eval() in a loop, for example.

Matt Doar said...

Excellent summary of a complex subject. How can we get this into the core docs?

Doug said...

Thanks, Matt,

I used the examples in the core docs to figure out how to construct the sample code here. As a result, this text and the examples it includes might be considered redundant. I would be happy to contribute all of it, though, if someone wanted to advocate for its inclusion in the standard documentation. I'm not sure what the process for contributing documentation is.

Gregory said...

I was reading about os.kill here. But this documentation: http://docs.python.org/lib/module-signal.html
implies that:

Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the ``atomic'' instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time.

I'm curious what options do I have in Unix (Linux) to truly kill a Python instance immediately no matter what it's doing?

Doug said...

@Gregory,

What signal are you using?

I think the documentation you quote is talking about having your Python program receive and handle signals, using a callback. If you have a signal handler registered, and your program receives a SIGUSR1, you have to wait for the interpreter to finish the bytecode or extension call it's working on before it will transfer control to your signal handler.

If you just want to kill the program, I'm pretty sure "kill -9" (SIGKILL) cannot be trapped (please correct me if I'm wrong), and the process should be interrupted immediately.

Doug

Gregory said...

Thanks, Doug. I'll give it a try. What's a long running byte code operation I could use to test it with?

Large exponentiation perhaps?

Doug said...

A factorial maybe, but exponentiation should be O(1).

It might be easier to simulate using an extension module. Something simple that runs in a tight loop and not doing I/O of any sort.

Gregory said...

Exponentiation can't be O(1) since:

2**99999999999999

Takes a long time to run while 2**99 doesn't. Perhaps it's O(n something) based on the size of the exponent?

If 2**99999999999999 is an atomic instruction, that's a good candidate for testing.

What do you think?

Doug said...

You're right; I don't know what I was thinking. It certainly doesn't look like SIGINT interrupts the interpreter while it is calculating 2**99999999999. :-)

Gregory said...

os.kill(proc.pid,9) seems to kill even exponentiation. I guess that's what I'm after.

I didn't try it on an extension module though. I've never made one before.

Anonymous said...

Doug,
Your example was far better at explaining subprocess than the standard documentation. It would be great if you could convince them to add it to the documentation as an example. It helped me, since I've had no experience with any of the process management routines that subprocess is supposed to improve upon.

Thanks,
J

Doug Hellmann said...

Hi, J,

Thanks for the kind words! A couple of the GHOP contest tasks included reworking some of these posts and including the examples with the docs for 2.6. I believe that those have been checked in, but I don't know if it includes subprocess.

Anonymous said...

Thanks,

this was very helpful for a newbie; the official Python documentation led me almost nowhere, but this solved my simple problem. Thanks.

Anders

Rahul said...

Thanks for your excellent document here..
I do have a question though...
I created following type of script based on your examples..

from subprocess import *
proc = Popen('python',shell=True,stdin=PIPE,stdout=PIPE)

for i in range(10):
    proc.stdin.write('print 10\n')
    o = proc.stdout.readline()
    print o

Can you tell me, why doesn't this work...

Thanks,
-Rahul.

Doug Hellmann said...

@rahul - I need more information to give you any useful advice. What OS? What version of Python? What does "doesn't work" mean in this case?

Rahul said...

Hi Doug,
I should have been more clear...
I'm using python 2.4 on a linux system..
If you run the code I posted, it gets stuck in proc.stdout.readline()...
But I was able to get an elegant solution using
Pexpect : http://www.noah.org/wiki/Pexpect

Thanks,
-Rahul

Perfect Domain said...

Very good post.
I am learning python with google app.
I made a small application (pagerank check) based on your examples. But it gives an error

Traceback (most recent call last):
File "C:\Program Files\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 486, in __call__
handler.post(*groups)
File "C:\Program Files\Google\google_appengine\pagerank\pagerank.py", line 29, in post
stdout=subprocess.PIPE,
File "C:\Python25\lib\subprocess.py", line 618, in __init__
self.stdout = os.fdopen(c2pread, 'rb', bufsize)
AttributeError: 'module' object has no attribute 'fdopen'



Here is the main script

url=self.request.get('domain')
proc = subprocess.Popen('pr.py $url',
shell=True,
stdout=subprocess.PIPE,
)
stdout_value = proc.communicate()[0]
self.response.out.write(stdout_value)

I am working with python 2.5

Doug Hellmann said...

@Perfect Domain - I'm pretty sure AppEngine won't let you create new processes, so you won't be able to use subprocess.

Bryan Klein said...

Thank you, I gave you a bit of credit in a little script I wrote for some of our users.

http://code.google.com/p/fds-smv/wiki/fds2asciiBatchProcess

Thank you,
Bryan

Doug Hellmann said...

@Bryan - Thanks, that's really nice! fds-smv looks very very cool. I wish I understood the physics behind it.

Dan O'H said...

Here's a gotcha I just ran into. If you use subprocess.Popen with shell=True, the pid will be the pid of the shell, not the pid of the child process.

This messes up sending signals to your subprocess. If you use something like 'os.kill(proc.pid, signal.SIGUSR1)' you'll be sending the signal to the shell, not to the child process. The signal won't necessarily be forwarded to the child process.

Tad said...

I would like to execute a sequence of commands separated by semicolons like in:

pipe(["cd mydir; ls", "-la"], ...)

but I get an exception:

OSError: [Errno 2] No such file or directory

I could define a script for the first argument, but 'mydir' is recomputed for every call so that it would not be very practical.

I am using Mac OS X and Linux.

-- T. S. Ferreira

Tad said...

In my previous posting read:

Popen(...)

instead of:

pipe(...)

:-(

Doug Hellmann said...

@Tad - Try constructing a string with the entire command line and passing shell=True.

Doug Hellmann said...

@Dan O'H - That's a good point. If you think about the process tree it makes sense, but might not be obvious to someone at first.

Tim Valenta said...

Thanks for being so comprehensive. Turns out I was battling a problem that didn't exist (program output was blank!), but this is definitely more helpful than any other documentation I could find on the module.

Tim

Yan C said...

I have a question.
if the child process using additional fd to communicate with the parent process(eg. fd 4 for input,fd 5 for output,fd 6 for err output),what should I do?

Doug Hellmann said...

@Yan - As far as I can tell, subprocess manages only stdin, stdout, and stderr for you. If you're using other file descriptors, you'll end up managing those yourself.

jp said...

I am rather new to all this..

I am trying to send some commands to a subprocess being opened, which the documentation of the app says can be used at the DOS command line (command-line flags??). Namely this... "-yes". Is there another setting that I am missing? Any insight? Thanks.

proc = subprocess.Popen(["c:/program files/trimble/gps pathfinder office/pfocorrectapp.exe", "c:/workspace/pathf/JM093012A.ssf"],-yes, shell=1, stdout=subprocess.PIPE)

Doug Hellmann said...

@jp - If you want to pass arguments to the command, include them as strings inside the list of values that make up the command line (the first argument to the function, enclosed in square brackets). You're already doing that with the command name, and what looks like a data file argument. The "-yes" switch should go in there, too.

jp said...

Thanks Doug.

I got that now. I am not sure if this exe that I start even has switches coded in. Other related exe's within this same app do use the "-yes" switch so I have been trying it and different ones.

Maybe you have other insight here..
The exe opens up a wizard which I have been able to open in a subprocess and get the data file into the wizard's dialog box, but then I need to manually press the "Next" button in the wizard. How would I simulate this? Also, when I move the "-yes" from after the data file to just before it, Python gets into a longer processing mode or something (what I am saying is that if I put the switch after the data file the wizard pops up and then Python stops; if I put it before then Python stays in some type of process, maybe starting and stopping). What would this indicate? Thanks for any insight.

Doug Hellmann said...

@jp - It sounds like maybe the application you're working with isn't expecting to be run from the console. If it requires the user to interact with a wizard, I don't know what you can do short of rewriting it. Perhaps there's another switch to disable the wizard mode?