PEP: 324
Title: subprocess - New process module
Version: $Revision$
Last-Modified: $Date$
Author: Peter Astrand <[email protected]>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Nov-2003
Python-Version: 2.4
Post-History:


Abstract
========

This PEP describes a new module for starting and communicating
with processes.


Motivation
==========

Starting new processes is a common task in any programming
language, and very common in a high-level language like Python.
Good support for this task is needed, because:

- Inappropriate functions for starting processes could mean a
 security risk: If the program is started through the shell, and
 the arguments contain shell meta characters, the result can be
 disastrous. [1]_

- It makes Python an even better replacement language for
 over-complicated shell scripts.

Currently, Python has a large number of different functions for
process creation.  This makes it hard for developers to choose.

The subprocess module provides the following enhancements over
previous functions:

- One "unified" module provides all functionality from previous
 functions.

- Cross-process exceptions: Exceptions happening in the child
 before the new process has started to execute are re-raised in
 the parent.  This means that it's easy to handle ``exec()``
 failures, for example.  With popen2, for example, it's
 impossible to detect if the execution failed.

- A hook for executing custom code between fork and exec.  This
 can be used for, for example, changing uid.

- No implicit call of /bin/sh.  This means that there is no need
 for escaping dangerous shell meta characters.

- All combinations of file descriptor redirection is possible.
 For example, the "python-dialog" [2]_ needs to spawn a process
 and redirect stderr, but not stdout.  This is not possible with
 current functions, without using temporary files.

- With the subprocess module, it's possible to control if all open
 file descriptors should be closed before the new program is
 executed.

- Support for connecting several subprocesses (shell "pipe").

- Universal newline support.

- A ``communicate()`` method, which makes it easy to send stdin data
 and read stdout and stderr data, without risking deadlocks.
 Most people are aware of the flow control issues involved with
 child process communication, but not all have the patience or
 skills to write a fully correct and deadlock-free select loop.
 This means that many Python applications contain race
 conditions.  A ``communicate()`` method in the standard library
 solves this problem.


Rationale
=========

The following points summarizes the design:

- subprocess was based on popen2, which is tried-and-tested.

- The factory functions in popen2 have been removed, because I
 consider the class constructor equally easy to work with.

- popen2 contains several factory functions and classes for
 different combinations of redirection.  subprocess, however,
 contains one single class.  Since the subprocess module supports
 12 different combinations of redirection, providing a class or
 function for each of them would be cumbersome and not very
 intuitive.  Even with popen2, this is a readability problem.
 For example, many people cannot tell the difference between
 popen2.popen2 and popen2.popen4 without using the documentation.

- One small utility function is provided: ``subprocess.call()``. It
 aims to be an enhancement over ``os.system()``, while still very
 easy to use:

 - It does not use the Standard C function system(), which has
   limitations.

 - It does not call the shell implicitly.

 - No need for quoting; using an argument list.

 - The return value is easier to work with.

 The ``call()`` utility function accepts an 'args' argument, just
 like the ``Popen`` class constructor.  It waits for the command to
 complete, then returns the ``returncode`` attribute.  The
 implementation is very simple::

     def call(*args, **kwargs):
         return Popen(*args, **kwargs).wait()

 The motivation behind the ``call()`` function is simple: Starting a
 process and wait for it to finish is a common task.

 While ``Popen`` supports a wide range of options, many users have
 simple needs.  Many people are using ``os.system()`` today, mainly
 because it provides a simple interface.  Consider this example::

     os.system("stty sane -F " + device)

 With ``subprocess.call()``, this would look like::

     subprocess.call(["stty", "sane", "-F", device])

 or, if executing through the shell::

     subprocess.call("stty sane -F " + device, shell=True)

- The "preexec" functionality makes it possible to run arbitrary
 code between fork and exec.  One might ask why there are special
 arguments for setting the environment and current directory, but
 not for, for example, setting the uid.  The answer is:

 - Changing environment and working directory is considered
   fairly common.

 - Old functions like ``spawn()`` has support for an
   "env"-argument.

 - env and cwd are considered quite cross-platform: They make
   sense even on Windows.

- On POSIX platforms, no extension module is required: the module
 uses ``os.fork()``, ``os.execvp()`` etc.

- On Windows platforms, the module requires either Mark Hammond's
 Windows extensions [5]_, or a small extension module called
 _subprocess.


Specification
=============

This module defines one class called Popen::

   class Popen(args, bufsize=0, executable=None,
               stdin=None, stdout=None, stderr=None,
               preexec_fn=None, close_fds=False, shell=False,
               cwd=None, env=None, universal_newlines=False,
               startupinfo=None, creationflags=0):


Arguments are:

- ``args`` should be a string, or a sequence of program arguments.
 The program to execute is normally the first item in the args
 sequence or string, but can be explicitly set by using the
 executable argument.

 On UNIX, with ``shell=False`` (default): In this case, the ``Popen``
 class uses ``os.execvp()`` to execute the child program.  ``args``
 should normally be a sequence.  A string will be treated as a
 sequence with the string as the only item (the program to
 execute).

 On UNIX, with ``shell=True``: If ``args`` is a string, it specifies the
 command string to execute through the shell.  If ``args`` is a
 sequence, the first item specifies the command string, and any
 additional items will be treated as additional shell arguments.

 On Windows: the ``Popen`` class uses ``CreateProcess()`` to execute the
 child program, which operates on strings.  If ``args`` is a
 sequence, it will be converted to a string using the
 ``list2cmdline`` method.  Please note that not all MS Windows
 applications interpret the command line the same way: The
 ``list2cmdline`` is designed for applications using the same rules
 as the MS C runtime.

- ``bufsize``, if given, has the same meaning as the corresponding
 argument to the built-in ``open()`` function: 0 means unbuffered, 1
 means line buffered, any other positive value means use a buffer
 of (approximately) that size.  A negative ``bufsize`` means to use
 the system default, which usually means fully buffered.  The
 default value for ``bufsize`` is 0 (unbuffered).

- ``stdin``, ``stdout`` and ``stderr`` specify the executed programs' standard
 input, standard output and standard error file handles,
 respectively.  Valid values are ``PIPE``, an existing file
 descriptor (a positive integer), an existing file object, and
 ``None``.  ``PIPE`` indicates that a new pipe to the child should be
 created.  With ``None``, no redirection will occur; the child's file
 handles will be inherited from the parent.  Additionally, ``stderr``
 can be STDOUT, which indicates that the stderr data from the
 applications should be captured into the same file handle as for
 stdout.

- If ``preexec_fn`` is set to a callable object, this object will be
 called in the child process just before the child is executed.

- If ``close_fds`` is true, all file descriptors except 0, 1 and 2
 will be closed before the child process is executed.

- If ``shell`` is true, the specified command will be executed through
 the shell.

- If ``cwd`` is not ``None``, the current directory will be changed to cwd
 before the child is executed.

- If ``env`` is not ``None``, it defines the environment variables for the
 new process.

- If ``universal_newlines`` is true, the file objects stdout and
 stderr are opened as a text file, but lines may be terminated
 by any of ``\n``, the Unix end-of-line convention, ``\r``, the
 Macintosh convention or ``\r\n``, the Windows convention.  All of
 these external representations are seen as ``\n`` by the Python
 program.  Note: This feature is only available if Python is
 built with universal newline support (the default).  Also, the
 newlines attribute of the file objects stdout, stdin and stderr
 are not updated by the ``communicate()`` method.

- The ``startupinfo`` and ``creationflags``, if given, will be passed to
 the underlying ``CreateProcess()`` function.  They can specify
 things such as appearance of the main window and priority for
 the new process.  (Windows only)


This module also defines two shortcut functions:

- ``call(*args, **kwargs)``:
    Run command with arguments.  Wait for command to complete,
    then return the ``returncode`` attribute.

    The arguments are the same as for the Popen constructor.
    Example::

        retcode = call(["ls", "-l"])


Exceptions
----------

Exceptions raised in the child process, before the new program has
started to execute, will be re-raised in the parent.
Additionally, the exception object will have one extra attribute
called 'child_traceback', which is a string containing traceback
information from the child's point of view.

The most common exception raised is ``OSError``.  This occurs, for
example, when trying to execute a non-existent file.  Applications
should prepare for ``OSErrors``.

A ``ValueError`` will be raised if Popen is called with invalid
arguments.


Security
--------

Unlike some other popen functions, this implementation will never
call /bin/sh implicitly.  This means that all characters,
including shell meta-characters, can safely be passed to child
processes.


Popen objects
-------------

Instances of the Popen class have the following methods:

``poll()``
  Check if child process has terminated.  Returns ``returncode``
  attribute.

``wait()``
  Wait for child process to terminate.  Returns ``returncode``
  attribute.

``communicate(input=None)``
  Interact with process: Send data to stdin.  Read data from
  stdout and stderr, until end-of-file is reached.  Wait for
  process to terminate.  The optional stdin argument should be a
  string to be sent to the child process, or ``None``, if no data
  should be sent to the child.

  ``communicate()`` returns a tuple ``(stdout, stderr)``.

  Note: The data read is buffered in memory, so do not use this
  method if the data size is large or unlimited.

The following attributes are also available:

``stdin``
  If the ``stdin`` argument is ``PIPE``, this attribute is a file object
  that provides input to the child process.  Otherwise, it is
  ``None``.

``stdout``
  If the ``stdout`` argument is ``PIPE``, this attribute is a file
  object that provides output from the child process.
  Otherwise, it is ``None``.

``stderr``
  If the ``stderr`` argument is ``PIPE``, this attribute is file object
  that provides error output from the child process.  Otherwise,
  it is ``None``.

``pid``
  The process ID of the child process.

``returncode``
   The child return code.  A ``None`` value indicates that the
   process hasn't terminated yet.  A negative value -N indicates
   that the child was terminated by signal N (UNIX only).


Replacing older functions with the subprocess module
====================================================

In this section, "a ==> b" means that b can be used as a
replacement for a.

Note: All functions in this section fail (more or less) silently
if the executed program cannot be found; this module raises an
OSError exception.

In the following examples, we assume that the subprocess module is
imported with ``from subprocess import *``.


Replacing /bin/sh shell backquote
---------------------------------
::

   output=`mycmd myarg`
   ==>
   output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]


Replacing shell pipe line
-------------------------
::

   output=`dmesg | grep hda`
   ==>
   p1 = Popen(["dmesg"], stdout=PIPE)
   p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
   output = p2.communicate()[0]


Replacing ``os.system()``
-------------------------
::

   sts = os.system("mycmd" + " myarg")
   ==>
   p = Popen("mycmd" + " myarg", shell=True)
   sts = os.waitpid(p.pid, 0)

Note:

* Calling the program through the shell is usually not required.

* It's easier to look at the returncode attribute than the
 exit status.

A more real-world example would look like this::

   try:
       retcode = call("mycmd" + " myarg", shell=True)
       if retcode < 0:
           print >>sys.stderr, "Child was terminated by signal", -retcode
       else:
           print >>sys.stderr, "Child returned", retcode
   except OSError, e:
       print >>sys.stderr, "Execution failed:", e


Replacing ``os.spawn*``
-----------------------

P_NOWAIT example::

   pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg")
   ==>
   pid = Popen(["/bin/mycmd", "myarg"]).pid


P_WAIT example::

   retcode = os.spawnlp(os.P_WAIT, "/bin/mycmd", "mycmd", "myarg")
   ==>
   retcode = call(["/bin/mycmd", "myarg"])


Vector example::

   os.spawnvp(os.P_NOWAIT, path, args)
   ==>
   Popen([path] + args[1:])


Environment example::

   os.spawnlpe(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg", env)
   ==>
   Popen(["/bin/mycmd", "myarg"], env={"PATH": "/usr/bin"})


Replacing ``os.popen*``
-----------------------
::

   pipe = os.popen(cmd, mode='r', bufsize)
   ==>
   pipe = Popen(cmd, shell=True, bufsize=bufsize, stdout=PIPE).stdout

   pipe = os.popen(cmd, mode='w', bufsize)
   ==>
   pipe = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE).stdin


   (child_stdin, child_stdout) = os.popen2(cmd, mode, bufsize)
   ==>
   p = Popen(cmd, shell=True, bufsize=bufsize,
             stdin=PIPE, stdout=PIPE, close_fds=True)
   (child_stdin, child_stdout) = (p.stdin, p.stdout)


   (child_stdin,
    child_stdout,
    child_stderr) = os.popen3(cmd, mode, bufsize)
   ==>
   p = Popen(cmd, shell=True, bufsize=bufsize,
             stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
   (child_stdin,
    child_stdout,
    child_stderr) = (p.stdin, p.stdout, p.stderr)


   (child_stdin, child_stdout_and_stderr) = os.popen4(cmd, mode, bufsize)
   ==>
   p = Popen(cmd, shell=True, bufsize=bufsize,
             stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
   (child_stdin, child_stdout_and_stderr) = (p.stdin, p.stdout)


Replacing ``popen2.*``
----------------------

Note: If the cmd argument to ``popen2`` functions is a string, the
command is executed through /bin/sh.  If it is a list, the command
is directly executed.

::

   (child_stdout, child_stdin) = popen2.popen2("somestring", bufsize, mode)
   ==>
   p = Popen(["somestring"], shell=True, bufsize=bufsize
             stdin=PIPE, stdout=PIPE, close_fds=True)
   (child_stdout, child_stdin) = (p.stdout, p.stdin)


   (child_stdout, child_stdin) = popen2.popen2(["mycmd", "myarg"], bufsize, mode)
   ==>
   p = Popen(["mycmd", "myarg"], bufsize=bufsize,
             stdin=PIPE, stdout=PIPE, close_fds=True)
   (child_stdout, child_stdin) = (p.stdout, p.stdin)

The ``popen2.Popen3`` and ``popen3.Popen4`` basically works as
``subprocess.Popen``, except that:

* ``subprocess.Popen`` raises an exception if the execution fails
* the ``capturestderr`` argument is replaced with the stderr argument.
* ``stdin=PIPE`` and ``stdout=PIPE`` must be specified.
* ``popen2`` closes all file descriptors by default, but you have to
 specify ``close_fds=True`` with ``subprocess.Popen``.


Open Issues
===========

Some features have been requested but is not yet implemented.
This includes:

* Support for managing a whole flock of subprocesses

* Support for managing "daemon" processes

* Built-in method for killing subprocesses

While these are useful features, it's expected that these can be
added later without problems.

* expect-like functionality, including pty support.

pty support is highly platform-dependent, which is a
problem.  Also, there are already other modules that provide this
kind of functionality [6]_.


Backwards Compatibility
=======================

Since this is a new module, no major backward compatible issues
are expected.  The module name "subprocess" might collide with
other, previous modules [3]_ with the same name, but the name
"subprocess" seems to be the best suggested name so far.  The
first name of this module was "popen5", but this name was
considered too unintuitive.  For a while, the module was called
"process", but this name is already used by Trent Mick's
module [4]_.

The functions and modules that this new module is trying to
replace (``os.system``, ``os.spawn*``, ``os.popen*``, ``popen2.*``,
``commands.*``) are expected to be available in future Python versions
for a long time, to preserve backwards compatibility.


Reference Implementation
========================

A reference implementation is available from
http://www.lysator.liu.se/~astrand/popen5/.


References
==========

. [1] Secure Programming for Linux and Unix HOWTO, section 8.3.
      http://www.dwheeler.com/secure-programs/

. [2] Python Dialog
      http://pythondialog.sourceforge.net/

. [3] http://www.iol.ie/~padraiga/libs/subProcess.py

. [4] http://starship.python.net/crew/tmick/

. [5] http://starship.python.net/crew/mhammond/win32/

. [6] http://www.lysator.liu.se/~ceder/pcl-expect/


Copyright
=========

This document has been placed in the public domain.

.
  Local Variables:
  mode: indented-text
  indent-tabs-mode: nil
  sentence-end-double-space: t
  fill-column: 70
  End: