PEP: 517
Title: A build-system independent format for source trees
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <[email protected]>,
       Thomas Kluyver <[email protected]>
BDFL-Delegate: Alyssa Coghlan <[email protected]>
Discussions-To: [email protected]
Status: Final
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History: 01-Oct-2015, 25-Oct-2015, 19-May-2017, 11-Sep-2017
Resolution: https://mail.python.org/pipermail/distutils-sig/2017-September/031548.html

==========
Abstract
==========

While ``distutils`` / ``setuptools`` have taken us a long way, they
suffer from three serious problems: (a) they're missing important
features like usable build-time dependency declaration,
autoconfiguration, and even basic ergonomic niceties like `DRY
<https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_-compliant
version number management, and (b) extending them is difficult, so
while there do exist various solutions to the above problems, they're
often quirky, fragile, and expensive to maintain, and yet (c) it's
very difficult to use anything else, because distutils/setuptools
provide the standard interface for installing packages expected by
both users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted
to solve problems (a) and/or (b). This proposal aims to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being
a gatekeeper for Python build systems. If you want to use distutils,
great; if you want to use something else, then that should be easy to
do using standardized methods. The difficulty of interfacing with
distutils means that there aren't many such systems right now, but to
give a sense of what we're thinking about, see `flit
<https://github.com/takluyver/flit>`_ or `bento
<https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
solved many of the hard problems here -- e.g. it's no longer necessary
that a build system also know about every possible installation
configuration -- so pretty much all we really need from a build system
is that it have some way to spit out standard-compliant wheels and
sdists.

We therefore propose a new, relatively minimal interface for
installation tools like ``pip`` to interact with package source trees
and source distributions.


=======================
Terminology and goals
=======================

A *source tree* is something like a VCS checkout. We need a standard
interface for installing from this format, to support usages like
``pip install some-directory/``.

A *source distribution* is a static snapshot representing a particular
release of some source code, like ``lxml-3.4.4.tar.gz``. Source
distributions serve many purposes: they form an archival record of
releases, they provide a stupid-simple de facto standard for tools
that want to ingest and process large corpora of code, possibly
written in many languages (e.g. code search), they act as the input to
downstream packaging systems like Debian/Fedora/Conda/..., and so
forth. In the Python ecosystem they additionally have a particularly
important role to play, because packaging tools like ``pip`` are able
to use source distributions to fulfill binary dependencies, e.g. if
there is a distribution ``foo.whl`` which declares a dependency on
``bar``, then we need to support the case where ``pip install bar`` or
``pip install foo`` automatically locates the sdist for ``bar``,
downloads it, builds it, and installs the resulting package.

Source distributions are also known as *sdists* for short.

A *build frontend* is a tool that users might run that takes arbitrary
source trees or source distributions and builds wheels from them. The
actual building is done by each source tree's *build backend*. In a
command like ``pip wheel some-directory/``, pip is acting as a build
frontend.

An *integration frontend* is a tool that users might run that takes a
set of package requirements (e.g. a requirements.txt file) and
attempts to update a working environment to satisfy those
requirements. This may require locating, building, and installing a
combination of wheels and sdists. In a command like ``pip install
lxml==2.4.0``, pip is acting as an integration frontend.


==============
Source trees
==============

There is an existing, legacy source tree format involving
``setup.py``. We don't try to specify it further; its de facto
specification is encoded in the source code and documentation of
``distutils``, ``setuptools``, ``pip``, and other tools. We'll refer
to it as the ``setup.py``\-style.

Here we define a new style of source tree based around the
``pyproject.toml`` file defined in :pep:`518`, extending the
``[build-system]`` table in that file with one additional key,
``build-backend``. Here's an example of how it would look::

   [build-system]
   # Defined by PEP 518:
   requires = ["flit"]
   # Defined by this PEP:
   build-backend = "flit.api:main"

``build-backend`` is a string naming a Python object that will be
used to perform the build (see below for details). This is formatted
following the same ``module:object`` syntax as a ``setuptools`` entry
point. For instance, if the string is ``"flit.api:main"`` as in the
example above, this object would be looked up by executing the
equivalent of::

   import flit.api
   backend = flit.api.main

It's also legal to leave out the ``:object`` part, e.g. ::

   build-backend = "flit.api"

which acts like::

   import flit.api
   backend = flit.api

Formally, the string should satisfy this grammar::

   identifier = (letter | '_') (letter | '_' | digit)*
   module_path = identifier ('.' identifier)*
   object_path = identifier ('.' identifier)*
   entry_point = module_path (':' object_path)?

And we import ``module_path`` and then look up
``module_path.object_path`` (or just ``module_path`` if
``object_path`` is missing).

When importing the module path, we do *not* look in the directory containing the
source tree, unless that would be on ``sys.path`` anyway (e.g. because it is
specified in PYTHONPATH). Although Python automatically adds the working
directory to ``sys.path`` in some situations, code to resolve the backend should
not be affected by this.
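
As a minimal sketch of the lookup described above (not a reference
implementation), a frontend might validate the string against the grammar
and then import the module and walk the attribute path. The helper name
``resolve_backend`` and the regular expression are illustrative assumptions;
the sketch approximates ``letter`` as an ASCII letter and leaves out the
``sys.path`` handling just discussed::

   import importlib
   import re

   # Assumption: "letter" in the grammar is approximated as an ASCII letter.
   _IDENT = r"[A-Za-z_][A-Za-z_0-9]*"
   _PATH = r"{0}(?:\.{0})*".format(_IDENT)
   _ENTRY_POINT = re.compile(
       r"(?P<module>{0})(?::(?P<object>{0}))?".format(_PATH))


   def resolve_backend(spec):
       """Return the object named by a ``module_path[:object_path]`` string."""
       match = _ENTRY_POINT.fullmatch(spec)
       if match is None:
           raise ValueError("invalid build-backend string: {!r}".format(spec))
       backend = importlib.import_module(match.group("module"))
       object_path = match.group("object")
       if object_path:
           # Walk the dotted attribute path, one identifier at a time.
           for name in object_path.split("."):
               backend = getattr(backend, name)
       return backend

With this helper, ``resolve_backend("flit.api:main")`` would perform the
equivalent of the ``import flit.api`` example above, provided ``flit`` is
installed.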

If the ``pyproject.toml`` file is absent, or the ``build-backend``
key is missing, the source tree is not using this specification, and
tools should revert to the legacy behaviour of running ``setup.py`` (either
directly, or by implicitly invoking the ``setuptools.build_meta:__legacy__``
backend).

Where the ``build-backend`` key exists, this takes precedence and the source tree follows the format and
conventions of the specified backend (as such no ``setup.py`` is needed unless the backend requires it).
Projects may still wish to include a ``setup.py`` for compatibility with tools that do not use this spec.

This PEP also defines a ``backend-path`` key for use in ``pyproject.toml``, see
the "In-Tree Build Backends" section below. This key would be used as follows::

   [build-system]
   # Defined by PEP 518:
   requires = ["flit"]
   # Defined by this PEP:
   build-backend = "local_backend"
   backend-path = ["backend"]
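
The selection rule above can be sketched as follows. This is only an
illustration under stated assumptions: ``select_backend`` is a hypothetical
helper, it expects ``pyproject.toml`` to have been parsed into a dictionary
already, and it picks the ``setuptools.build_meta:__legacy__`` route for
legacy trees (running ``setup.py`` directly would be equally acceptable)::

   LEGACY_BACKEND = "setuptools.build_meta:__legacy__"


   def select_backend(pyproject):
       """Return ``(build_backend, backend_path)`` for a source tree.

       ``pyproject`` is the parsed content of ``pyproject.toml`` as a dict,
       or ``None`` if the file is absent.
       """
       build_system = (pyproject or {}).get("build-system", {})
       backend = build_system.get("build-backend")
       if backend is None:
           # The tree is not using this specification: legacy behaviour.
           return LEGACY_BACKEND, []
       return backend, list(build_system.get("backend-path", []))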


Build requirements
==================

This PEP places a number of additional requirements on the "build requirements"
section of ``pyproject.toml``. These are intended to ensure that projects do
not create impossible-to-satisfy conditions with their build requirements.

- Project build requirements will define a directed graph of requirements
  (project A needs B to build, B needs C and D, etc.) This graph MUST NOT
  contain cycles.  If (due to lack of co-ordination between projects, for
  example) a cycle is present, front ends MAY refuse to build the project.
- Where build requirements are available as wheels, front ends SHOULD use these
  where practical, to avoid deeply nested builds.  However front ends MAY have
  modes where they do not consider wheels when locating build requirements, and
  so projects MUST NOT assume that publishing wheels is sufficient to break a
  requirement cycle.
- Front ends SHOULD check explicitly for requirement cycles, and terminate
  the build with an informative message if one is found.

Note in particular that the requirement for no requirement cycles means that
backends wishing to self-host (i.e., building a wheel for a backend uses that
backend for the build) need to make special provision to avoid causing cycles.
Typically this will involve specifying themselves as an in-tree backend, and
avoiding external build dependencies (usually by vendoring them).
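
One way a front end could perform the cycle check is a depth-first search
over the requirement graph. The sketch below is illustrative only: the
function name and the representation of the graph as a dictionary of project
names are assumptions, and a real front end would discover the edges
incrementally while resolving build requirements::

   def find_requirement_cycle(graph):
       """Return one cycle as a list of project names, or None.

       ``graph`` maps each project name to the names it needs to build.
       """
       visiting, done = [], set()

       def visit(project):
           if project in done:
               return None
           if project in visiting:
               # Back edge: report the cycle from its first occurrence.
               return visiting[visiting.index(project):] + [project]
           visiting.append(project)
           for requirement in graph.get(project, ()):
               cycle = visit(requirement)
               if cycle is not None:
                   return cycle
           visiting.pop()
           done.add(project)
           return None

       for project in graph:
           cycle = visit(project)
           if cycle is not None:
               return cycle
       return None

A front end could join the returned names with ``" -> "`` to produce the
informative message recommended above.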


=========================
Build backend interface
=========================

The build backend object is expected to have attributes which provide
some or all of the following hooks. The common ``config_settings``
argument is described after the individual hooks.

Mandatory hooks
===============

build_wheel
-----------

::

   def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
       ...

Must build a .whl file, and place it in the specified ``wheel_directory``. It
must return the basename (not the full path) of the ``.whl`` file it creates,
as a unicode string.

If the build frontend has previously called ``prepare_metadata_for_build_wheel``
and depends on the wheel resulting from this call to have metadata
matching this earlier call, then it should provide the path to the created
``.dist-info`` directory as the ``metadata_directory`` argument. If this
argument is provided, then ``build_wheel`` MUST produce a wheel with identical
metadata. The directory passed in by the build frontend MUST be
identical to the directory created by ``prepare_metadata_for_build_wheel``,
including any unrecognized files it created.

Backends which do not provide the ``prepare_metadata_for_build_wheel`` hook may
either silently ignore the ``metadata_directory`` parameter to ``build_wheel``,
or else raise an exception when it is set to anything other than ``None``.

To ensure that wheels from different sources are built the same way, frontends
may call ``build_sdist`` first, and then call ``build_wheel`` in the unpacked
sdist. But if the backend indicates that it is missing some requirements for
creating an sdist (see below), the frontend will fall back to calling
``build_wheel`` in the source directory.

The source directory may be read-only. Backends should therefore be
prepared to build without creating or modifying any files in the source
directory, but they may opt not to handle this case, in which case
failures will be visible to the user. Frontends are not responsible for
any special handling of read-only source directories.

The backend may store intermediate artifacts in cache locations or
temporary directories. The presence or absence of any caches should not
make a material difference to the final result of the build.

build_sdist
-----------

::

   def build_sdist(sdist_directory, config_settings=None):
       ...

Must build a .tar.gz source distribution and place it in the specified
``sdist_directory``. It must return the basename (not the full path) of the
``.tar.gz`` file it creates, as a unicode string.

A .tar.gz source distribution (sdist) contains a single top-level directory called
``{name}-{version}`` (e.g. ``foo-1.0``), containing the source files of the
package. This directory must also contain the
``pyproject.toml`` from the build directory, and a PKG-INFO file containing
metadata in the format described in
:pep:`345`. Although historically
zip files have also been used as sdists, this hook should produce a gzipped
tarball. This is already the more common format for sdists, and having a
consistent format makes for simpler tooling.

The generated tarball should use the modern POSIX.1-2001 pax tar format, which
specifies UTF-8 based file names. This is not yet the default for the tarfile
module shipped with Python 3.6, so backends using the tarfile module need to
explicitly pass ``format=tarfile.PAX_FORMAT``.
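
For backends that use the ``tarfile`` module, the following sketch shows the
shape of such a call. The helper ``write_sdist`` and its ``files`` argument
(a mapping of relative paths to bytes) are hypothetical, and the sketch does
not generate ``PKG-INFO`` for you::

   import io
   import os
   import tarfile


   def write_sdist(sdist_directory, name, version, files):
       """Write ``{name}-{version}.tar.gz`` and return its basename.

       ``files`` maps paths relative to the top-level directory to bytes.
       """
       top_level = "{}-{}".format(name, version)
       basename = top_level + ".tar.gz"
       target = os.path.join(sdist_directory, basename)
       # Request the pax format explicitly instead of relying on the default.
       with tarfile.open(target, "w:gz", format=tarfile.PAX_FORMAT) as archive:
           for relative_path, data in sorted(files.items()):
               info = tarfile.TarInfo("{}/{}".format(top_level, relative_path))
               info.size = len(data)
               archive.addfile(info, io.BytesIO(data))
       return basename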

Some backends may have extra requirements for creating sdists, such as version
control tools. However, some frontends may prefer to make intermediate sdists
when producing wheels, to ensure consistency.
If the backend cannot produce an sdist because a dependency is missing, or
for another well understood reason, it should raise an exception of a specific
type which it makes available as ``UnsupportedOperation`` on the backend object.
If the frontend gets this exception while building an sdist as an intermediate
for a wheel, it should fall back to building a wheel directly.
The backend does not need to define this exception type if it would never raise
it.
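
From the frontend's side, the fallback might look roughly like the sketch
below. The function name is an assumption; the only parts taken from this
specification are the ``build_sdist`` hook and the optional
``UnsupportedOperation`` attribute::

   def try_build_sdist(backend, sdist_directory, config_settings=None):
       """Return the sdist basename, or None if the backend cannot build one."""
       # A backend that never raises the exception need not define it; an
       # empty tuple in an ``except`` clause matches nothing.
       unsupported = getattr(backend, "UnsupportedOperation", ())
       try:
           return backend.build_sdist(sdist_directory, config_settings)
       except unsupported:
           return None

A frontend that receives ``None`` here while building an intermediate sdist
would then call ``build_wheel`` in the source directory instead. Any other
exception still propagates and indicates an error.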

Optional hooks
==============

get_requires_for_build_wheel
----------------------------

::

 def get_requires_for_build_wheel(config_settings=None):
     ...

This hook MUST return an additional list of strings containing :pep:`508`
dependency specifications, above and beyond those specified in the
``pyproject.toml`` file, to be installed when calling the ``build_wheel`` or
``prepare_metadata_for_build_wheel`` hooks.

Example::

 def get_requires_for_build_wheel(config_settings=None):
     return ["wheel >= 0.25", "setuptools"]

If not defined, the default implementation is equivalent to ``return []``.

prepare_metadata_for_build_wheel
--------------------------------

::

 def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
     ...

Must create a ``.dist-info`` directory containing wheel metadata
inside the specified ``metadata_directory`` (i.e., creates a directory
like ``{metadata_directory}/{package}-{version}.dist-info/``). This
directory MUST be a valid ``.dist-info`` directory as defined in the
wheel specification, except that it need not contain ``RECORD`` or
signatures. The hook MAY also create other files inside this
directory, and a build frontend MUST preserve, but otherwise ignore, such files;
the intention
here is that in cases where the metadata depends on build-time
decisions, the build backend may need to record these decisions in
some convenient format for re-use by the actual wheel-building step.

This must return the basename (not the full path) of the ``.dist-info``
directory it creates, as a unicode string.

If a build frontend needs this information and the method is
not defined, it should call ``build_wheel`` and look at the resulting
metadata directly.
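
A frontend could implement this fallback roughly as follows. This is a
sketch under assumptions: ``get_wheel_metadata`` is a hypothetical helper,
and it treats any top-level archive entry ending in ``.dist-info`` as the
metadata directory::

   import os
   import tempfile
   import zipfile


   def get_wheel_metadata(backend, metadata_directory, config_settings=None):
       """Populate ``metadata_directory``; return the .dist-info basename."""
       hook = getattr(backend, "prepare_metadata_for_build_wheel", None)
       if hook is not None:
           return hook(metadata_directory, config_settings)
       # Fallback: build the whole wheel and copy its metadata out.
       with tempfile.TemporaryDirectory() as wheel_directory:
           wheel_name = backend.build_wheel(wheel_directory, config_settings)
           wheel_path = os.path.join(wheel_directory, wheel_name)
           with zipfile.ZipFile(wheel_path) as wheel:
               members = [name for name in wheel.namelist()
                          if name.split("/", 1)[0].endswith(".dist-info")]
               if not members:
                   raise ValueError("no .dist-info directory in " + wheel_name)
               wheel.extractall(metadata_directory, members)
       return members[0].split("/", 1)[0]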

get_requires_for_build_sdist
----------------------------

::

 def get_requires_for_build_sdist(config_settings=None):
     ...

This hook MUST return an additional list of strings containing :pep:`508`
dependency specifications, above and beyond those specified in the
``pyproject.toml`` file. These dependencies will be installed when calling the
``build_sdist`` hook.

If not defined, the default implementation is equivalent to ``return []``.
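
To illustrate the "default implementation" rule for both
``get_requires_for_build_*`` hooks, a frontend might wrap the call as in the
sketch below. The wrapper name and its ``distribution`` parameter are
assumptions made for this example::

   def get_extra_requires(backend, distribution, config_settings=None):
       """Call ``get_requires_for_build_{distribution}`` if it exists."""
       if distribution not in ("wheel", "sdist"):
           raise ValueError("distribution must be 'wheel' or 'sdist'")
       hook = getattr(backend, "get_requires_for_build_" + distribution, None)
       if hook is None:
           return []  # equivalent to the default implementation
       return list(hook(config_settings))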


.. note:: Editable installs

   This PEP originally specified another hook, ``install_editable``, to do an
   editable install (as with ``pip install -e``). It was removed due to the
   complexity of the topic, but may be specified in a later PEP.

   Briefly, the questions to be answered include: what reasonable ways exist
   of implementing an 'editable install'? Should the backend or the frontend
   pick how to make an editable install? And if the frontend does, what does it
   need from the backend to do so?

Config settings
===============

::

 config_settings

This argument, which is passed to all hooks, is an arbitrary
dictionary provided as an "escape hatch" for users to pass ad-hoc
configuration into individual package builds. Build backends MAY
assign any semantics they like to this dictionary. Build frontends
SHOULD provide some mechanism for users to specify arbitrary
string-key/string-value pairs to be placed in this dictionary.
For example, they might support some syntax like ``--package-config CC=gcc``.
In case a user provides duplicate string-keys, build frontends SHOULD
combine the corresponding string-values into a list of strings.
Build frontends MAY also provide arbitrary other mechanisms
for users to place entries in this dictionary. For example, ``pip``
might choose to map a mix of modern and legacy command line arguments
like::

 pip install                                           \
   --package-config CC=gcc                             \
   --global-option="--some-global-option"              \
   --build-option="--build-option1"                    \
   --build-option="--build-option2"

into a ``config_settings`` dictionary like::

 {
  "CC": "gcc",
  "--global-option": ["--some-global-option"],
  "--build-option": ["--build-option1", "--build-option2"],
 }

Of course, it's up to users to make sure that they pass options which
make sense for the particular build backend and package that they are
building.
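
The rule for duplicate keys can be sketched like this. The function name and
the input format (an ordered sequence of key/value string pairs, as a
frontend might collect from repeated command line options) are assumptions;
how a frontend maps its legacy options is up to that frontend::

   def collect_config_settings(pairs):
       """Build a ``config_settings`` dict from ``(key, value)`` string pairs.

       A key given once maps to a string; a repeated key maps to a list of
       strings, in the order given.
       """
       settings = {}
       for key, value in pairs:
           if key not in settings:
               settings[key] = value
           elif isinstance(settings[key], list):
               settings[key].append(value)
           else:
               settings[key] = [settings[key], value]
       return settings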

The hooks may be called with positional or keyword arguments, so backends
implementing them should be careful to make sure that their signatures match
both the order and the names of the arguments above.

All hooks are run with working directory set to the root of the source
tree, and MAY print arbitrary informational text on stdout and
stderr. They MUST NOT read from stdin, and the build frontend MAY
close stdin before invoking the hooks.

The build frontend may capture stdout and/or stderr from the backend. If the
backend detects that an output stream is not a terminal/console (e.g.
``not sys.stdout.isatty()``), it SHOULD ensure that any output it writes to that
stream is UTF-8 encoded. The build frontend MUST NOT fail if captured output is
not valid UTF-8, but it MAY not preserve all the information in that case (e.g.
it may decode using the *replace* error handler in Python). If the output stream
is a terminal, the build backend is responsible for presenting its output
accurately, as for any program running in a terminal.
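
For instance, a frontend that captures output as bytes could decode it
leniently, along these lines (the function name is only illustrative)::

   def decode_backend_output(raw):
       """Decode captured bytes, substituting U+FFFD for invalid sequences."""
       return raw.decode("utf-8", errors="replace")

This never raises on malformed input, at the cost of losing the bytes that
could not be decoded.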

If a hook raises an exception, or causes the process to terminate,
then this indicates an error.


Build environment
=================

One of the responsibilities of a build frontend is to set up the
Python environment in which the build backend will run.

We do not require that any particular "virtual environment" mechanism
be used; a build frontend might use virtualenv, or venv, or no special
mechanism at all. But whatever mechanism is used MUST meet the
following criteria:

- All requirements specified by the project's build-requirements must
  be available for import from Python. In particular:

  - The ``get_requires_for_build_wheel`` and ``get_requires_for_build_sdist`` hooks are
    executed in an environment which contains the bootstrap requirements
    specified in the ``pyproject.toml`` file.

  - The ``prepare_metadata_for_build_wheel`` and ``build_wheel`` hooks are
    executed in an environment which contains the
    bootstrap requirements from ``pyproject.toml`` and those specified by the
    ``get_requires_for_build_wheel`` hook.

  - The ``build_sdist`` hook is executed in an environment which contains the
    bootstrap requirements from ``pyproject.toml`` and those specified by the
    ``get_requires_for_build_sdist`` hook.

- This must remain true even for new Python subprocesses spawned by
  the build environment, e.g. code like::

      import sys, subprocess
      subprocess.check_call([sys.executable, ...])

  must spawn a Python process which has access to all the project's
  build-requirements. This is necessary e.g. for build backends that
  want to run legacy ``setup.py`` scripts in a subprocess.

- All command-line scripts provided by the build-required packages
  must be present in the build environment's PATH. For example, if a
  project declares a build-requirement on `flit
  <https://flit.readthedocs.org/en/latest/>`__, then the following must
  work as a mechanism for running the flit command-line tool::

      import subprocess
      import shutil
      subprocess.check_call([shutil.which("flit"), ...])

A build backend MUST be prepared to function in any environment which
meets the above criteria. In particular, it MUST NOT assume that it
has access to any packages except those that are present in the
stdlib, or that are explicitly declared as build-requirements.

Frontends should call each hook in a fresh subprocess, so that backends are
free to change process global state (such as environment variables or the
working directory). A Python library will be provided which frontends can use
to easily call hooks this way.
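
To give a feel for what "a fresh subprocess" involves, here is a deliberately
simplified sketch. It is not the library mentioned above: ``call_hook``,
``_CHILD_SCRIPT`` and the JSON result file are assumptions of this example,
it only handles backends named by a bare module path, and it assumes the
arguments and the return value are JSON-serialisable::

   import json
   import os
   import subprocess
   import sys
   import tempfile

   # Runs in the child: import the backend, call one hook, save the result.
   _CHILD_SCRIPT = """
   import sys
   # With -c, sys.path[0] is '' (the working directory); drop it so that the
   # source tree is not importable unless it is on sys.path anyway.
   if sys.path and sys.path[0] == "":
       del sys.path[0]
   import importlib, json
   module_name, hook_name, kwargs_json, result_path = sys.argv[1:5]
   backend = importlib.import_module(module_name)
   result = getattr(backend, hook_name)(**json.loads(kwargs_json))
   with open(result_path, "w", encoding="utf-8") as f:
       json.dump(result, f)
   """


   def call_hook(source_dir, module_name, hook_name, **kwargs):
       """Call ``module_name.hook_name(**kwargs)`` in a fresh Python process."""
       with tempfile.TemporaryDirectory() as tmp:
           result_path = os.path.join(tmp, "result.json")
           subprocess.check_call(
               [sys.executable, "-c", _CHILD_SCRIPT, module_name, hook_name,
                json.dumps(kwargs), result_path],
               cwd=source_dir,            # hooks run in the source tree root
               stdin=subprocess.DEVNULL,  # hooks must not read from stdin
           )
           with open(result_path, encoding="utf-8") as f:
               return json.load(f)

The result travels through a file rather than stdout because hooks are
allowed to print arbitrary text on stdout and stderr.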

Recommendations for build frontends (non-normative)
---------------------------------------------------

A build frontend MAY use any mechanism for setting up a build
environment that meets the above criteria. For example, simply
installing all build-requirements into the global environment would be
sufficient to build any compliant package -- but this would be
sub-optimal for a number of reasons. This section contains
non-normative advice to frontend implementors.

A build frontend SHOULD, by default, create an isolated environment
for each build, containing only the standard library and any
explicitly requested build-dependencies. This has two benefits:

- It allows for a single installation run to build multiple packages
  that have contradictory build-requirements. E.g. if package1
  build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2,
  then these cannot both be installed simultaneously into the global
  environment -- which is a problem when the user requests ``pip
  install package1 package2``. Or if the user already has pbr==1.8.1
  installed in their global environment, and a package build-requires
  pbr==1.7.2, then downgrading the user's version would be rather
  rude.

- It acts as a kind of public health measure to maximize the number of
  packages that actually do declare accurate build-dependencies. We
  can write all the strongly worded admonitions to package authors we
  want, but if build frontends don't enforce isolation by default,
  then we'll inevitably end up with lots of packages on PyPI that
  build fine on the original author's machine and nowhere else, which
  is a headache that no-one needs.

However, there will also be situations where build-requirements are
problematic in various ways. For example, a package author might
accidentally leave off some crucial requirement despite our best
efforts; or, a package might declare a build-requirement on ``foo >=
1.0`` which worked great when 1.0 was the latest version, but now 1.1
is out and it has a showstopper bug; or, the user might decide to
build a package against numpy==1.7 -- overriding the package's
preferred numpy==1.8 -- to guarantee that the resulting build will be
compatible at the C ABI level with an older version of numpy (even if
this means the resulting build is unsupported upstream). Therefore,
build frontends SHOULD provide some mechanism for users to override
the above defaults. For example, a build frontend could have a
``--build-with-system-site-packages`` option that causes the
``--system-site-packages`` option to be passed to
virtualenv-or-equivalent when creating build environments, or a
``--build-requirements-override=my-requirements.txt`` option that
overrides the project's normal build-requirements.

The general principle here is that we want to enforce hygiene on
package *authors*, while still allowing *end-users* to open up the
hood and apply duct tape when necessary.


In-tree build backends
======================

In certain circumstances, projects may wish to include the source code for the
build backend directly in the source tree, rather than referencing the backend
via the ``requires`` key. Two specific situations where this would be expected
are:

- Backends themselves, which want to use their own features for building
  themselves ("self-hosting backends")
- Project-specific backends, typically consisting of a custom wrapper around a
  standard backend, where the wrapper is too project-specific to be worth
  distributing independently ("in-tree backends")

Projects can specify that their backend code is hosted in-tree by including the
``backend-path`` key in ``pyproject.toml``. This key contains a list of
directories, which the frontend will add to the start of ``sys.path`` when
loading the backend, and running the backend hooks.

There are two restrictions on the content of the ``backend-path`` key:

- Directories in ``backend-path`` are interpreted as relative to the project
  root, and MUST refer to a location within the source tree (after relative
  paths and symbolic links have been resolved).
- The backend code MUST be loaded from one of the directories specified in
  ``backend-path`` (i.e., it is not permitted to specify ``backend-path`` and
  *not* have in-tree backend code).

The first restriction is to ensure that source trees remain self-contained,
and cannot refer to locations outside of the source tree. Frontends SHOULD
check this condition (typically by resolving the location to an absolute path
and resolving symbolic links, and then checking it against the project root),
and fail with an error message if it is violated.

The ``backend-path`` feature is intended to support the implementation of
in-tree backends, and not to allow configuration of existing backends. The
second restriction above is specifically to ensure that this is how the feature
is used. Front ends MAY enforce this check, but are not required to. Doing so
would typically involve checking the backend's ``__file__`` attribute against
the locations in ``backend-path``.
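
The first check could be sketched as below. ``check_backend_path`` is a
hypothetical helper name; the sketch resolves each entry against the project
root and rejects anything that ends up outside it. The returned absolute
paths are what a frontend would then add to the start of ``sys.path``::

   import os


   def check_backend_path(source_dir, backend_path):
       """Return absolute ``backend-path`` entries; reject any outside the tree."""
       root = os.path.realpath(source_dir)
       resolved = []
       for entry in backend_path:
           # realpath makes the path absolute and resolves symbolic links.
           candidate = os.path.realpath(os.path.join(root, entry))
           if os.path.commonpath([root, candidate]) != root:
               raise ValueError(
                   "backend-path entry {!r} is outside the source tree"
                   .format(entry))
           resolved.append(candidate)
       return resolved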


======================
Source distributions
======================

We continue with the legacy sdist format, adding some new restrictions.
This format is mostly
undefined, but basically comes down to: a file named
``{NAME}-{VERSION}.{EXT}``, which unpacks into a buildable source tree
called ``{NAME}-{VERSION}/``. Traditionally these have always
contained ``setup.py``\-style source trees; we now allow them to also
contain ``pyproject.toml``\-style source trees.

Integration frontends require that an sdist named
``{NAME}-{VERSION}.{EXT}`` will generate a wheel named
``{NAME}-{VERSION}-{COMPAT-INFO}.whl``.

The new restrictions for sdists built by :pep:`517` backends are:

- They will be gzipped tar archives, with the ``.tar.gz`` extension. Zip
  archives, or other compression formats for tarballs, are not allowed at
  present.
- Tar archives must be created in the modern POSIX.1-2001 pax tar format, which
  uses UTF-8 for file names.
- The source tree contained in an sdist is expected to include the
  ``pyproject.toml`` file.

====================
Evolutionary notes
====================

A goal here is to make it as simple as possible to convert old-style
sdists to new-style sdists. (E.g., this is one motivation for
supporting dynamic build requirements.) The ideal would be that there
would be a single static ``pyproject.toml`` that could be dropped into any
"version 0" VCS checkout to convert it to the new shiny. This is
probably not 100% possible, but we can get close, and it's important
to keep track of how close we are... hence this section.

A rough plan would be: Create a build system package
(``setuptools_pypackage`` or whatever) that knows how to speak
whatever hook language we come up with, and converts hook calls into
calls to ``setup.py``. This will probably require some sort of hooking or
monkeypatching to setuptools to provide a way to extract the
``setup_requires=`` argument when needed, and to provide a new version
of the sdist command that generates the new-style format. This all
seems doable and sufficient for a large proportion of packages (though
obviously we'll want to prototype such a system before we finalize
anything here). (Alternatively, these changes could be made to
setuptools itself rather than going into a separate package.)

But there remain two obstacles that mean we probably won't be able to
automatically upgrade packages to the new format:

1) There currently exist packages which insist on particular packages
   being available in their environment before setup.py is
   executed. This means that if we decide to execute build scripts in
   an isolated virtualenv-like environment, then projects will need to
   check whether they do this, and if so then when upgrading to the
   new system they will have to start explicitly declaring these
   dependencies (either via ``setup_requires=`` or via static
   declaration in ``pyproject.toml``).

2) There currently exist packages which do not declare consistent
   metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different
   ``install_requires=``). When upgrading to the new system, projects
   will have to evaluate whether this applies to them, and if so they
   will need to stop doing that.


==================
Rejected options
==================

* We discussed making the wheel and sdist hooks build unpacked directories
  containing the same contents as their respective archives. In some cases this
  could avoid the need to pack and unpack an archive, but this seems like
  premature optimisation. It's advantageous for tools to work with archives
  as the canonical interchange formats (especially for wheels, where the archive
  format is already standardised). Close control of archive creation is
  important for reproducible builds. And it's not clear that tasks requiring an
  unpacked distribution will be more common than those requiring an archive.
* We considered an extra hook to copy files to a build directory before invoking
  ``build_wheel``. Looking at existing build systems, we found that passing
  a build directory into ``build_wheel`` made more sense for many tools than
  pre-emptively copying files into a build directory.
* The idea of passing ``build_wheel`` a build directory was then also deemed an
  unnecessary complication. Build tools can use a temporary directory or a cache
  directory to store intermediate files while building. If there is a need, a
  frontend-controlled cache directory could be added in the future.
* For ``build_sdist`` to signal a failure for an expected reason, various
  options were debated at great length, including raising
  ``NotImplementedError`` and returning either ``NotImplemented`` or ``None``.
  Please do not attempt to reopen this discussion without an *extremely* good
  reason, because we are quite tired of it.
* Allowing the backend to be imported from files in the source tree would be
  more consistent with the way Python imports often work. However, not allowing
  this prevents confusing errors from clashing module names. The initial
  version of this PEP did not provide a means to allow backends to be
  imported from files within the source tree, but the ``backend-path`` key
  was added in the next revision to allow projects to opt into this behaviour
  if needed.


===============================
Summary of changes to PEP 517
===============================

The following changes were made to this PEP after the initial reference
implementation was released in pip 19.0.

* Cycles in build requirements were explicitly prohibited.
* Support for in-tree backends and self-hosting of backends was added by
  the introduction of the ``backend-path`` key in the ``[build-system]``
  table.
* Clarified that the ``setuptools.build_meta:__legacy__`` :pep:`517` backend is
  an acceptable alternative to directly invoking ``setup.py`` for source trees
  that don't specify ``build-backend`` explicitly.


===================================
Appendix A: Comparison to PEP 516
===================================

:pep:`516` is a competing proposal to specify a build system interface, which
has now been rejected in favour of this PEP. The primary difference is
that our build backend is defined via a Python hook-based interface
rather than a command-line based interface.

This appendix documents the arguments advanced for this PEP over :pep:`516`.

We do *not* expect that specifying Python hooks rather than command line
interfaces will, by itself, reduce the
complexity of calling into the backend, because build frontends will
in any case want to run hooks inside a child process -- this is important to
isolate the build frontend itself from the backend code and to better
control the build backend's execution environment. So under both
proposals, there will need to be some code in ``pip`` to spawn a
subprocess and talk to some kind of command-line/IPC interface, and
there will need to be some code in the subprocess that knows how to
parse these command line arguments and call the actual build backend
implementation. So this diagram applies to all proposals equally::

 +-----------+          +---------------+           +----------------+
 | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
 |   (pip)   |          |   interface   |           | implementation |
 +-----------+          +---------------+           +----------------+



The key difference between the two approaches is how these interface
boundaries map onto project structure::

 .-= This PEP =-.

 +-----------+          +---------------+    |      +----------------+
 | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
 |   (pip)   |          |   interface   |    |      | implementation |
 +-----------+          +---------------+    |      +----------------+
                                             |
 |______________________________________|    |
    Owned by pip, updated in lockstep        |
                                             |
                                             |
                                  PEP-defined interface boundary
                                Changes here require distutils-sig


 .-= Alternative =-.

 +-----------+    |     +---------------+           +----------------+
 | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
 |   (pip)   |    |     |   interface   |           | implementation |
 +-----------+    |     +---------------+           +----------------+
                  |
                  |     |____________________________________________|
                  |      Owned by build backend, updated in lockstep
                  |
     PEP-defined interface boundary
   Changes here require distutils-sig


By moving the PEP-defined interface boundary into Python code, we gain
three key advantages.

**First**, because there will likely be only a small number of build
frontends (``pip``, and... maybe a few others?), while there will
likely be a long tail of custom build backends (since these are chosen
separately by each package to match their particular build
requirements), the actual diagrams probably look more like::

 .-= This PEP =-.

 +-----------+          +---------------+           +----------------+
 | frontend  | -spawn-> | child cmdline | -Python+> |    backend     |
 |   (pip)   |          |   interface   |        |  | implementation |
 +-----------+          +---------------+        |  +----------------+
                                                 |
                                                 |  +----------------+
                                                 +> |    backend     |
                                                 |  | implementation |
                                                 |  +----------------+
                                                 :
                                                 :

 .-= Alternative =-.

 +-----------+          +---------------+           +----------------+
 | frontend  | -spawn+> | child cmdline | -Python-> |    backend     |
 |   (pip)   |       |  |   interface   |           | implementation |
 +-----------+       |  +---------------+           +----------------+
                     |
                     |  +---------------+           +----------------+
                     +> | child cmdline | -Python-> |    backend     |
                     |  |   interface   |           | implementation |
                     |  +---------------+           +----------------+
                     :
                     :

That is, this PEP leads to less total code in the overall
ecosystem. And in particular, it reduces the barrier to entry of
making a new build system. For example, this is a complete, working
build backend::

   # mypackage_custom_build_backend.py
   import os.path
   import pathlib
   import shutil
   import tarfile

   SDIST_NAME = "mypackage-0.1"
   SDIST_FILENAME = SDIST_NAME + ".tar.gz"
   WHEEL_FILENAME = "mypackage-0.1-py2.py3-none-any.whl"

   #################
   # sdist creation
   #################

   def _exclude_hidden_and_special_files(archive_entry):
       """Tarfile filter to exclude hidden and special files from the archive"""
       if archive_entry.isfile() or archive_entry.isdir():
           if not os.path.basename(archive_entry.name).startswith("."):
               return archive_entry

   def _make_sdist(sdist_dir):
       """Make an sdist and return its filename"""
       sdist_path = pathlib.Path(sdist_dir) / SDIST_FILENAME
       # The context manager closes the archive, which flushes the gzip stream
       with tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT) as sdist:
           # Tar up the whole directory, minus hidden and special files
           sdist.add(os.getcwd(), arcname=SDIST_NAME,
                     filter=_exclude_hidden_and_special_files)
       return SDIST_FILENAME

   def build_sdist(sdist_directory, config_settings=None):
       """PEP 517 sdist creation hook"""
       return _make_sdist(sdist_directory)

   #################
   # wheel creation
   #################

   def get_requires_for_build_wheel(config_settings=None):
       """PEP 517 wheel building dependency definition hook"""
       # As a simple static requirement, this could also just be
       # listed in the project's build system dependencies instead
       return ["wheel"]

   def build_wheel(wheel_directory,
                   config_settings=None, metadata_directory=None):
       """PEP 517 wheel creation hook"""
       from wheel.archive import archive_wheelfile
       path = os.path.join(wheel_directory, WHEEL_FILENAME)
       archive_wheelfile(path, "src/")
       return WHEEL_FILENAME

Of course, this is a *terrible* build backend: it requires the user to
have manually set up the wheel metadata in
``src/mypackage-0.1.dist-info/``; when the version number changes it
must be manually updated in multiple places... but it works, and more features
could be added incrementally. Much experience suggests that large successful
projects often originate as quick hacks (e.g., Linux -- "just a hobby,
won't be big and professional"; `IPython/Jupyter
<https://en.wikipedia.org/wiki/IPython#Grants_and_awards>`_ -- `a grad
student's $PYTHONSTARTUP file
<http://blog.fperez.org/2012/01/ipython-notebook-historical.html>`_),
so if our goal is to encourage the growth of a vibrant ecosystem of
good build tools, it's important to minimize the barrier to entry.
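
The example backend leans on an archiving helper from the ``wheel`` project.
If that helper is not available in the installed release, the same step can
be approximated with the standard library alone. The sketch below is one such
approximation, not the method used by any particular tool. The function name
``archive_wheel_tree`` is hypothetical, and it assumes the tree contains
exactly one top-level ``.dist-info`` directory and that the output file lives
outside the tree being archived.

```python
import base64
import csv
import hashlib
import io
import os
import zipfile


def archive_wheel_tree(wheel_path, source_root):
    """Zip ``source_root`` into ``wheel_path``, writing a fresh RECORD last."""
    dist_info = [d for d in os.listdir(source_root) if d.endswith(".dist-info")]
    if len(dist_info) != 1:
        raise ValueError("expected exactly one .dist-info directory")
    record_name = dist_info[0] + "/RECORD"

    record = io.StringIO()
    rows = csv.writer(record, lineterminator="\n")
    with zipfile.ZipFile(wheel_path, "w", zipfile.ZIP_DEFLATED) as wheel:
        for dirpath, dirnames, filenames in os.walk(source_root):
            dirnames.sort()  # keep the archive order deterministic
            for filename in sorted(filenames):
                full_path = os.path.join(dirpath, filename)
                arcname = os.path.relpath(full_path, source_root)
                arcname = arcname.replace(os.sep, "/")
                if arcname == record_name:
                    continue  # any stale RECORD is regenerated below
                with open(full_path, "rb") as f:
                    data = f.read()
                # RECORD stores an unpadded urlsafe-base64 SHA-256 digest.
                digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
                digest = digest.rstrip(b"=").decode("ascii")
                rows.writerow([arcname, "sha256=" + digest, len(data)])
                wheel.writestr(arcname, data)
        # RECORD lists itself with an empty hash and size.
        rows.writerow([record_name, "", ""])
        wheel.writestr(record_name, record.getvalue())
```

With such a helper, a ``build_wheel`` hook would only need to compute the
output path, call ``archive_wheel_tree(path, "src/")``, and return the wheel
filename.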


**Second**, because Python provides a simpler yet richer structure for
describing interfaces, we remove unnecessary complexity from the
specification -- and specifications are the worst place for
complexity, because changing specifications requires painful
consensus-building across many stakeholders. In the command-line
interface approach, we have to come up with ad hoc ways to map
multiple different kinds of inputs into a single linear command line
(e.g. how do we avoid collisions between user-specified configuration
arguments and PEP-defined arguments? how do we specify optional
arguments? when working with a Python interface these questions have
simple, obvious answers). When spawning and managing subprocesses,
there are many fiddly details that must be gotten right, subtle
cross-platform differences, and some of the most obvious approaches --
e.g., using stdout to return data for the ``build_requires`` operation
-- can create unexpected pitfalls (e.g., what happens when computing
the build requirements requires spawning some child processes, and
these children occasionally print an error message to stdout?
obviously a careful build backend author can avoid this problem, but
the most obvious way of defining a Python interface removes this
possibility entirely, because the hook return value is clearly
demarcated).

In general, the need to isolate build backends into their own process
means that we can't remove IPC complexity entirely -- but by placing
both sides of the IPC channel under the control of a single project,
we make it much cheaper to fix bugs in the IPC interface than if
fixing bugs requires coordinated agreement and coordinated changes
across the ecosystem.
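
To make the point about a single project owning both sides of the channel
more concrete, here is a minimal sketch of such a private IPC layer. All
names in it (``call_hook``, the ``input.json`` / ``output.json`` files, the
runner script) are illustrative assumptions rather than anything mandated by
this PEP. The hook's return value travels through a file, so anything the
backend or its children print to stdout cannot be mistaken for data.

```python
import json
import os
import subprocess
import sys
import tempfile

# Child-side half of the private channel. It is shipped by the frontend, so
# both ends can change together without touching the specification.
_RUNNER = """
import importlib, json, os, sys

backend_name, hook_name, control_dir = sys.argv[1:4]
with open(os.path.join(control_dir, "input.json")) as f:
    kwargs = json.load(f)
backend = importlib.import_module(backend_name)
result = getattr(backend, hook_name)(**kwargs)
with open(os.path.join(control_dir, "output.json"), "w") as f:
    json.dump({"return_val": result}, f)
"""


def call_hook(backend_name, hook_name, kwargs, source_dir):
    """Run one backend hook in a child process and return its result."""
    with tempfile.TemporaryDirectory() as control_dir:
        with open(os.path.join(control_dir, "input.json"), "w") as f:
            json.dump(kwargs, f)
        # Simplification: ``-c`` makes the working directory importable, so
        # this sketch can find a backend module located in ``source_dir``.
        subprocess.run(
            [sys.executable, "-c", _RUNNER, backend_name, hook_name, control_dir],
            cwd=source_dir,
            check=True,
            stdout=subprocess.PIPE,  # captured and ignored, never parsed
        )
        with open(os.path.join(control_dir, "output.json")) as f:
            return json.load(f)["return_val"]
```

A call such as ``call_hook("some_backend", "get_requires_for_build_wheel",
{"config_settings": None}, source_dir)`` would then return the hook's list
unchanged, even if the backend prints diagnostics along the way.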

**Third**, and most crucially, the Python hook approach gives us much
more powerful options for evolving this specification in the future.

For concreteness, imagine that next year we add a new
``build_sdist_from_vcs`` hook, which provides an alternative to the current
``build_sdist`` hook where the frontend is responsible for passing
version control tracking metadata to backends (including indicating when all
on-disk files are tracked), rather than individual backends having to query that
information themselves. In order to manage the transition, we'd want it to be
possible for build frontends to transparently use ``build_sdist_from_vcs`` when
available and fall back onto ``build_sdist`` otherwise; and we'd want it to be
possible for build backends to define both methods, for compatibility
with both old and new build frontends.

Furthermore, our mechanism should also fulfill two more goals: (a) If
new versions of e.g. ``pip`` and ``flit`` are both updated to support
the new interface, then this should be sufficient for it to be used;
in particular, it should *not* be necessary for every project that
*uses* ``flit`` to update its individual ``pyproject.toml`` file. (b)
We do not want to have to spawn extra processes just to perform this
negotiation, because process spawns can easily become a bottleneck when
deploying large multi-package stacks on some platforms (Windows).

In the interface described here, all of these goals are easy to
achieve. Because ``pip`` controls the code that runs inside the child
process, it can easily write it to do something like::

   command, backend, args = parse_command_line_args(...)
   if command == "build_sdist":
       if hasattr(backend, "build_sdist_from_vcs"):
           backend.build_sdist_from_vcs(...)
       elif hasattr(backend, "build_sdist"):
           backend.build_sdist(...)
       else:
           # Neither hook is defined, so report the failure to the frontend
           raise RuntimeError("backend cannot build an sdist")

In the alternative where the public interface boundary is placed at
the subprocess call, this is not possible -- either we need to spawn
an extra process just to query what interfaces are supported (as was
included in an earlier draft of :pep:`516`, an alternative to this), or
else we give up on autonegotiation entirely (as in the current version
of that PEP), meaning that any changes in the interface will require
N individual packages to update their ``pyproject.toml`` files before
any change can go live, and that any changes will necessarily be
restricted to new releases.

One specific consequence of this is that in this PEP, we're able to
make the ``prepare_metadata_for_build_wheel`` command optional. In our design,
this can be readily handled by build frontends, which can put code in
their subprocess runner like::

   def dump_wheel_metadata(backend, working_dir):
       """Dumps wheel metadata to working directory.

          Returns absolute path to resulting metadata directory
       """
       if hasattr(backend, "prepare_metadata_for_build_wheel"):
           subdir = backend.prepare_metadata_for_build_wheel(working_dir)
       else:
           wheel_fname = backend.build_wheel(working_dir)
           already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
           with open(already_built, "w") as f:
               f.write(wheel_fname)
           subdir = unzip_metadata(os.path.join(working_dir, wheel_fname))
       return os.path.join(working_dir, subdir)

   def ensure_wheel_is_built(backend, output_dir, working_dir, metadata_dir):
       """Ensures built wheel is available in output directory

          Returns absolute path to resulting wheel file
       """
       already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
       if os.path.exists(already_built):
           with open(already_built, "r") as f:
               wheel_fname = f.read().strip()
           working_path = os.path.join(working_dir, wheel_fname)
           final_path = os.path.join(output_dir, wheel_fname)
           os.rename(working_path, final_path)
           os.remove(already_built)
       else:
           wheel_fname = backend.build_wheel(output_dir,
                                             metadata_directory=metadata_dir)
       return os.path.join(output_dir, wheel_fname)

and thus expose a totally uniform interface to the rest of the frontend,
with no extra subprocess calls, no duplicated builds, etc. But
obviously this is the kind of code that you only want to write as part
of a private, within-project interface (e.g. the given example requires that
the working directory be shared between the two calls, but not with any
other wheel builds, and that the return value from the metadata helper function
will be passed back in to the wheel building one).
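
The ``unzip_metadata`` helper called in the example above is left undefined.
One possible shape for it is sketched below, under the assumption (taken from
how its result is used) that it extracts next to the wheel file and returns
the name of the ``.dist-info`` directory relative to that location. This is
an illustration only, not code from any existing frontend.

```python
import os
import zipfile


def unzip_metadata(wheel_path):
    """Extract a wheel's ``.dist-info`` directory beside the wheel file.

    Returns the directory name relative to the wheel's parent directory.
    """
    target_dir = os.path.dirname(os.path.abspath(wheel_path))
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
        # Collect top-level directories whose name ends in ".dist-info".
        candidates = {
            name.split("/", 1)[0]
            for name in names
            if "/" in name and name.split("/", 1)[0].endswith(".dist-info")
        }
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info directory")
        (subdir,) = candidates
        members = [name for name in names if name.startswith(subdir + "/")]
        wheel.extractall(target_dir, members=members)
    return subdir
```

Only the metadata files are unpacked; the rest of the wheel stays archived
until ``ensure_wheel_is_built`` moves it into place.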

(And, of course, making the ``prepare_metadata_for_build_wheel`` hook
optional is one piece of lowering the barrier to entry for developing
new backends, as discussed above.)


Other differences
=================

Besides the key command line versus Python hook difference described
above, there are a few other differences in this proposal:

* The metadata command is optional (as described above).

* We return metadata as a directory, rather than a single METADATA
  file. This aligns better with the way that in practice wheel metadata
  is distributed across multiple files (e.g. entry points), and gives us
  more options in the future. (For example, instead of following the PEP
  426 proposal of switching the format of METADATA to JSON, we might
  decide to keep the existing METADATA the way it is for backcompat,
  while adding new extensions as JSON "sidecar" files inside the same
  directory. Or maybe not; the point is it keeps our options more open.)

* We provide a mechanism for passing information between the metadata
  step and the wheel building step. I guess everyone probably will
  agree this is a good idea?

* We provide more detailed recommendations about the build environment,
  but these aren't normative anyway.


===========
Copyright
===========

This document has been placed in the public domain.


..
  Local Variables:
  mode: indented-text
  indent-tabs-mode: nil
  sentence-end-double-space: t
  fill-column: 70
  coding: utf-8
  End: