PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <[email protected]>, Ethan Furman <[email protected]>
Discussions-To: https://discuss.python.org/t/42001
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Mar-2014
Python-Version: 3.13
Post-History: 30-Mar-2014, 15-Aug-2014, 16-Aug-2014, 07-Jun-2016, 01-Sep-2016,
             13-Apr-2021, 03-Nov-2021, 27-Dec-2023


Abstract
========

This PEP proposes small adjustments to the APIs of the ``bytes`` and
``bytearray`` types to make it easier to operate entirely in the binary domain:

* Add ``fromsize`` alternative constructor
* Add ``fromint`` alternative constructor
* Add ``getbyte`` byte retrieval method
* Add ``iterbytes`` alternative iterator

Rationale
=========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series, for example with :pep:`461`.


Motivation
==========

With Python 3 and the split between ``str`` and ``bytes``, one small but
important area of programming became slightly more difficult, and much more
painful -- wire format protocols.

This area of programming is characterized by a mixture of binary data and
ASCII compatible segments of text (aka ASCII-encoded text).  The addition of
the new constructors, methods, and iterators will aid both in writing new
wire format code, and in porting any remaining Python 2 wire format code.

Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
formats, and ``FTP`` and ``HTTP`` communications, among many others.


Proposals
=========

Addition of explicit "count and byte initialised sequence" constructors
-----------------------------------------------------------------------

To replace the discouraged behavior of creating zero-filled ``bytes``-like
objects from the basic constructors (i.e. ``bytes(1)`` --> ``b'\x00'``), this
PEP proposes the addition of an explicit ``fromsize`` alternative constructor
as a class method on both ``bytes`` and ``bytearray`` whose first argument
is the count, and whose second argument is the fill byte to use (defaults
to ``\x00``)::

   >>> bytes.fromsize(3)
   b'\x00\x00\x00'
   >>> bytearray.fromsize(3)
   bytearray(b'\x00\x00\x00')
   >>> bytes.fromsize(5, b'\x0a')
   b'\x0a\x0a\x0a\x0a\x0a'
   >>> bytearray.fromsize(5, fill=b'\x0a')
   bytearray(b'\x0a\x0a\x0a\x0a\x0a')

``fromsize`` will behave just as the current constructors behave when passed a
single integer, while allowing for non-zero fill values when needed.


Addition of explicit "single byte" constructors
-----------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes
the addition of an explicit ``fromint`` alternative constructor as a class
method on both ``bytes`` and ``bytearray``::

   >>> bytes.fromint(65)
   b'A'
   >>> bytearray.fromint(65)
   bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive)::

   >>> bytes.fromint(512)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ValueError: integer must be in range(0, 256)

   >>> bytes.fromint(1.0)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bytes.fromint`` is the primary inverse operation for binary data, while
``chr`` is the inverse operation for text data, and that ``bytearray.fromint``
also exists.

Behaviorally, ``bytes.fromint(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like ``map``.

These new methods intentionally do NOT offer the same level of general integer
support as the existing ``int.to_bytes`` conversion method, which allows
arbitrarily large integers to be converted to arbitrarily long bytes objects. The
restriction to only accept positive integers that fit in a single byte means
that no byte order information is needed, and there is no need to handle
negative numbers. The documentation of the new methods will refer readers to
``int.to_bytes`` for use cases where handling of arbitrary integers is needed.


Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
which will always return ``bytes``::

   >>> b'abc'.getbyte(0)
   b'a'

If an index is asked for that doesn't exist, ``IndexError`` is raised::

   >>> b'abc'.getbyte(9)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   IndexError: index out of range


Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
``iterbytes`` method that produces length 1 ``bytes`` objects rather than
integers::

   for x in data.iterbytes():
       # x is a length 1 ``bytes`` object, rather than an integer

For example::

   >>> tuple(b"ABC".iterbytes())
   (b'A', b'B', b'C')


Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------

Zero-initialised sequences can be created via sequence repetition::

   >>> b'\x00' * 3
   b'\x00\x00\x00'
   >>> bytearray(b'\x00') * 3
   bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable ``bytes`` type then inherited that feature
when it was introduced in :pep:`3137`.

This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behavior of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.fromint(x)`` proposal in this PEP. Providing both behaviors as separate
class methods avoids that ambiguity.

Current Workarounds
-------------------

After nearly a decade, there's seems to be no consensus on the best workarounds
for byte iteration, as demonstrated by
`Get single-byte bytes objects from bytes objects`_.


Omitting the originally proposed builtin function
-------------------------------------------------

When submitted to the Steering Council, this PEP proposed the introduction of
a ``bchr`` builtin (with the same behaviour as ``bytes.fromint``), recreating
the ``ord``/``chr``/``unichr`` trio from Python 2 under a different naming
scheme (``ord``/``bchr``/``chr``).

The SC indicated they didn't think this functionality was needed often enough
to justify offering two ways of doing the same thing, especially when one of
those ways was a new builtin function. That part of the proposal was therefore
dropped as being redundant with the ``bytes.fromint`` alternate constructor.

Developers that use this method frequently will instead have the option to
define their own ``bchr = bytes.fromint`` aliases.


Scope limitation: memoryview
----------------------------

Updating ``memoryview`` with the new item retrieval methods is outside the scope
of this PEP.


References
==========

* `Initial March 2014 discussion thread on python-ideas <https://mail.python.org/pipermail/python-ideas/2014-March/027295.html>`_
* `Guido's initial feedback in that thread <https://mail.python.org/pipermail/python-ideas/2014-March/027376.html>`_
* `Issue proposing moving zero-initialised sequences to a dedicated API <https://github.com/python/cpython/issues/65094>`_
* `Issue proposing to use calloc() for zero-initialised binary sequences <https://github.com/python/cpython/issues/65843>`_
* `August 2014 discussion thread on python-dev <https://mail.python.org/pipermail/python-ideas/2014-March/027295.html>`_
* `June 2016 discussion thread on python-dev <https://mail.python.org/pipermail/python-dev/2016-June/144875.html>`_
* `Get single-byte bytes objects from bytes objects <https://discuss.python.org/t/get-single-byte-bytes-objects-from-a-bytes-object/41709>`_

Copyright
=========

This document has been placed in the public domain.