Make bytearray automatically compatible with bytes? #552

JelleZijlstra · 2018-04-07T02:17:14Z

mypy automatically promotes bytearray to bytes, which means bytearray objects are accepted for arguments declared as bytes. The motivation in mypy's code is that convenience trumps safety (most functions that accept bytes really do accept bytearray) even though the promotion is technically unsafe (https://github.com/python/mypy/blob/41641a021b17cebe5cbd016fb3e5dc1eb552adb8/mypy/semanal.py#L127).
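
To make the "technically unsafe" part concrete, here is a minimal sketch. The function `cache_key` is hypothetical and not taken from mypy or typeshed. A checker that applies the promotion would be expected to accept both calls, but bytearray is mutable and unhashable, so the second call fails at runtime:

```python
def cache_key(data: bytes) -> int:
    # Hypothetical helper: relies on bytes being immutable and hashable.
    return hash(data)


ok = cache_key(b"abc")  # a real bytes object works

try:
    # Assumed to pass a checker that promotes bytearray to bytes,
    # yet bytearray is unhashable, so this raises TypeError.
    cache_key(bytearray(b"abc"))
    failed = False
except TypeError:
    failed = True
```

Cases like this are the exception rather than the rule, which is the trade-off described above.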

Should we specify in PEP 484 that type checkers should perform this promotion? We already put something similar for float/int (https://www.python.org/dev/peps/pep-0484/#the-numeric-tower) and for int/long and str/unicode in Python 2. If people agree that this promotion should also be standardized, I can make a PR to update the PEP.

I came across this because I realized that I've been relying on the bytearray/bytes promotion in typeshed, but typeshed technically shouldn't rely on this since it's a mypy extension to the standard.

gvanrossum · 2018-04-07T02:37:10Z

SGTM

JukkaL · 2018-04-09T09:35:28Z

Sounds reasonable.

For completeness, here's some reasoning behind the design decision in mypy. The promotion was implemented due to a combination of factors:

1. Hardly any APIs document whether functions accept just bytes or both bytes and bytearray objects. To precisely annotate functions in typeshed without the promotion, pretty much every bytes argument type would have to be manually checked against bytearray as well, which would have been a lot of work.
2. bytearray is used relatively rarely. This means that the motivation to perform the analysis mentioned in (1) hardly exists. Also, this means that the added practical unsafety caused by the promotion will be small.
3. Operations involving a mix of bytearray and bytes, such as concatenation, can result in bytearray objects. Thus it's possible that some function that accepts both bytes and bytearray can also return either of them. A pretty complicated overloaded signature may be required to precisely model this, which seems pointless because of (2) above.
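
As a rough illustration of the third point, the sketch below uses a hypothetical function `add_suffix` (not from any real library). At runtime the result of `+` takes the type of the left operand, so modelling the return type precisely needs one overload per input type. How a given checker treats these overloads depends on whether it applies the promotion, so this is only a sketch of the shape such a signature might take:

```python
from typing import overload


# bytearray is listed first on the assumption that a checker applying the
# promotion would otherwise match bytearray arguments against `bytes`.
@overload
def add_suffix(data: bytearray) -> bytearray: ...
@overload
def add_suffix(data: bytes) -> bytes: ...
def add_suffix(data):
    # The result type follows the left operand of the concatenation.
    return data + b"!"


from_bytes = add_suffix(b"x")
from_bytearray = add_suffix(bytearray(b"x"))
```

Even this small case doubles the signature, which is the cost the comment above considers not worth paying.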

There is at least one somewhat common idiom where the promotion can cause trouble: AnyStr. Here's an example:

from typing import AnyStr

def question(s: AnyStr) -> AnyStr:
    if isinstance(s, bytes):
        return s + b'?'
    else:
        return s + '?'

question(bytearray(b'foo'))  # Error

This can often be worked around by using isinstance(.., str) instead.
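
A minimal sketch of that workaround, reusing the `question` function from the example above. Testing for str first means a bytearray argument falls into the bytes-like branch, where concatenating `b'?'` works at runtime:

```python
from typing import AnyStr


def question(s: AnyStr) -> AnyStr:
    if isinstance(s, str):
        return s + '?'
    else:
        # Reached for bytes and, at runtime, for bytearray as well.
        return s + b'?'


result = question(bytearray(b'foo'))  # no runtime error with this ordering
```

Note that the value returned for a bytearray argument is itself a bytearray, even though a checker applying the promotion would presumably infer bytes.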

gvanrossum · 2018-04-09T18:56:22Z

Perhaps we can also pretend memoryview is a virtual subclass of bytes? It just came up in #4871.

zackw · 2019-09-27T16:08:34Z

Here is a concrete case where it's a problem for memoryview not to be acceptable as an argument to functions that take bytearray or bytes arguments. This is part of a program that converts archived files from .gz and/or .bz2 to .xz format.

import bz2, gzip, lzma, os
from typing import Union

def do_recompression(wfd: int, rfd: int, ext: str) -> None:
    """Decompress data from RFD (which may be in either .gz or .bz2
       format) and write recompressed data to WFD.  If this process
       succeeds, change the file timestamps on WFD to match RFD and
       sync WFD to persistent storage.
    """

    rfp = os.fdopen(rfd, mode="rb", closefd=False)
    rd: Union[gzip.GzipFile, bz2.BZ2File]
    if ext == '.gz':
        rd = gzip.GzipFile(fileobj=rfp, mode="rb")
    elif ext == '.bz2':
        rd = bz2.BZ2File(rfp, mode="rb")
    else:
        # Caller should have ensured that one of the above two cases
        # is true.
        raise RuntimeError("can't get here")

    wfp = os.fdopen(wfd, mode="wb", closefd=False)
    wr = lzma.LZMAFile(wfp, mode="wb", format=lzma.FORMAT_XZ,
                       check=lzma.CHECK_SHA256, preset=9)

    # work in 16 megabyte chunks
    # wrapping the bytearray in a memoryview allows us to slice
    # it without copying
    block = memoryview(bytearray(16*1024*1024))

    # with-block ensures rd and wr are flushed and closed
    # before their file descriptors are
    with rd, wr:
        while True:
            nread = rd.readinto(block)
            if nread == 0:
                break
            wr.write(block[:nread])

    st = os.stat(rfd)
    os.utime(wfd, ns=(st.st_atime_ns, st.st_mtime_ns))
    os.fsync(wfd)

mypy objects to both reading into and writing from the memoryview, even though both are perfectly acceptable (and documented to work). If I didn't have the memoryview, the block[:nread] construct would copy potentially megabytes of data.

$ mypy recompress_tree.py
recompress_tree.py:109: error: Argument 1 to "readinto" of "BufferedIOBase"
    has incompatible type "memoryview"; expected "Union[bytearray, mmap]"
recompress_tree.py:112: error: Argument 1 to "write" of "LZMAFile"
    has incompatible type "memoryview"; expected "bytes"

$ mypy --version
mypy 0.720
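
The zero-copy behaviour that motivates the memoryview can be checked with a small standalone sketch. It uses `io.BytesIO` as a stand-in for the compressed file objects above, which is an assumption made only to keep the example self-contained:

```python
import io

buf = bytearray(8)
view = memoryview(buf)

# readinto() fills the underlying bytearray through the memoryview.
src = io.BytesIO(b"hello")
nread = src.readinto(view)

# Slicing a memoryview yields another view over the same buffer...
chunk = view[:nread]
shares_buffer = chunk.obj is buf

# ...whereas slicing the bytearray directly makes an independent copy.
copied = buf[:nread]
copied[0] = ord("J")

# write() accepts the memoryview slice without an intermediate copy.
dst = io.BytesIO()
dst.write(chunk)
```

The copy made by slicing the bytearray is unaffected by later changes to `buf` and vice versa, which is why it costs memory proportional to the slice length.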

hauntsaninja · 2020-08-31T20:11:43Z

What is the status here? https://docs.python.org/3/library/typing.html#typing.ByteString says:

This type represents the types bytes, bytearray, and memoryview of byte sequences.
As a shorthand for this type, bytes can be used to annotate arguments of any of the types mentioned above.

But I see no language in PEP 484. Is there consensus and we just need to update the PEP?

(This came up in python/typeshed#4500 (comment))

srittau · 2020-08-31T20:17:05Z

FWIW, typeshed has handled it as described in the docs for as long as I can remember.

srittau · 2021-11-04T11:40:37Z

Closing this here. This is at least documented in the typing docs (see above) and this is handled correctly by type checkers AFAIK. If there are still issues open surrounding this, please open issues in the corresponding issue trackers.

zackw · 2021-11-04T14:24:33Z

mypy 0.910 accepts the test program I posted back in 2019, so I'm satisfied this is resolved.

JelleZijlstra changed the title ~~Make bytearray automatically compatible with bytes~~ Make bytearray automatically compatible with bytes? Apr 7, 2018

gvanrossum mentioned this issue Apr 9, 2018

Unexpected type errors with bytes and typing.ByteString python/mypy#4871

Closed

srittau mentioned this issue Nov 6, 2018

Allow bytearray as a valid input to base64 for both encode and decode. python/typeshed#2587

Merged

jolaf mentioned this issue Aug 7, 2019

bytearray is not treated as compatible with bytes agronholm/typeguard#72

Closed

andrewkozlik mentioned this issue Mar 5, 2020

Ed25519 in FIDO2 trezor/trezor-firmware#887

Merged

hauntsaninja mentioned this issue Nov 25, 2020

Confusing error: incompatible type "Type[bytes]"; expected "Type[bytes]" python/mypy#9756

Open

hauntsaninja mentioned this issue Jan 17, 2021

Some overrides for memoryview.__setitem__ are incorrect python/typeshed#4940

Closed

srittau closed this as completed Nov 4, 2021

njsmith mentioned this issue Jul 12, 2023

BytesMessage.data is a bytearray but type-hinted as bytes python-hyper/wsproto#185

Open
