Skip to content

Make bytearray automatically compatible with bytes? #552

New issue

Have a question about this project? No Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “No Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? No Sign in to your account

Closed
JelleZijlstra opened this issue Apr 7, 2018 · 8 comments
Closed

Make bytearray automatically compatible with bytes? #552

JelleZijlstra opened this issue Apr 7, 2018 · 8 comments

Comments

@JelleZijlstra
Copy link
Member

mypy automatically promotes bytearray to bytes, which means bytearray objects are accepted for arguments declared as bytes. The motivation in mypy's code is that convenience trumps safety (most function that accept bytes really do accept bytearray) even though the promotion is technically unsafe (https://github.com/python/mypy/blob/41641a021b17cebe5cbd016fb3e5dc1eb552adb8/mypy/semanal.py#L127).

Should we specify in PEP 484 that type checkers should perform this promotion? We already put something similar for float/int (https://www.python.org/dev/peps/pep-0484/#the-numeric-tower) and for int/long and str/unicode in Python 2. If people agree that this promotion should also be standardized, I can make a PR to update the PEP.

I came across this because I realized that I've been relying on the bytearray/bytes promotion in typeshed, but typeshed technically shouldn't rely on this since it's a mypy extension to the standard.

@JelleZijlstra JelleZijlstra changed the title Make bytearray automatically compatible with bytes Make bytearray automatically compatible with bytes? Apr 7, 2018
@gvanrossum
Copy link
Member

gvanrossum commented Apr 7, 2018 via email

@JukkaL
Copy link
Contributor

JukkaL commented Apr 9, 2018

Sounds reasonable.

For completeness, here's some reasoning behind the design decision in mypy. The promotion was implemented due to a combination of factors:

  1. Hardly no APIs document whether functions accept just bytes or both bytes and bytearray objects. To precisely annotate functions in typeshed without the promotion, pretty much every bytes argument type would have to be manually checked against bytearray as well, which would have been a lot of work.
  2. bytearray is used relatively rarely. This means that the motivation to perform the analysis mentioned in step (1) hardly exists. Also, this means that the added practical unsafety caused by the promotion will be small.
  3. Operations involving a mix of bytearray and bytes such as concatenation can result in bytearray objects. Thus it's possible that some function that accepts both bytes and bytearray can also return both of them. A pretty complicated overloaded signature may be required to precisely model this, which seems pointless because of (2) above.

There is at least one somewhat common idiom where the promotion can cause trouble: AnyStr. Here's an example:

def  question(s: AnyStr) -> AnyStr:
    if isinstance(s, bytes):
        return s + b'?'
    else:
        return s + '?'

question(bytearray(b'foo'))  # Error

This can often be worked around by using isinstance(.., str) instead.

@gvanrossum
Copy link
Member

Perhaps we can also pretend memoryview is a virtual subclass of bytes? It just came up in #4871.

@zackw
Copy link

zackw commented Sep 27, 2019

Here is a concrete case where it's a problem for memoryview not to be acceptable as an argument to functions that take bytearray or bytes arguments. This is part of a program that converts archived files from .gz and/or .bz2 to .xz format.

import bz2, gzip, lzma, os
from typing import Union

def do_recompression(wfd: int, rfd: int, ext: str) -> None:
    """Decompress data from RFP (which may be in either .gz or .bz2
       format) and write recompressed data to WFP.  If this process
       succeeds, change the file timestamps on WFP to match RFP and
       sync WFP to persistent storage.
    """

    rfp = os.fdopen(rfd, mode="rb", closefd=False)
    rd: Union[gzip.GzipFile, bz2.BZ2File]
    if ext == '.gz':
        rd = gzip.GzipFile(fileobj=rfp, mode="rb")
    elif ext == '.bz2':
        rd = bz2.BZ2File(rfp, mode="rb")
    else:
        # Caller should have ensured that one of the above two cases
        # is true.
        raise RuntimeError("can't get here")

    wfp = os.fdopen(wfd, mode="wb", closefd=False)
    wr = lzma.LZMAFile(wfp, mode="wb", format=lzma.FORMAT_XZ,
                       check=lzma.CHECK_SHA256, preset=9)

    # work in 16 megabyte chunks
    # wrapping the bytearray in a memoryview allows us to slice
    # it without copying
    block = memoryview(bytearray(16*1024*1024))

    # with-block ensures rd and wr are flushed and closed
    # before their file descriptors are
    with rd, wr:
        while True:
            nread = rd.readinto(block)
            if nread == 0:
                break
            wr.write(block[:nread])

    st = os.stat(rfd)
    os.utime(wfd, ns=(st.st_atime_ns, st.st_mtime_ns))
    os.fsync(wfd)

mypy objects to both reading and writing into the memoryview, even though this is perfectly acceptable (and documented to work). If I didn't have the memoryview, the block[:nread] construct would copy potentially megabytes of data.

$ mypy recompress_tree.py
recompress_tree.py:109: error: Argument 1 to "readinto" of "BufferedIOBase"
    has incompatible type "memoryview"; expected "Union[bytearray, mmap]"
recompress_tree.py:112: error: Argument 1 to "write" of "LZMAFile"
    has incompatible type "memoryview"; expected "bytes"

$ mypy --version
mypy 0.720

@hauntsaninja
Copy link
Collaborator

What is the status here? https://docs.python.org/3/library/typing.html#typing.ByteString says:

This type represents the types bytes, bytearray, and memoryview of byte sequences.
As a shorthand for this type, bytes can be used to annotate arguments of any of the types mentioned above.

But I see no language in PEP 484. Is there consensus and we just need to update the PEP?

(This came up in python/typeshed#4500 (comment))

@srittau
Copy link
Collaborator

srittau commented Aug 31, 2020

FWIW, typeshed has handled it as described in the docs for as long as I can remember.

@srittau
Copy link
Collaborator

srittau commented Nov 4, 2021

Closing this here. This is at least documented in the typing docs (see above) and this handled correctly by type checkers AFAIK. If there are still issues open surrounding this, please open issues in the corresponding issue trackers.

@srittau srittau closed this as completed Nov 4, 2021
@zackw
Copy link

zackw commented Nov 4, 2021

mypy 0.910 accepts the test program I posted back in 2019, so I'm satisfied this is resolved.

No Sign up for free to join this conversation on GitHub. Already have an account? No Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants