Why do I get a warning when concatenating lists of mixed types in PyCharm?

In PyCharm, the following code produces a warning:

from typing import List

list1: List[int] = [1, 2, 3]
list2: List[str] = ["1", "2", "3"]
list3: List[object] = list1 + list2
#                             ↳ Expected type List[int] (matched generic type List[_T]),
#                               got List[str] instead.

Why? Should I not be concatenating two lists of mixed, hinted types?

Answer

As requested in the comments, here are some reasons why type checkers don’t allow this.

The first reason is somewhat prosaic: the type signature of list.__add__ simply doesn’t allow anything other than a list containing the same type to be passed in:

_T = TypeVar('_T')

# ...snip...

class list(MutableSequence[_T], Generic[_T]):

    # ...snip...

    def __add__(self, x: List[_T]) -> List[_T]: ...

And PyCharm, which supports PEP 484, uses (in part) data from Typeshed.

It’s possible that we could broaden this type signature in some way (e.g. overload it to also accept a List[_S] and return List[Union[_T, _S]] in that case), but I don’t think anybody’s bothered to investigate the feasibility of this approach: this sort of thing isn’t too useful in practice, makes life harder for people who want strictly homogeneous lists or want to subclass them, and would potentially disrupt a lot of existing code that relies on the current type signature.
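
To illustrate what such a broadened signature might look like, here is a minimal sketch that uses a standalone helper instead of touching list.__add__ itself. The name concat and the second type variable _S are made up for this example and are not taken from Typeshed; how a given checker infers the result type may vary.

```python
from typing import List, TypeVar, Union

_T = TypeVar("_T")
_S = TypeVar("_S")


def concat(left: List[_T], right: List[_S]) -> List[Union[_T, _S]]:
    """Hypothetical helper: join two lists into a new list of the union type."""
    result: List[Union[_T, _S]] = []
    result.extend(left)   # copy items from the left operand
    result.extend(right)  # then items from the right operand
    return result


list1: List[int] = [1, 2, 3]
list2: List[str] = ["1", "2", "3"]
combined: List[Union[int, str]] = concat(list1, list2)
```

The helper builds a fresh list and leaves both inputs untouched.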

This type signature is also probably a reflection of the broader choice made during the initial design of PEP 484 to assume that lists are always homogeneous, i.e. always contain values of the same type.

The designers of PEP 484 strictly speaking didn’t need to make this choice: they could have required type checkers to special-case interactions with lists, like we currently do for tuples. But it’s overall simpler not to do this, I think. (And also arguably better style, but whatever.)
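
For comparison, a small sketch of how the two containers are annotated under PEP 484: a list takes a single element type, while a fixed-length tuple takes one type per position. The variable names are only illustrative.

```python
from typing import List, Tuple

# A list has one type parameter that covers every element.
numbers: List[int] = [1, 2, 3]

# A fixed-length tuple has one type per position, so mixed types are fine.
pair: Tuple[int, str] = (1, "one")

number, name = pair  # number is an int, name is a str
```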


The second reason has to do with a fundamental limitation of the PEP 484 type system: there’s no way to declare that some function or method does not modify state.

Basically, the behavior you want is safe only if lst1.__add__(lst2) is guaranteed not to mutate either operand. But there’s no way of actually guaranteeing this: what if lst1 is some weird list subclass that copies items from lst2 to itself? Then temporarily relaxing lst1’s type from SomeListSubtype[int] to SomeListSubtype[object] would be unsafe: lst1 would no longer contain only ints after adding/injecting the strings from lst2.
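
As a sketch of that hazard, the subclass below overrides __add__ so that it copies the right operand's items into itself before returning the result. GreedyList is a made-up name and deliberately bad code, written only to show what a type checker has to guard against.

```python
from typing import List


class GreedyList(list):
    """Hypothetical, badly behaved subclass: `+` also mutates the left operand."""

    def __add__(self, other):
        self.extend(other)       # side effect: items from `other` end up in self
        return GreedyList(self)  # return a separate copy as the "sum"


ints: List[int] = GreedyList([1, 2, 3])  # declared to hold only ints
strs: List[str] = ["1", "2", "3"]

combined = ints + strs

# After the addition, the list declared as List[int] also contains strings.
print(ints)
```

If a checker had accepted `ints + strs` by treating ints as a list of object for the duration of the call, the List[int] annotation would no longer describe what ints actually holds.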

Of course, actually writing such a subclass is also bad practice, but type checkers don’t have the luxury of assuming users will follow best practices if they’re not enforced: type checkers, compilers, and similar tools are fundamentally conservative beasts.


And finally, it’s worth noting that none of these problems are intrinsically insurmountable. There are several things type checker implementers could do, such as:

  1. Tinker with the type signature of list (while making sure this doesn’t break any existing code).
  2. Introduce some way of declaring that a method is pure, i.e. does no mutation. Basically, generalize the ideas behind PEP 591 to also apply to functions. (But this would require writing a PEP, modifying typeshed to use the new typing construct, doing a lot of careful design and implementation work…)
  3. Special-case this interaction when we know for certain that the two variables are not instances of list subclasses. (But realistically, the number of times we’d know this for certain is pretty limited.)

…and so forth.

But all of these things take time and energy to do: it’s a matter of prioritization. The issue trackers for PyCharm (and mypy, etc.) are pretty long, and there’s no shortage of other bugs/feature requests to work through.

User contributions licensed under: CC BY-SA