Python Inconsistencies and Eccentricities

This is a collection of eccentricities and inconsistencies I have run into in Python, mostly in the course of writing Hypothesis.

Most of them aren't terrible and are merely weird or annoying, and most languages have warts like this. Python feels like it has an atypically high rate of them but maybe I've just looked a lot more closely.

All of these examples are Python 2, with notes where they no longer apply in Python 3. Python 3 has a few new eccentricities of its own but is mostly a bit better in this regard.

These are in no particular order except the one I remembered them in.

Containment and equality

In [1]:
x = float('nan')
x == x
Out[1]:
False

So far so "good". This is the normal IEEE behaviour for floats.

In [2]:
float('nan') in [float('nan')]
Out[2]:
False

Expected behaviour still. There is no value in the list equal to float('nan')!

In [3]:
x in [x]
Out[3]:
True

The defined behaviour of containment for lists and tuples is that x in y if any(x is s or x == s for s in y). This is documented in Python 3, but it is also true in Python 2 (contrary to the documentation there).

Note that you will get slightly different behaviour in PyPy, because float('nan') is float('nan') is always True there. Also there is a bug in some versions where x in [x] will return False but x in [x, "hi"] will return True.
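
The rule above can be modelled in a few lines of pure Python. This is only a sketch of the documented behaviour, not the real C implementation, and the helper name contains is made up for illustration. It assumes CPython, where each call to float('nan') produces a new object.

```python
def contains(needle, haystack):
    # Identity is checked before equality, which is why a NaN object
    # counts as being "in" a list that holds that very same object.
    return any(needle is item or needle == item for item in haystack)


nan = float('nan')

# Same object on both sides: the identity check succeeds.
same_object = contains(nan, [nan])

# Two distinct NaN objects: neither identity nor equality holds.
fresh_objects = contains(float('nan'), [float('nan')])
```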

When x += y is different from x = x + y

In [4]:
x = [1]
y = x
x += [2]
In [5]:
x
Out[5]:
[1, 2]
In [6]:
y
Out[6]:
[1, 2]
In [7]:
x = (1,)
y = x
x += (2,)
In [8]:
x
Out[8]:
(1, 2)
In [9]:
y
Out[9]:
(1,)

+= is supported for everything that has + defined, but a type can also define __iadd__, in which case x += y is equivalent to x = x.__iadd__(y). For lists, __iadd__ mutates x in place and returns it.
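
To make the two code paths concrete, here is a small sketch with two made-up classes (OnlyAdd and InPlace are hypothetical names, not anything from the standard library). One only defines +, the other also defines __iadd__.

```python
class OnlyAdd(object):
    """Defines + but not +=, so x += y falls back to x = x + y."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return OnlyAdd(self.items + other.items)


class InPlace(OnlyAdd):
    """Also defines __iadd__, which mutates self and returns it."""

    def __iadd__(self, other):
        self.items.extend(other.items)
        return self


a = OnlyAdd([1])
a_alias = a
a += OnlyAdd([2])   # rebinds a to a new object; a_alias is untouched

b = InPlace([1])
b_alias = b
b += InPlace([2])   # mutates the object that b_alias also refers to
```

This mirrors the tuple and list cases above: tuples behave like OnlyAdd, lists like InPlace.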

This is particularly fun when you put mutable types inside immutable ones:

In [10]:
x = ([],)
In [11]:
x[0] += [1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-bbb20d4e3867> in <module>()
----> 1 x[0] += [1]

TypeError: 'tuple' object does not support item assignment

As you'd hope, we got an error on item assignment to a tuple! But...

In [12]:
x[0]
Out[12]:
[1]

Because the in-place add mutated the list before the item assignment was attempted, the update "worked" anyway, even though we were not able to store the value back into the tuple.
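
Spelling the augmented assignment out as two steps shows where things go wrong. This is a hand-written approximation of what x[0] += [1] does, with item and result as illustrative names.

```python
x = ([],)

# Step 1: fetch the item and call its in-place add. The list is mutated here.
item = x[0]
result = item.__iadd__([1])

# Step 2: store the result back. This is the step that fails for a tuple,
# but by now the mutation from step 1 has already happened.
try:
    x[0] = result
    assignment_failed = False
except TypeError:
    assignment_failed = True
```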

type(x + 1 - 1) != type(x)

In [13]:
import sys
sys.maxint
Out[13]:
9223372036854775807
In [14]:
type(sys.maxint)
Out[14]:
int
In [15]:
(sys.maxint + 1) - 1
Out[15]:
9223372036854775807L
In [16]:
type((sys.maxint + 1) - 1)
Out[16]:
long

This one is of course gone in Python 3 because the int/long distinction is gone.

Non-obvious associativity

In [17]:
0 == 1 is False
Out[17]:
False
In [18]:
(0 == 1) is False
Out[18]:
True
In [19]:
0 == (1 is False)
Out[19]:
True

The above is a consequence of comparison chaining: it is interpreted as (0 == 1) and (1 is False).
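
Here is the same thing with variables instead of literals (newer Python 3 versions emit a SyntaxWarning for is against a literal), plus a small check that the middle operand of a chain is only evaluated once. The function middle is just a made-up probe.

```python
a, b, c = 0, 1, False

chained = a == b is c               # a comparison chain
expanded = (a == b) and (b is c)    # what the chain means
grouped_left = (a == b) is c        # explicit grouping gives a different answer
grouped_right = a == (b is c)       # and so does the other grouping

# In a real chain the middle operand is evaluated only once.
calls = []


def middle():
    calls.append("called")
    return 1


chained_with_call = 1 == middle() is not None
```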

How do you look up the method resolution order?

In [20]:
object.mro()
Out[20]:
[object]
In [21]:
str.mro()
Out[21]:
[str, basestring, object]
In [22]:
type.mro()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-fd092325fdfa> in <module>()
----> 1 type.mro()

TypeError: descriptor 'mro' of 'type' object needs an argument

Oh right. This could be a type. So clearly what we should have done is...

In [23]:
type.mro(type)
Out[23]:
[type, object]
In [24]:
type.mro(object)
Out[24]:
[object]
In [25]:
type.mro(str)
Out[25]:
[str, basestring, object]

Looks good, right?

In [26]:
class Foo(object):
    def foo(self):
        return "Hi"

class AddsFoo(type):
    def mro(self):
        return super(AddsFoo, self).mro() + [Foo]
    
class Bar(object):
    __metaclass__ = AddsFoo
    
Bar().foo()
Out[26]:
'Hi'
In [27]:
Bar.mro()
Out[27]:
[__main__.Bar, object, __main__.Foo]
In [28]:
type.mro(Bar)
Out[28]:
[__main__.Bar, object]
In [29]:
type(Bar).mro(Bar)
Out[29]:
[__main__.Bar, object, __main__.Foo]

It is incorrect to call type.mro(x) because x might have a custom metaclass. The only correct way to look up the MRO of an arbitrary Python type is type(x).mro(x).
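
Wrapped up as a helper, that looks something like the following. The name get_mro is mine, and the class is built by calling the metaclass directly so that the sketch does not depend on the Python 2 __metaclass__ attribute or the Python 3 metaclass= keyword.

```python
class Foo(object):
    def foo(self):
        return "Hi"


class AddsFoo(type):
    def mro(cls):
        return super(AddsFoo, cls).mro() + [Foo]


# Equivalent to a class statement that uses AddsFoo as its metaclass.
Bar = AddsFoo("Bar", (object,), {})


def get_mro(cls):
    # Dispatch through the class's own metaclass rather than assuming type.
    return type(cls).mro(cls)
```

For classes without a custom metaclass, get_mro(cls) and type.mro(cls) agree, so nothing is lost by always using the former.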

How does repr work?

Here's a problem I had recently. In 2.7 __repr__ must return only ASCII strings. It's OK to return a unicode string, but it will be encoded as ASCII.

In [30]:
class CustomRepr(object):
    def __init__(self, rep):
        self.rep = rep

    def __repr__(self):
        return self.rep    
In [31]:
CustomRepr("foo")
Out[31]:
foo
In [32]:
CustomRepr(u"foo")
Out[32]:
foo
In [33]:
CustomRepr(u"☃")
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/home/david/.pyenv/versions/2.7.10/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
    688                 type_pprinters=self.type_printers,
    689                 deferred_pprinters=self.deferred_printers)
--> 690             printer.pretty(obj)
    691             printer.flush()
    692             return stream.getvalue()

/home/david/.pyenv/versions/2.7.10/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    407                             if callable(meth):
    408                                 return meth(obj, self, cycle)
--> 409             return _default_pprint(obj, self, cycle)
    410         finally:
    411             self.end_group()

/home/david/.pyenv/versions/2.7.10/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    527     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    528         # A user-provided repr. Find newlines and replace them with p.break_()
--> 529         _repr_pprint(obj, p, cycle)
    530         return
    531     p.begin_group(1, '<')

/home/david/.pyenv/versions/2.7.10/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
    709     """A pprint that just redirects to the normal repr function."""
    710     # Find newlines and replace them with p.break_()
--> 711     output = repr(obj)
    712     for idx,output_line in enumerate(output.splitlines()):
    713         if idx:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2603' in position 0: ordinal not in range(128)

It is annoyingly common to get this wrong. Suppose I wanted to display an object that gets it wrong? I am perfectly able to display the unicode, but repr() won't let me.

In [34]:
print(CustomRepr("☃").__repr__())
In [35]:
print(object().__repr__())
<object object at 0x7fb8c2fe3fa0>
In [36]:
print(object.__repr__())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-07552b3772a0> in <module>()
----> 1 print(object.__repr__())

TypeError: descriptor '__repr__' of 'object' object needs an argument

This has the same resolution as the mro problem.

In [37]:
def safe_repr(x):
    return type(x).__repr__(x)

"But David!" you say. "What if __repr__ is assigned on the instance rather than the class?"

In [38]:
class NoCustomRepr(object):
    pass

x = NoCustomRepr()
x.__repr__ = "Hi"
x
Out[38]:
<__main__.NoCustomRepr at 0x7fb8b22f3590>

So far so good.

Except...

This workaround is no longer needed in Python 3, but if it were the behaviour would be the same.
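
A quick way to see that repr() ignores instance-level assignments is to shadow __repr__ with something callable and compare the three ways of getting at it. This sketch assumes a new-style class (any class on Python 3).

```python
class NoCustomRepr(object):
    pass


x = NoCustomRepr()
# Shadow __repr__ on the instance only; the class is left alone.
x.__repr__ = lambda: "Hi"

via_instance = x.__repr__()       # ordinary attribute lookup sees the shadow
via_builtin = repr(x)             # repr() looks the method up on the type
via_type = type(x).__repr__(x)    # the same lookup, spelled out
```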

Attributes going missing

In [39]:
class HasAProperty(object):
    @property
    def stuff(self):
        return self.oops_does_not_exist
    
hasattr(HasAProperty, 'stuff')
Out[39]:
True
In [40]:
hasattr(HasAProperty(), 'stuff')
Out[40]:
False

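hasattr works by trying to fetch the attribute, which runs the property, and returning False if that raises. It doesn't care where the error comes from: an AttributeError raised inside the property body looks exactly like a missing attribute. (Python 2 swallows any exception here; Python 3.2 and later only swallow AttributeError.)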
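
If what you actually want to know is whether the attribute is defined, rather than whether fetching it succeeds, one option on Python 3 is inspect.getattr_static, which looks the name up without running any property code. The helper is_defined below is a made-up name and only a sketch.

```python
import inspect


class HasAProperty(object):
    @property
    def stuff(self):
        return self.oops_does_not_exist


def is_defined(obj, name):
    # Hypothetical helper: checks for the attribute without invoking
    # descriptors, so errors inside a property getter are not hidden.
    sentinel = object()
    return inspect.getattr_static(obj, name, sentinel) is not sentinel


instance = HasAProperty()
```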

Never override methods of builtin types

In [41]:
class BonusList(list):
    def __iter__(self):
        for x in super(BonusList, self).__iter__():
            yield x
        yield 1
In [42]:
BonusList()
Out[42]:
[1]
In [43]:
list(BonusList())
Out[43]:
[1]
In [44]:
tuple(BonusList())
Out[44]:
()
In [45]:
tuple(iter(BonusList()))
Out[45]:
(1,)

This is the sort of inconsistency you can get from C extensions using the concrete methods rather than the abstract protocol. Because a lot of Python builtins are implemented in C, you can run into this sort of thing.

Basically the only safe thing to do is never override methods of builtins.

This behaviour is the same on Python 3 but correct on pypy because pypy has an actually sensible implementation.
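
One possible way around this, offered only as a sketch, is to wrap a list rather than inherit from it. collections.UserList (Python 3) is a pure-Python wrapper that keeps the real list in self.data, so the interpreter has no concrete list to take shortcuts with and has to go through __iter__.

```python
from collections import UserList


class BonusList(UserList):
    def __iter__(self):
        # self.data is the plain list that UserList wraps.
        for x in self.data:
            yield x
        yield 1


empty = BonusList()
as_list = list(empty)
as_tuple = tuple(empty)
```

The trade-off is that the result is no longer an instance of list, which matters to code that checks isinstance(x, list).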

How do scopes of class bodies work?

Class bodies act a lot like normal scopes. For example:

In [46]:
class Hello(object):
    x = "world"
    print("Hello %s"  % (x,))
Hello world
In [47]:
class A(object):
    a = 10
    b = range(a)
    c = [x for x in b]
    d = list(x for x in c)
    e = [d[i] for i in range(a)]
    f = list(e[i] for i in range(a))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-47-ff19e5b0b9d8> in <module>()
----> 1 class A(object):
      2     a = 10
      3     b = range(a)
      4     c = [x for x in b]
      5     d = list(x for x in c)

<ipython-input-47-ff19e5b0b9d8> in A()
      5     d = list(x for x in c)
      6     e = [d[i] for i in range(a)]
----> 7     f = list(e[i] for i in range(a))

<ipython-input-47-ff19e5b0b9d8> in <genexpr>((i,))
      5     d = list(x for x in c)
      6     e = [d[i] for i in range(a)]
----> 7     f = list(e[i] for i in range(a))

NameError: global name 'e' is not defined

For some reason e was not in scope there even though all the previous ones are. This would have worked fine if we were in a function rather than a class body:

In [48]:
def A():
    a = 10
    b = range(a)
    c = [x for x in b]
    d = list(x for x in c)
    e = [d[i] for i in range(a)]
    f = list(e[i] for i in range(a))
    
A()

The reason for this is that Python distinguishes between scopes and execution frames. The latter are what are captured by function definitions. After a class body has finished executing, the execution frame is discarded, which is why class-level variables are not visible in function definitions. What is in scope in a generator body is complicated: the top-level collection being iterated over is received from the scope, but the body of the generator must capture the execution frame.
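
Here is a cut-down version of the failure written for Python 3, together with one possible workaround: pass the class-level values in explicitly as function arguments. The class names Broken and Works are just for illustration.

```python
try:
    class Broken(object):
        a = 3
        e = list(range(a))
        # range(a) is evaluated in the class body, so a resolves. The
        # element expression runs in a nested scope that cannot see e.
        f = list(e[i] for i in range(a))
    failed = False
except NameError:
    failed = True


class Works(object):
    a = 3
    e = list(range(a))
    # Workaround: the arguments are evaluated in the class body, and the
    # generator then closes over the lambda's own parameters.
    f = (lambda e, n: list(e[i] for i in range(n)))(e, a)
```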

When are values of two different types equal?

Python is very lax about equality of different numeric-ish types:

In [49]:
{0: set([1])} == {False: frozenset([True])} == {0.0: set([1.0])}
Out[49]:
True

Note in particular that a frozenset is equal to a set with the same (or equivalent) contents. This is different from the tuple/list behaviour:

In [50]:
[] == ()
Out[50]:
False
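
One practical consequence, shown here as a small sketch: because these values are equal and hash the same, they collapse into a single dictionary key.

```python
# Keys that compare equal (and hash equal) end up as one entry. The first
# key object is kept and the last value wins.
merged = {0: "int", False: "bool", 0.0: "float"}

same_hash = hash(0) == hash(False) == hash(0.0)
set_equals_frozenset = set([1]) == frozenset([True])
list_equals_tuple = [1] == (1,)
```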

Exceptions holding on to values unexpectedly

In [51]:
import gc
from weakref import WeakKeyDictionary


class Key(object):
    pass


def run(cache, template):
    cache[template] = 1
    # Note: Not raising the ValueError causes this to work. Any exception will
    # cause the same problem though.
    raise ValueError()


def test_gc():
    cache = WeakKeyDictionary()
    # Extracting this whole try/except into its own function and passing in
    # cache as an argument causes this to work.
    try:
        # Note: Inlining run here causes this to work
        run(
            cache, Key()
        )
    except ValueError:
        pass
    gc.collect()

    # The Key() argument went out of scope immediately. When we ran the GC
    # it should definitely have had the weakref to it cleared.
    assert not cache


test_gc()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-51-03f445bff835> in <module>()
     33 
     34 
---> 35 test_gc()

<ipython-input-51-03f445bff835> in test_gc()
     30     # The Key() argument went out of scope immediately. When we ran the GC
     31     # it should definitely have had the weakref to it cleared.
---> 32     assert not cache
     33 
     34 

AssertionError: 

The reason this doesn't work is that the exception stack trace carries frame objects that reference local variables, so until it gets cleared the keys remain reachable. You can fix this by calling sys.exc_clear() before the gc.collect().

Note that this problem is not present in Python 3: exc_info is automatically cleared when you exit the exception handler.
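
The same mechanism can still be observed on Python 3 if you keep the exception object alive yourself, since it carries its traceback (and therefore the frames) in __traceback__. This is a sketch for Python 3 only; demo and saved are illustrative names.

```python
import gc
from weakref import WeakKeyDictionary


class Key(object):
    pass


def run(cache, template):
    cache[template] = 1
    raise ValueError()


def demo():
    cache = WeakKeyDictionary()
    saved = None
    try:
        run(cache, Key())
    except ValueError as e:
        # Holding on to the exception keeps its traceback, and with it
        # the frame of run() whose local variable refers to the Key.
        saved = e
    gc.collect()
    held = len(cache)

    # Dropping the exception releases the traceback and its frames.
    saved = None
    gc.collect()
    released = len(cache)
    return held, released


held, released = demo()
```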

Partially ordered sets

In [52]:
{1} <= {2}
Out[52]:
False
In [53]:
{1} >= {2}
Out[53]:
False

The ordering relation on sets is defined to be subset inclusion, which is only a partial order rather than a total order. In particular this means that (x < y) is not the same as not (y <= x):

In [54]:
{1} < {2}
Out[54]:
False

Note that this means that sorting a list of sets will not work correctly:

In [55]:
sorted([{-1, 1}, {0}, {1}])
Out[55]:
[{-1, 1}, {0}, {1}]
In [56]:
{1} < {-1, 1}
Out[56]:
True

Because sets do not obey the contract of a total ordering, they do not sort correctly, and you get an allegedly sorted list with the final element strictly less than the first.
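
If you do need a deterministic order for a list of sets, one option is to sort with a key that is totally ordered. The key below is just one possible choice and assumes the elements themselves can be compared with each other. Sorting by size first means a proper subset always comes before its supersets.

```python
def total_order_key(s):
    # Hypothetical key: order by size first, then by sorted contents.
    return (len(s), sorted(s))


sets = [{-1, 1}, {0}, {1}]
ordered = sorted(sets, key=total_order_key)
```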