WTF is this?

or, When is is what you think is is?

Also, rabbit hole alert...

In [ ]:
%%HTML
<img src="https://imgs.xkcd.com/comics/bun_alert.png" width=500></img>

The Problem

In [ ]:
%%HTML
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Pay no mind.... <a href="https://t.co/mnIPHJXE1h">pic.twitter.com/mnIPHJXE1h</a></p>&mdash; David Beazley (@dabeaz) <a href="https://twitter.com/dabeaz/status/890634046958477312">July 27, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
In [ ]:
# let's reproduce it
class A():
    pass
A.__dict__ is A.__dict__
In [ ]:
# ... and more robustly...
a = A()
a.__class__.__dict__ is a.__class__.__dict__

Our path...

The code in question involves class objects and instances, the is operator, and attribute access via the dot notation. Let's explore how those objects and operations work.

Out of scope:

  • we're not gonna talk about properties (by name)
  • we're not gonna talk about descriptors (by name)
  • we're not gonna talk about slots

...but you will run into these concepts if you investigate beyond this tutorial.

1) Python class construction

In [ ]:
class B():
    pass
In [ ]:
C = type('C',(),dict())
In [ ]:
D = type('C',(),dict())
In [ ]:
D

Takeaways:

  • two forms of class definition
  • variables point to objects

Reminder:

  • all objects in Python 3 are instances of object, including objects that are class definitions
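Here is a minimal sketch of the two forms side by side. The names Spam, Eggs and alias are made up for this illustration; the point is that the name stored on a class and the variable it is bound to are independent.

```python
# Form 1: the class statement
class Spam:
    pass

# Form 2: calling type(name, bases, namespace) directly
Eggs = type('Eggs', (), {})

# Both results are classes, i.e. instances of `type`
print(type(Spam), type(Eggs))

# The name recorded on the class does not have to match the variable name
alias = type('Spam', (), {})
print(alias.__name__)   # 'Spam', even though the variable is called `alias`
print(alias is Spam)    # False: two separate class objects that share a name
```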

2) Python class comparison

How do these class definitions compare?

In [ ]:
# Start with the equality operator (==)
# --> remember that this is defined by the "__eq__()" method of the operand on the left (looked up on its type)
In [ ]:
B == B
In [ ]:
B == C
In [ ]:
C == D
In [ ]:
B == D
In [ ]:
# check the directory of the object's attributes (more about this later)

vars(B)
In [ ]:
vars(B) == vars(B)
In [ ]:
vars(B) == vars(C)
In [ ]:
vars(C) == vars(D)
In [ ]:
# let's cast it to a real 'dict'
dict(vars(D))
In [ ]:
dict(vars(B)) == dict(vars(C))
In [ ]:
dict(vars(C)) == dict(vars(D))
In [ ]:
# check the directory of attributes (more about this later)

dir(B)
In [ ]:
dir(B) == dir(B)
In [ ]:
dir(B) == dir(C)
In [ ]:
dir(C) == dir(D)

Takeaways:

  • Class definitions are objects with attributes
  • Class descriptions (vars, dir, etc.) compare equal when a class is compared with itself
  • For separately constructed classes, only the lists of attribute names compare equal; the attribute values do not
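To see which values differ between two separately constructed classes, we can compare the namespaces key by key. This is a sketch with two throwaway classes, P and Q, which are not part of the examples above.

```python
class P:
    pass

class Q:
    pass

# The attribute names in the two class namespaces match...
print(sorted(vars(P)) == sorted(vars(Q)))

# ...but not every value does. Print which keys hold equal values.
for key in sorted(vars(P)):
    print(key, vars(P)[key] == vars(Q)[key])

# The '__dict__' entry is a per-class descriptor object, so it differs
print(vars(P)['__dict__'] is vars(Q)['__dict__'])   # False
```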

3) Python class identity

What are these objects?

In [ ]:
# instance and type

isinstance(B,type)
In [ ]:
isinstance(B,object)
In [ ]:
type(B)
In [ ]:
B.__class__
In [ ]:
B.__base__
In [ ]:
B.__bases__
In [ ]:
id(B)
In [ ]:
# the 'is' operator tests object identity; for two objects that are alive at the same time, 'a is b' gives the same result as 'id(a) == id(b)'

B is B
In [ ]:
id(B) == id(B)
In [ ]:
# now use B's callability to create an instance of it
b = B()
In [ ]:
isinstance(b,B)
In [ ]:
type(b).__bases__
In [ ]:
# FWIW
type(type)
In [ ]:
type.__bases__

Takeaways:

  • class objects are instances of the type 'type'
  • class objects are classes that inherit from 'object'

WTF?
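The apparent circularity between type and object can be checked directly. This sketch uses a throwaway class named Thing, which is not part of the examples above.

```python
class Thing:
    pass

print(isinstance(Thing, type))     # True: a class is an instance of type
print(issubclass(Thing, object))   # True: a class inherits from object

# type and object refer to each other
print(isinstance(type, object))    # True: type is an object
print(isinstance(object, type))    # True: object is a class
print(issubclass(type, object))    # True: type inherits from object
print(type(type) is type)          # True: type is its own type
```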

4) Object attributes

In addition to various notions of identity, we also need to investigate attribute access.

Apart from the problem we're investigating, Python places a lot of importance on interfaces, in which an object is described and classified in terms of its function and attributes, rather than its identity or inheritance properties.

In [ ]:
# set some attributes of some objects
setattr(b,'an_instance_attr',1)
setattr(B,'a_class_attr',2)
setattr(B,'a_class_method',lambda x: 3)
In [ ]:
vars(b)
In [ ]:
b.__dict__
In [ ]:
vars(B)

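Conclusion: __dict__ / vars() returns the attributes stored directly on an object: for b, its instance attributes; for B, the attributes defined on the class itself.

Let's iterate through b's inheritance tree, and look at the attributes stored at each level.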

In [ ]:
vars(type)
In [ ]:
vars(object)
In [ ]:
# collect all the instance attributes of the inheritance tree (don't include type)

attribute_keys = set( list(vars(b).keys()) + list(vars(B).keys()) + list(vars(object).keys()))
In [ ]:
for attribute_key in attribute_keys:
    print('{} : {}'.format(attribute_key,getattr(b,attribute_key)))
In [ ]:
# our manual attributes collection should match that from 'dir'
attribute_keys - set(dir(b))

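NOTE: dir is not always reliable. It is meant as a convenience for interactive use, and an object can customize its output by defining __dir__.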

Take-aways:

  • The __dict__ attribute lists the instance attributes of an object
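A small sketch of the difference between vars and dir, using made-up classes Base and Child: vars shows only what is stored on the object itself, while dir also collects names from the class and its parents.

```python
class Base:
    inherited = 'from Base'

class Child(Base):
    own = 'from Child'

obj = Child()
obj.local = 'from the instance'

print(vars(obj))                    # {'local': 'from the instance'}
print('own' in vars(obj))           # False: stored on the class, not the instance
print('own' in vars(Child))         # True
print('inherited' in vars(Child))   # False: stored on Base
print({'local', 'own', 'inherited'} <= set(dir(obj)))   # True: dir sees all levels
```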

5) Instance and class attributes

In [ ]:
b.an_instance_attr
In [ ]:
B.an_instance_attr
In [ ]:
B.a_class_attr
In [ ]:
b.a_class_attr
In [ ]:
b.a_class_method
In [ ]:
b.a_class_method()

Take-aways:

  • instance attributes do not affect the associated class attribute set
  • class attributes are available for lookup by an instance
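One consequence worth seeing: assigning through an instance creates an instance attribute that shadows the class attribute, rather than changing the class. The Config class below is a made-up example.

```python
class Config:
    level = 'class default'

c1 = Config()
c2 = Config()

# Assignment through the instance stores the value in c1.__dict__ only
c1.level = 'instance override'

print(c1.level)       # 'instance override'
print(c2.level)       # 'class default'
print(Config.level)   # 'class default'

# Removing the instance attribute exposes the class attribute again
del c1.level
print(c1.level)       # 'class default'
```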

Out of scope:

  • how do instance attributes get added at construction?

6) Attribute access

The dot notation searches through the attributes of the instance, then the class, then through parent classes, to find an attribute of the requested name.

The method resolution order defines how complex inheritance structures are traversed.

In [ ]:
B.mro()
In [ ]:
# Python's MRO uses the C3 linearization algorithm, which gives a consistent order for multiple inheritance, including diamond-shaped hierarchies
# https://en.wikipedia.org/wiki/C3_linearization

class X():
    a = 1
class Y():
    b = 2
class Z(X,Y):
    c = 3
Z.mro()
In [ ]:
Z.c
In [ ]:
Z.b
In [ ]:
Z.a
In [ ]:
# get an attribute defined only by the base class
Z.__repr__

To locate the attribute named my_attr, Python:

  • searches the __dict__ attribute of the instance for key my_attr
  • searches the __dict__ attributes of all the classes in the MRO
  • if nothing is found, looks for a __getattr__ method on the classes in the MRO and calls it with 'my_attr' (object itself does not define __getattr__)
  • ...other things...
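The steps above can be sketched as a plain function. This is a deliberately simplified model, not what the interpreter actually runs: simple_lookup is a hypothetical helper, it only handles ordinary instances, and it ignores the "other things" (descriptors, method binding, slots, metaclasses).

```python
def simple_lookup(obj, name):
    """Simplified model of attribute lookup on an ordinary instance."""
    # 1) the instance's own __dict__
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]
    # 2) the __dict__ of each class in the MRO, in order
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    # 3) fall back to a __getattr__ defined somewhere in the MRO
    for klass in type(obj).__mro__:
        if '__getattr__' in vars(klass):
            return vars(klass)['__getattr__'](obj, name)
    raise AttributeError(name)


class Parent:
    greeting = 'hello'

class Kid(Parent):
    def __getattr__(self, name):
        return 'fallback for ' + name

k = Kid()
k.toy = 'ball'

print(simple_lookup(k, 'toy'))        # found on the instance
print(simple_lookup(k, 'greeting'))   # found on Parent via the MRO
print(simple_lookup(k, 'missing'))    # handled by Kid.__getattr__
```

For these three names the sketch agrees with the built-in getattr; it will disagree for methods and properties, which rely on the out-of-scope machinery.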

Take-aways:

  • the method resolution order manages the order and sources for object attribute lookup
  • attribute lookup is potentially complicated
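A diamond-shaped hierarchy shows why the order matters. The classes Top, Left, Right and Bottom are invented for this sketch.

```python
class Top:
    who = 'Top'

class Left(Top):
    pass

class Right(Top):
    who = 'Right'

class Bottom(Left, Right):
    pass

# Right comes before Top in the MRO, even though Left inherits from Top
print([klass.__name__ for klass in Bottom.mro()])
# ['Bottom', 'Left', 'Right', 'Top', 'object']

# so the lookup finds Right.who before Top.who
print(Bottom.who)   # 'Right'
```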

7) An optimization

Because attribute lookup is common and potentially complicated, the Python authors decided to enforce some simplifications to the process. Most important for our problem here: class-level attributes and methods must be referenced with strings.

In [ ]:
# let's start with the instance-level attribute dictionary

b.__dict__['an_attr'] = 'value'
b.__dict__
In [ ]:
# I don't know why anyone would want to do this, but we'll allow it at the level of instance objects. 
# Any hashable object can be a key in an ordinary dictionary.

b.__dict__[1] = [3,4]
In [ ]:
# what happens if we do the same to `b`'s class?

b.__class__.__dict__[1] = [3,4]
In [ ]:
# right, we've seen this "mappingproxy" before
b.__class__.__dict__
In [ ]:
# also equivalent
B.__dict__

The MappingProxyType type is a read-only view of a mapping (dictionary), so we can't set class attributes by writing to B.__dict__. Class attributes must instead be set with setattr (or the equivalent dot-notation assignment), which calls __setattr__.
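The same type is available as types.MappingProxyType, so we can try it on an ordinary dictionary. In this sketch, backing, view and the class E are made-up names.

```python
from types import MappingProxyType

backing = {'a': 1}
view = MappingProxyType(backing)

# Writing through the proxy is refused
try:
    view['b'] = 2
except TypeError as err:
    print('TypeError:', err)

# Changes to the underlying mapping are visible through the proxy
backing['b'] = 2
print(view['b'])   # 2

# A class __dict__ is the same kind of object
class E:
    pass

print(type(vars(E)) is MappingProxyType)   # True
E.flag = True                # goes through __setattr__, not the proxy
print(vars(E)['flag'])       # True
```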

In [ ]:
# turns out, it's a method of 'object'
B.__setattr__
In [ ]:
setattr(B,1,2)

Take-away:

  • class attribute names are required to be strings: setattr rejects any other type of name, and the read-only mappingproxy blocks direct writes to the class __dict__. This lets attribute lookup assume string keys, which speeds it up.
  • the class-level attribute mapping is returned by a read-only mappingproxy object
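A short sketch of the contrast between the class level and the instance level, using a throwaway class F.

```python
class F:
    pass

# Class level: a non-string attribute name is rejected
rejected = False
try:
    setattr(F, 1, 2)
except TypeError as err:
    rejected = True
    print('TypeError:', err)

# so every key in the class namespace is a string
setattr(F, 'one', 2)
print(all(isinstance(key, str) for key in vars(F)))   # True

# Instance level: the __dict__ is an ordinary dict and accepts any hashable key
f = F()
f.__dict__[1] = 'odd but allowed'
print(f.__dict__)   # {1: 'odd but allowed'}
```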

8) Tying it together

Now we know why a class's __dict__ attribute returns a read-only mappingproxy object. Let's return to the Tweet and address the question of the mappingproxy object's identity.

In [ ]:
%%HTML
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Pay no mind.... <a href="https://t.co/mnIPHJXE1h">pic.twitter.com/mnIPHJXE1h</a></p>&mdash; David Beazley (@dabeaz) <a href="https://twitter.com/dabeaz/status/890634046958477312">July 27, 2017</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
In [ ]:
# the example
A.__dict__ is A.__dict__
In [ ]:
# run this a few times
id(A.__dict__)

Takeaway: a new mappingproxy object is created every time a class's __dict__ attribute is accessed. In A.__dict__ is A.__dict__, both proxies are alive at the same time, and since two live objects can't share the same memory address, this comparison will never be true. The reason that a new mappingproxy is created on each access is, unfortunately, out of scope.
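To make this concrete, here is a sketch that holds on to two proxies at once. The class G and the names first and second are made up; the behavior shown is CPython's.

```python
class G:
    pass

# Each access builds a fresh proxy; keeping references keeps both alive
first = G.__dict__
second = G.__dict__

print(type(first).__name__)             # 'mappingproxy'
print(first is second)                  # False: two distinct proxy objects
print(dict(first) == dict(second))      # True: both are views of the same namespace
```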

Bonus questions below:

In [ ]:
# what about this?
id(A.__dict__) == id(A.__dict__)
In [ ]:
# or this?
x = id(A.__dict__)
y = id(A.__dict__)
x == y
In [ ]:
# or this?
x = A.__dict__
y = A.__dict__
id(x) == id(y)

Remember: the return value of the id builtin function "is an integer which is guaranteed to be unique and constant for this object during its lifetime."
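The documentation for id goes on to say that two objects with non-overlapping lifetimes may have the same id() value. The sketch below, with a throwaway class H, separates what is guaranteed from what is not.

```python
class H:
    pass

# Both proxies are alive at the same time, so their ids must differ
x = H.__dict__
y = H.__dict__
print(id(x) == id(y))   # False

# Here each proxy can be discarded as soon as id() returns, so the second
# proxy may reuse the first one's memory address. The result is an
# implementation detail and is not guaranteed either way.
print(id(H.__dict__) == id(H.__dict__))
```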
