
Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.

Chapter 11: Classes & Instances

In contrast to all the built-in data types introduced in the previous chapters, classes allow us to create user-defined data types. They enable us to model data and its associated behavior in an abstract way. Concrete instances of these custom data types then encapsulate the state in a running program. Often, classes are blueprints modeling "real world things."

Classes and instances follow the object-oriented programming (OOP) paradigm in which a large program is broken down into many small components (i.e., the objects) that reuse code. This way, a program that is too big for a programmer to fully comprehend as a whole becomes maintainable via its individual pieces, which are easier to understand.

Often, we see the terminology "classes & objects" used instead of "classes & instances" in Python-related texts. In this book, we are more precise as both classes and instances are objects, as specified already in the "Objects vs. Types vs. Values" section in Chapter 1.

Example: Vectors & Matrices

Neither core Python nor the standard library offers an implementation of common linear algebra functionality. While we introduce the popular third-party library numpy in Chapter 10 as the de-facto standard for that and recommend using it in real-life projects, we show how one could use Python's object-oriented language features to implement common matrix and vector operations throughout this chapter. Once we have achieved that, we compare our own library with numpy.

Without classes, we could model a vector, for example, with a tuple or a list object, depending on whether we want it to be mutable or not.

Let's take the following vector $\vec{x}$ as an example and model it as a tuple:

$\vec{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$

In [1]:

x = (1, 2, 3)

In [2]:

x

Out[2]:

(1, 2, 3)

We can extend this approach and model a matrix as either a tuple holding other tuples or as a list holding other lists or as a mixture of both. Then, we must decide if the inner objects represent rows or columns. A common convention is to go with the former.

For example, let's model the matrix $\bf{A}$ below as a list of row lists:

$\bf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$

In [3]:

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [4]:

A

Out[4]:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
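As a brief illustration of the row-based convention (a sketch with plain lists, not part of the classes developed later), an individual entry is reached by indexing the row first and the column second, whereas a column has to be collected from all the rows. The names entry and second_column are chosen only for this example.

```python
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Rows are the inner lists, so the first index selects the row
# and the second index selects the column (both are 0-based).
entry = A[1][2]  # second row, third column

# A column is not stored as an object of its own; we collect it row by row.
second_column = [row[1] for row in A]

print(entry)          # 6
print(second_column)  # [2, 5, 8]
```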

While this way of representing vectors and matrices in memory keeps things simple, we cannot work with them easily as Python does not know about the semantics (i.e., "rules") of vectors and matrices modeled as tuples and lists of lists.

For example, we should be able to multiply $\bf{A}$ with $\vec{x}$ if their dimensions match. However, Python does not know how to do this and raises a TypeError.

$\bf{A} * \vec{x} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} * \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 14 \\ 32 \\ 50 \end{pmatrix}$

In [5]:

A * x

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 A * x

TypeError: can't multiply sequence by non-int of type 'tuple'
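Without classes, one way around this is a stand-alone function that spells out the rule by hand. The sketch below assumes the row-based list-of-lists layout from above; the function name matrix_vector_product is hypothetical and not used elsewhere in this chapter.

```python
def matrix_vector_product(matrix, vector):
    """Multiply a matrix (sequence of rows) by a vector (sequence of numbers)."""
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("dimensions do not match")
    # Each entry of the result is the dot product of one row with the vector.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = (1, 2, 3)

print(matrix_vector_product(A, x))  # [14, 32, 50]
```

The function works, but the rule lives apart from the data it operates on, and the expression A * x still fails.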

Throughout this chapter, we "teach" Python the rules of linear algebra.

Class Definition

The compound class statement creates a new variable that references a class object in memory.

Following the header line, the indented body syntactically consists of function definitions (i.e., .dummy_method()) and variable assignments (i.e., .dummy_variable). Any code put here is executed just as if it were outside the class statement. However, the class object acts as a namespace, meaning that all the names do not exist in the global scope but may only be accessed with the dot operator . on the class object. In this context, the names are called class attributes.

Within classes, functions are referred to as methods that are bound to future instance objects. This binding process means that Python implicitly inserts a reference to a concrete instance object as the first argument to any method invocation (i.e., "function call"). By convention, we call this parameter self as it references the instance object on which the method is invoked. Then, as the method is executed, we can set and access attributes via the dot operator . on self. That is how we manage the state of a concrete instance within a generically written class. At the same time, the code within a method is reused whenever we invoke a method on any instance.

As indicated by PEP 257 and also section 3.8.4 of the Google Python Style Guide, we use docstrings to document relevant parts of the new data type. With respect to naming, classes are named according to the CamelCase convention while instances are treated like normal variables and named in snake_case.

In [6]:

class Vector:
    """A one-dimensional vector from linear algebra."""

    dummy_variable = "I am a vector"

    def dummy_method(self):
        """A dummy method for illustration purposes."""
        return self.dummy_variable

Vector is an object on its own with an identity, a type, and a value.

In [7]:

id(Vector)

Out[7]:

94113690586816

Its type is type, which indicates that it represents a user-defined data type. Evaluated in a code cell, it shows its fully qualified name (i.e., prefixed with __main__ as it is defined in this Jupyter notebook).

We have seen the type type before in the "Constructors" section in Chapter 2 and also in the "The namedtuple Type" section in Chapter 7's Appendix. In the latter case, we could also use a Point class but the namedtuple() function from the collections module in the standard library is a convenient shortcut to create custom data types that can be derived out of a plain tuple.

In all examples, if an object's type is type, we can simply view it as a blueprint for a "family" of objects.

In [8]:

type(Vector)

Out[8]:

type

In [9]:

Vector

Out[9]:

__main__.Vector

The docstrings are transformed into convenient help texts.

In [10]:

Vector?

Init signature: Vector()
Docstring:      A one-dimensional vector from linear algebra.
Type:           type
Subclasses:

In [11]:

help(Vector)

Help on class Vector in module __main__:

class Vector(builtins.object)
 |  A one-dimensional vector from linear algebra.
 |
 |  Methods defined here:
 |
 |  dummy_method(self)
 |      A dummy method for illustration purposes.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  dummy_variable = 'I am a vector'

We can use the built-in vars() function as an alternative to dir() to obtain a brief summary of the attributes on Vector. Whereas vars() returns a read-only dict-like overview on mostly the explicitly defined attributes, dir() also shows all implicitly added attributes in a list.

In [12]:

vars(Vector)

Out[12]:

mappingproxy({'__module__': '__main__',
              '__doc__': 'A one-dimensional vector from linear algebra.',
              'dummy_variable': 'I am a vector',
              'dummy_method': <function __main__.Vector.dummy_method(self)>,
              '__dict__': <attribute '__dict__' of 'Vector' objects>,
              '__weakref__': <attribute '__weakref__' of 'Vector' objects>})

In [13]:

dir(Vector)

Out[13]:

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'dummy_method',
 'dummy_variable']
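The difference between the two built-ins can also be checked programmatically. The following sketch repeats the Vector class from above so that it runs on its own; the variable names explicit and everything are chosen only for illustration.

```python
class Vector:
    """A one-dimensional vector from linear algebra."""

    dummy_variable = "I am a vector"

    def dummy_method(self):
        """A dummy method for illustration purposes."""
        return self.dummy_variable


explicit = set(vars(Vector))
everything = set(dir(Vector))

# Every name reported by vars() also shows up in dir() ...
print(explicit <= everything)  # True

# ... but inherited names such as __init__ are only listed by dir().
print("__init__" in explicit)    # False
print("__init__" in everything)  # True
```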

With the dot operator . we access the class attributes.

In [14]:

Vector.dummy_variable

Out[14]:

'I am a vector'

In [15]:

Vector.dummy_method

Out[15]:

<function __main__.Vector.dummy_method(self)>

However, invoking .dummy_method() on the class raises a TypeError. That makes sense as the method expects a concrete instance passed in as the self argument, and we have not yet created one.

In [16]:

Vector.dummy_method()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 Vector.dummy_method()

TypeError: Vector.dummy_method() missing 1 required positional argument: 'self'

Instantiation

To create a new instance, we need to instantiate one.

In the revised class statement below, we see an .__init__() method that contains all the validation logic that we require a Vector instance to adhere to. In a way, this method serves as a constructor-like function.

.__init__() is an example of a so-called special method that we use to make new data types integrate with Python's language features. Their naming follows the dunder convention. In this chapter, we introduce some of the more common special methods, and we refer to the language reference for an exhaustive list of all special methods. Several special methods (e.g., .__init__() and .__repr__()) that are not explicitly defined in a class are inherited with a default implementation from the built-in object type, as the output of dir() above shows.

The .__init__() method (cf., reference) is responsible for initializing a new instance object immediately after its creation. That usually means setting up some instance attributes. In the example, a new Vector instance is created from some sequence object (e.g., a tuple like x) passed in as the data argument. The elements provided by the data argument are first cast as float objects and then stored in a list object named ._entries on the new instance object. Together, the floats represent the state encapsulated within an instance.

A best practice is to separate the way we use a data type (i.e., its "behavior") from how we implement it. By convention, attributes that should not be accessed from "outside" of an instance start with one leading underscore _. In the example, the instance attribute ._entries is such an implementation detail: We could have decided to store a Vector's entries in a tuple instead of a list. However, this decision should not affect how a Vector instance is to be used. Moreover, if we changed how the ._entries are modeled later on, this must not break any existing code using Vectors. This idea is also known as information hiding in software engineering.

In [17]:

class Vector:
    """A one-dimensional vector from linear algebra.

    All entries are converted to floats.
    """

    def __init__(self, data):
        """Create a new vector.

        Args:
            data (sequence): the vector's entries

        Raises:
            ValueError: if no entries are provided
        """
        self._entries = list(float(x) for x in data)
        if len(self._entries) == 0:
            raise ValueError("a vector must have at least one entry")

To create a Vector instance, we call the Vector class with the () operator. This call is forwarded to the .__init__() method behind the scenes. That is what we mean by saying "make new data types integrate with Python's language features" above: We use Vector just as any other built-in constructor.

In [18]:

v = Vector([1, 2, 3])

v is an object as well.

In [19]:

id(v)

Out[19]:

140306181183328

Unsurprisingly, the type of v is Vector. That is the main point of this chapter.

In [20]:

type(v)

Out[20]:

__main__.Vector

v's semantic "value" is not so clear yet. We fix this in the next section.

In [21]:

v

Out[21]:

<__main__.Vector at 0x7f9b9416d760>

Although the .__init__() method defines two parameters, we must call it with only one data argument. As noted above, Python implicitly inserts a reference to the newly created instance object (i.e., v) as the first argument as self.
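The implicit insertion of self can be made visible with a short sketch. The class MiniVector and its method .n_entries() are hypothetical stand-ins that exist only for this illustration; they are not part of the Vector class developed in this chapter.

```python
class MiniVector:
    """A trimmed-down stand-in for Vector (hypothetical, for illustration)."""

    def __init__(self, data):
        self._entries = list(float(x) for x in data)

    def n_entries(self):
        """Return the number of entries (a hypothetical helper method)."""
        return len(self._entries)


v = MiniVector([1, 2, 3])

# Invoking the method on the instance lets Python pass v as self ...
print(v.n_entries())  # 3

# ... which is equivalent to looking up the function on the class
# and passing in the instance explicitly.
print(MiniVector.n_entries(v))  # 3
```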

Calling a class object with a wrong number of arguments leads to generic TypeErrors ...

In [22]:

Vector()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 Vector()

TypeError: Vector.__init__() missing 1 required positional argument: 'data'

In [23]:

Vector(1, 2, 3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 Vector(1, 2, 3)

TypeError: Vector.__init__() takes 2 positional arguments but 4 were given

... while creating a Vector instance from an empty sequence raises a custom ValueError.

In [24]:

Vector([])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 Vector([])

Cell In[17], line 18, in Vector.__init__(self, data)
     16 self._entries = list(float(x) for x in data)
     17 if len(self._entries) == 0:
---> 18     raise ValueError("a vector must have at least one entry")

ValueError: a vector must have at least one entry

Even though we can access the ._entries attribute on the v object (i.e., "from outside"), we are not supposed to do that because of the underscore _ convention. In other words, we should access ._entries only from within a method via self.

In [25]:

v._entries  # by convention not allowed

Out[25]:

[1.0, 2.0, 3.0]

Text Representations

For all the built-in data types, an object's value is represented in a literal notation, implying that we can simply copy and paste the value into another code cell to create a new object with the same value.

The exact representation of the value does not have to be identical to the one used to create the object. For example, we can create a tuple object without using parentheses and Python still outputs its value with ( and ). That was an arbitrary design decision by the core development team.

In [26]:

x = 1, 2, 3

In [27]:

x

Out[27]:

(1, 2, 3)

In [28]:

(1, 2, 3)

Out[28]:

(1, 2, 3)

To control how objects of a user-defined data type are represented as text, we implement the .__repr__() (cf., reference) and .__str__() (cf., reference) methods. Both take only a self argument and must return a str object.

In [29]:

class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)
        # ...

    def __repr__(self):
        args = ", ".join(repr(x) for x in self._entries)
        return f"Vector(({args}))"

    def __str__(self):
        first, last = self._entries[0], self._entries[-1]
        n_entries = len(self._entries)
        return f"Vector({first!r}, ..., {last!r})[{n_entries:d}]"

Now, when v is evaluated in a code cell, we see the return value of the .__repr__() method.

According to the specification, .__repr__() should return a str object that, when used as a literal, creates a new instance with the same state (i.e., their ._entries attributes compare equal) as the original one. In other words, it should return a text representation of the object optimized for direct consumption by the Python interpreter. That is often useful when debugging or logging large applications.

Our implementation of .__repr__() in the Vector class uses a tuple notation for the data argument. So, even if we create v from a list object like [1, 2, 3] and even though the ._entries are stored as a list object internally, a Vector instance's text representation "defaults" to (( and )) in the output. This decision is arbitrary and we could have used a list notation for the data argument as well.

In [30]:

v = Vector([1, 2, 3])

In [31]:

v

Out[31]:

Vector((1.0, 2.0, 3.0))

If we copy and paste the value of the v object into another code cell, we create a new Vector instance with the same state as v.

In [32]:

Vector((1.0, 2.0, 3.0))

Out[32]:

Vector((1.0, 2.0, 3.0))

Alternatively, the built-in repr() function returns an object's value as a str object (i.e., with the quotes ').

In [33]:

repr(v)

Out[33]:

'Vector((1.0, 2.0, 3.0))'
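One way to check the round-trip property from the specification is to feed the text representation to the built-in eval() function. The following is a sketch that repeats the relevant parts of the Vector class so that it runs on its own; the variable name clone is chosen only for illustration.

```python
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)

    def __repr__(self):
        args = ", ".join(repr(x) for x in self._entries)
        return f"Vector(({args}))"


v = Vector([1, 2, 3])

# eval() executes the text representation as Python code.
clone = eval(repr(v))

print(repr(clone))                   # Vector((1.0, 2.0, 3.0))
print(clone is v)                    # False
print(clone._entries == v._entries)  # True
```

A caveat of this simple .__repr__() implementation: for a Vector with only one entry, the text would read Vector((1.0)), in which (1.0) is a float in parentheses and not a tuple, so the round trip would fail with a TypeError.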

On the contrary, the .__str__() method should return a human-readable text representation of the object, and we use the built-in str() and print() functions to obtain this representation explicitly.

For our Vector class, this representation only shows a Vector's first and last entries followed by the total number of entries in brackets. So, even for a Vector containing millions of entries, we could easily make sense of the representation.

While str() returns the text representation as a str object, ...

In [34]:

str(v)

Out[34]:

'Vector(1.0, ..., 3.0)[3]'

... print() does not show the enclosing quotes.

In [35]:

print(v)

Vector(1.0, ..., 3.0)[3]
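To illustrate the point about large vectors, the sketch below repeats the relevant parts of the Vector class and creates an instance with one million entries. The variable names big and label are chosen only for this example.

```python
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)

    def __str__(self):
        first, last = self._entries[0], self._entries[-1]
        n_entries = len(self._entries)
        return f"Vector({first!r}, ..., {last!r})[{n_entries:d}]"


big = Vector(range(1_000_000))

# The human-readable representation stays short regardless of the size.
print(big)  # Vector(0.0, ..., 999999.0)[1000000]

# Without a format specification, f-strings fall back to str() as well.
label = f"result: {big}"
print(label)  # result: Vector(0.0, ..., 999999.0)[1000000]
```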

From a theoretical point of view, the text representation provided by .__repr__() contains all the information (i.e., the $0$s and $1$s in memory) that is needed to model something in a computer. In a way, it is a natural extension from the binary (cf., Chapter 5), hexadecimal (cf., Chapter 5), and bytes (cf., Chapter 6) representations of information. After all, just like Unicode characters are encoded in bytes, the more "complex" objects in this chapter are encoded in Unicode characters via their text representations.

The `Matrix` Class

Below is a first implementation of the Matrix class that stores the ._entries internally as a list of lists.

The .__init__() method ensures that all the rows come with the same number of columns. Again, we do not allow Matrix instances without any entries.

In [36]:

class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        for row in self._entries[1:]:
            if len(row) != len(self._entries[0]):
                raise ValueError("rows must have the same number of entries")
        if len(self._entries) == 0:
            raise ValueError("a matrix must have at least one entry")

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    def __str__(self):
        first, last = self._entries[0][0], self._entries[-1][-1]
        m, n = len(self._entries), len(self._entries[0])
        return f"Matrix(({first!r}, ...), ..., (..., {last!r}))[{m:d}x{n:d}]"

Matrix is an object as well.

In [37]:

id(Matrix)

Out[37]:

94113690738160

In [38]:

type(Matrix)

Out[38]:

type

In [39]:

Matrix

Out[39]:

__main__.Matrix

Let's create a new Matrix instance from a list of tuples.

In [40]:

m = Matrix([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

In [41]:

id(m)

Out[41]:

140306180401856

In [42]:

type(m)

Out[42]:

__main__.Matrix

The text representations work as above.

In [43]:

m

Out[43]:

Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))

In [44]:

print(m)

Matrix((1.0, ...), ..., (..., 9.0))[3x3]

Passing an invalid data argument when instantiating a Matrix results in the documented exceptions.

In [45]:

Matrix(())

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[45], line 1
----> 1 Matrix(())

Cell In[36], line 9, in Matrix.__init__(self, data)
      7         raise ValueError("rows must have the same number of entries")
      8 if len(self._entries) == 0:
----> 9     raise ValueError("a matrix must have at least one entry")

ValueError: a matrix must have at least one entry

In [46]:

Matrix([(1, 2, 3), (4, 5)])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[46], line 1
----> 1 Matrix([(1, 2, 3), (4, 5)])

Cell In[36], line 7, in Matrix.__init__(self, data)
      5 for row in self._entries[1:]:
      6     if len(row) != len(self._entries[0]):
----> 7         raise ValueError("rows must have the same number of entries")
      8 if len(self._entries) == 0:
      9     raise ValueError("a matrix must have at least one entry")

ValueError: rows must have the same number of entries

Instance Methods vs. Class Methods

The methods we have seen so far are all instance methods. The characteristic idea behind an instance method is that the behavior it provides either depends on the state of a concrete instance or mutates it. In other words, an instance method always works with attributes on the self argument. If a method does not need access to self to do its job, it is conceptually not an instance method and we should probably convert it into another kind of method as shown below.

An example of an instance method from linear algebra is the .transpose() method below that switches the rows and columns of an existing Matrix instance and returns a new Matrix instance based on that. It is implemented by passing the iterator created with the zip() built-in as the data argument to the Matrix constructor: The expression zip(*self._entries) may be a bit hard to understand because of the involved unpacking but simply flips a Matrix's rows and columns. The built-in list() constructor within the .__init__() method then materializes the iterator into the ._entries attribute. Without a concrete Matrix's rows and columns, .transpose() does not make sense, conceptually speaking.

Also, we see that it is OK to reference a class from within one of its methods. While this seems trivial to some readers, others may find it confusing. The final versions of the Vector and Matrix classes in the fourth part of this chapter show how this "hard coded" redundancy can be avoided.

In [47]:

class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        # ...

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    def transpose(self):
        return Matrix(zip(*self._entries))
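The unpacking in zip(*self._entries) can be illustrated with plain lists outside of any class. In the sketch below, the names rows, unpacked, and spelled_out are chosen only for this example.

```python
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# The * operator unpacks the outer list, so both calls are equivalent.
unpacked = list(zip(*rows))
spelled_out = list(zip(rows[0], rows[1]))

print(unpacked)                 # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
print(unpacked == spelled_out)  # True
```

zip() pairs up the first elements of all rows, then the second elements, and so on. That is why the former rows end up as columns. The resulting tuples are converted into lists again by .__init__().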

The .transpose() method returns a new Matrix instance where the rows and columns are flipped.

In [48]:

m = Matrix([(1, 2, 3), (4, 5, 6), (7, 8, 9)])

In [49]:

m

Out[49]:

Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))

In [50]:

m.transpose()

Out[50]:

Matrix(((1.0, 4.0, 7.0,), (2.0, 5.0, 8.0,), (3.0, 6.0, 9.0,)))

Two invocations of .transpose() may be chained, which negates its overall effect but still creates a new instance object (i.e., m is n evaluates to False).

In [51]:

n = m.transpose().transpose()

In [52]:

n

Out[52]:

Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))

In [53]:

m is n

Out[53]:

False

Counterintuitively, the comparison operator == evaluates to False even though m and n have ._entries attributes that compare equal. We fix this in the "Operator Overloading" section later in this chapter.

In [54]:

m == n

Out[54]:

False
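The reason for this result is that a class without its own .__eq__() method inherits the default behavior from object, which only regards an object as equal to itself. The following sketch repeats a trimmed-down Matrix class so that it runs on its own and, for illustration only, accesses ._entries from outside.

```python
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)

    def transpose(self):
        return Matrix(zip(*self._entries))


m = Matrix([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
n = m.transpose().transpose()

print(m == m)                    # True: the very same object
print(m == n)                    # False: two different objects
print(m._entries == n._entries)  # True: the lists of lists compare equal
```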

Sometimes, it is useful to attach functionality to a class object that does not depend on the state of a concrete instance but on the class as a whole. Such methods are called class methods and can be created with the classmethod() built-in combined with the @ decorator syntax. Then, Python adapts the binding process described above such that it implicitly inserts a reference to the class object itself instead of the instance when the method is invoked. By convention, we name this parameter cls.

Class methods are often used to provide an alternative way to create instances, usually from a different kind of argument. As an example, .from_columns() expects a sequence of columns instead of rows as its data argument. It forwards the invocation to the .__init__() method (i.e., what cls(data) does; cls references the same class object as Matrix), then calls the .transpose() method on the newly created instance, and lastly returns the instance created by .transpose(). Again, we are intelligently reusing a lot of code.

In [55]:

class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        # ...

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    def transpose(self):
        return Matrix(zip(*self._entries))

    @classmethod
    def from_columns(cls, data):
        return cls(data).transpose()

We use the alternative .from_columns() constructor to create a Matrix equivalent to m above from a list of columns instead of rows.

In [56]:

m = Matrix.from_columns([(1, 4, 7), (2, 5, 8), (3, 6, 9)])

In [57]:

m

Out[57]:

Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))
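To see what Python passes in as cls, consider the following sketch. The class Demo and its method .which() are hypothetical and serve only to make the adapted binding process visible.

```python
class Demo:

    @classmethod
    def which(cls):
        """Return whatever Python inserted as the first argument."""
        return cls


# Invoked on the class, cls references the class object itself ...
print(Demo.which() is Demo)  # True

# ... and the same holds when the method is invoked on an instance.
instance = Demo()
print(instance.which() is Demo)  # True
```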

There is also a staticmethod() built-in to be used with the @ syntax to define methods that are independent from both the class and instance objects but nevertheless related semantically to a class. In this case, the binding process is disabled and no argument is implicitly inserted upon a method's invocation. Such static methods are not really needed most of the time and we omit them here for brevity.

Computed Properties

After creation, a Matrix instance exhibits certain properties that depend only on the concrete data encapsulated in it. For example, every Matrix instance implicitly has two dimensions: These are commonly denoted as $m$ and $n$ in math and represent the number of rows and columns.

We would like our Matrix instances to have two attributes, .n_rows and .n_cols, that provide the correct dimensions as int objects. To achieve that, we implement two instance methods, .n_rows() and .n_cols(), and make them derived attributes by decorating them with the property() built-in. They work like methods except that they do not need to be invoked with the call operator () but can be accessed as if they were instance variables.

To reuse their code, we integrate the new properties already within the .__init__() method.

In [58]:

class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        for row in self._entries[1:]:
            if len(row) != self.n_cols:
                raise ValueError("rows must have the same number of entries")
        if self.n_rows == 0:
            raise ValueError("a matrix must have at least one entry")

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    @property
    def n_rows(self):
        return len(self._entries)

    @property
    def n_cols(self):
        return len(self._entries[0])

The revised m models a $2 \times 3$ matrix.

In [59]:

m = Matrix([(1, 2, 3), (4, 5, 6)])

In [60]:

m.n_rows, m.n_cols

Out[60]:

(2, 3)
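Because a property is computed from ._entries whenever it is accessed, it cannot get out of sync with the data. The sketch below uses a trimmed-down Matrix class without the validation logic and, for illustration only, modifies ._entries from outside, which the underscore convention normally forbids.

```python
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)

    @property
    def n_rows(self):
        return len(self._entries)

    @property
    def n_cols(self):
        return len(self._entries[0])


m = Matrix([(1, 2, 3), (4, 5, 6)])
print(m.n_rows, m.n_cols)  # 2 3

# For illustration only: append a third row behind the instance's back.
m._entries.append([7.0, 8.0, 9.0])
print(m.n_rows, m.n_cols)  # 3 3
```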

In their basic form, properties are read-only attributes. This makes sense for Matrix instances where we cannot "set" how many rows and columns there are while keeping the ._entries unchanged.

In [61]:

m.n_rows = 3

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[61], line 1
----> 1 m.n_rows = 3

AttributeError: property 'n_rows' of 'Matrix' object has no setter