Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
In contrast to all the built-in data types introduced in the previous chapters, classes allow us to create user-defined data types. They enable us to model data and its associated behavior in an abstract way. Concrete instances of these custom data types then encapsulate the state in a running program. Often, classes are blueprints modeling "real world things."
Classes and instances follow the object-oriented programming (OOP) paradigm where a large program is broken down into many small components (i.e., the objects) that reuse code. This way, a program that is too big for a programmer to fully comprehend as a whole becomes maintainable via its individual pieces, each of which is easier to understand.
Often, we see the terminology "classes & objects" used instead of "classes & instances" in Python-related texts. In this book, we are more precise as both classes and instances are objects, as already specified in the "Objects vs. Types vs. Values" section in Chapter 1.
Neither core Python nor the standard library offers an implementation of common linear algebra functionalities. While we introduce the popular third-party library numpy in Chapter 10 as the de-facto standard for that and recommend using it in real-life projects, we show how one could use Python's object-oriented language features to implement common matrix and vector operations throughout this chapter. Once we have achieved that, we compare our own library with numpy.
Without classes, we could model a vector, for example, with a tuple or a list object, depending on whether we want it to be mutable or not.
Let's take the following vector $\vec{x}$ as an example and model it as a tuple:

$$\vec{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$
x = (1, 2, 3)
x
(1, 2, 3)
We can extend this approach and model a matrix as either a tuple holding other tuples or as a list holding other lists or as a mixture of both. Then, we must decide if the inner objects represent rows or columns. A common convention is to go with the former.
For example, let's model the matrix $\mathbf{A}$ below as a list of row lists:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
A
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
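With the rows-first convention, the first index selects a row and the second index selects a column within that row. The following minimal sketch illustrates this with plain indexing; the names `entry` and `first_column` are introduced here for illustration only.

```python
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Row index first, column index second (both are zero-based).
entry = A[1][2]  # second row, third column

# A column is not stored as such and must be collected from all the rows.
first_column = [row[0] for row in A]
```

Reading a row is a single lookup, whereas reading a column requires a loop over all rows. That asymmetry is a direct consequence of the chosen convention.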
While this way of representing vectors and matrices in memory keeps things simple, we cannot work with them easily as Python does not know about the semantics (i.e., "rules") of vectors and matrices modeled as tuples and lists of lists.
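For example, we should be able to multiply $\mathbf{A}$ with $\vec{x}$ if their dimensions match. However, Python does not know how to do this and raises a TypeError.

$$\mathbf{A} * \vec{x} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} * \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 14 \\ 32 \\ 50 \end{pmatrix}$$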
A * x
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 A * x

TypeError: can't multiply sequence by non-int of type 'tuple'
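For comparison, without classes the multiplication would have to live in a stand-alone helper function that knows about the rows-first convention. The sketch below is one possible way to write such a helper; the name `matvec` and its error message are assumptions made for this illustration and are not part of the chapter's code.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a sequence of rows) with a vector (a sequence).

    Hypothetical helper for illustration only.
    """
    # Every row needs as many entries as the vector has.
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("dimensions do not match")
    # Each entry of the result is the dot product of one row with the vector.
    return tuple(sum(a * b for a, b in zip(row, vector)) for row in matrix)


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = (1, 2, 3)

result = matvec(A, x)
```

The helper works, but nothing ties it to the data: `A * x` still fails, and the function happily accepts any nested sequences, whether or not they were meant to be a matrix and a vector.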
Throughout this chapter, we "teach" Python the rules of linear algebra .
The compound class
statement creates a new variable that references a class object in memory.
Following the header line, the indented body syntactically consists of function definitions (i.e., .dummy_method()
) and variable assignments (i.e., .dummy_variable
). Any code put here is executed just as if it were outside the class
statement. However, the class object acts as a namespace, meaning that all the names do not exist in the global scope but may only be accessed with the dot operator .
on the class object. In this context, the names are called class attributes.
Within classes, functions are referred to as methods that are bound to future instance objects. This binding process means that Python implicitly inserts a reference to a concrete instance object as the first argument to any method invocation (i.e., "function call"). By convention, we call this parameter self
as it references the instance object on which the method is invoked. Then, as the method is executed, we can set and access attributes via the dot operator .
on self
. That is how we manage the state of a concrete instance within a generically written class. At the same time, the code within a method is reused whenever we invoke a method on any instance.
As indicated by PEP 257 and also section 3.8.4 of the Google Python Style Guide
, we use docstrings to document relevant parts of the new data type. With respect to naming, classes are named according to the CamelCase
convention while instances are treated like normal variables and named in snake_case
.
class Vector:
    """A one-dimensional vector from linear algebra."""

    dummy_variable = "I am a vector"

    def dummy_method(self):
        """A dummy method for illustration purposes."""
        return self.dummy_variable
Vector
is an object on its own with an identity, a type, and a value.
id(Vector)
94113690586816
Its type is type
indicating that it represents a user-defined data type and it evaluates to its fully qualified name (i.e., __main__
as it is defined in this Jupyter notebook).
We have seen the type type
before in the "Constructors" section in Chapter 2 and also in the "The
namedtuple
Type" section in Chapter 7's Appendix . In the latter case, we could also use a
Point
class but the namedtuple() function from the collections
module in the standard library
is a convenient shortcut to create custom data types that can be derived out of a plain
tuple
.
In all examples, if an object's type is type
, we can simply view it as a blueprint for a "family" of objects.
type(Vector)
type
Vector
__main__.Vector
The docstrings are transformed into convenient help texts.
Vector?
Init signature: Vector()
Docstring:      A one-dimensional vector from linear algebra.
Type:           type
Subclasses:
help(Vector)
Help on class Vector in module __main__:

class Vector(builtins.object)
 |  A one-dimensional vector from linear algebra.
 |
 |  Methods defined here:
 |
 |  dummy_method(self)
 |      A dummy method for illustration purposes.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  dummy_variable = 'I am a vector'
vars(Vector)
mappingproxy({'__module__': '__main__', '__doc__': 'A one-dimensional vector from linear algebra.', 'dummy_variable': 'I am a vector', 'dummy_method': <function __main__.Vector.dummy_method(self)>, '__dict__': <attribute '__dict__' of 'Vector' objects>, '__weakref__': <attribute '__weakref__' of 'Vector' objects>})
dir(Vector)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'dummy_method', 'dummy_variable']
With the dot operator .
we access the class attributes.
Vector.dummy_variable
'I am a vector'
Vector.dummy_method
<function __main__.Vector.dummy_method(self)>
However, invoking the .dummy_method()
raises a TypeError
. That makes sense as the method expects a concrete instance passed in as the self
argument. However, we have not yet created one.
Vector.dummy_method()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 Vector.dummy_method()

TypeError: Vector.dummy_method() missing 1 required positional argument: 'self'
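To make the binding process more tangible, here is a minimal sketch that reuses the dummy version of the class. It assumes that calling the class without arguments yields an instance (which works here because this version defines no `.__init__()` method); the names `bound` and `explicit` are chosen for illustration.

```python
class Vector:
    """A one-dimensional vector from linear algebra."""

    dummy_variable = "I am a vector"

    def dummy_method(self):
        """A dummy method for illustration purposes."""
        return self.dummy_variable


v = Vector()

# Invoked on the instance, Python passes v as the self argument implicitly.
bound = v.dummy_method()

# Invoked on the class, we must pass the instance ourselves.
explicit = Vector.dummy_method(v)
```

Both calls execute the same function object and return the same `str` object. The only difference is who supplies the first argument.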
To create a new instance, we need to instantiate one.
In the class
statement, we see a .__init__()
method that contains all the validation logic that we require a Vector
instance to adhere to. In a way, this method serves as a constructor-like function.
.__init__()
is an example of a so-called special method that we use to make new data types integrate with Python's language features. Their naming follows the dunder convention. In this chapter, we introduce some of the more common special methods, and we refer to the language reference for an exhaustive list of all special methods. Special methods not explicitly defined in a class are implicitly added with a default implementation.
The .__init__()
method (cf., reference ) is responsible for initializing a new instance object immediately after its creation. That usually means setting up some instance attributes. In the example, a new
Vector
instance is created from some sequence object (e.g., a tuple
like x
) passed in as the data
argument. The elements provided by the data
argument are first cast as float
objects and then stored in a list
object named ._entries
on the new instance object. Together, the float
s represent the state encapsulated within an instance.
A best practice is to separate the way we use a data type (i.e., its "behavior") from how we implement it. By convention, attributes that should not be accessed from "outside" of an instance start with one leading underscore _
. In the example, the instance attribute ._entries
is such an implementation detail: We could have decided to store a Vector
's entries in a tuple
instead of a list
. However, this decision should not affect how a Vector
instance is to be used. Moreover, if we changed how the ._entries
are modeled later on, this must not break any existing code using Vector
s. This idea is also known as information hiding in software engineering.
class Vector:
    """A one-dimensional vector from linear algebra.

    All entries are converted to floats.
    """

    def __init__(self, data):
        """Create a new vector.

        Args:
            data (sequence): the vector's entries

        Raises:
            ValueError: if no entries are provided
        """
        self._entries = list(float(x) for x in data)
        if len(self._entries) == 0:
            raise ValueError("a vector must have at least one entry")
To create a Vector
instance, we call the Vector
class with the ()
operator. This call is forwarded to the .__init__()
method behind the scenes. That is what we mean by saying "make new data types integrate with Python's language features" above: We use Vector
just as any other built-in constructor.
v = Vector([1, 2, 3])
v
is an object as well.
id(v)
140306181183328
Unsurprisingly, the type of v
is Vector
. That is the main point of this chapter.
type(v)
__main__.Vector
v
's semantic "value" is not so clear yet. We fix this in the next section.
v
<__main__.Vector at 0x7f9b9416d760>
Although the .__init__()
method defines two parameters, we must call it with only one data
argument. As noted above, Python implicitly inserts a reference to the newly created instance object (i.e., v
) as the first argument as self
.
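The following sketch spells out, in simplified form, the two steps that happen when a class is called: first, a bare instance is created, and second, `.__init__()` is invoked with that instance as `self`. Calling `.__new__()` by hand is done here only to illustrate the mechanism and is rarely needed in practice; the name `w` is an assumption for this example.

```python
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)


# The usual way: call the class with one argument.
v = Vector([1, 2, 3])

# Roughly the same in two explicit steps.
w = Vector.__new__(Vector)     # create an "empty" instance
Vector.__init__(w, [1, 2, 3])  # initialize it; w plays the role of self
```

After the second step, `w` is a `Vector` instance with the same state as `v`. This also explains the error messages for a wrong number of arguments: they count `self` as one of the positional arguments.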
Calling a class object with a wrong number of arguments leads to generic TypeError
s ...
Vector()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 Vector()

TypeError: Vector.__init__() missing 1 required positional argument: 'data'
Vector(1, 2, 3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 Vector(1, 2, 3)

TypeError: Vector.__init__() takes 2 positional arguments but 4 were given
... while creating a Vector
instance from an empty sequence raises a custom ValueError
.
Vector([])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 Vector([])

Cell In[17], line 18, in Vector.__init__(self, data)
     16         self._entries = list(float(x) for x in data)
     17         if len(self._entries) == 0:
---> 18             raise ValueError("a vector must have at least one entry")

ValueError: a vector must have at least one entry
Even though we can access the ._entries
attribute on the v
object (i.e., "from outside"), we are not supposed to do that because of the underscore _
convention. In other words, we should access ._entries
only from within a method via self
.
v._entries # by convention not allowed
[1.0, 2.0, 3.0]
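Because `.__init__()` only loops over `data` and casts every element with `float()`, the class accepts more than `list` and `tuple` objects holding numbers. The sketch below explores this with a shortened version of the class; peeking at `._entries` from the outside is done for illustration only, and the variable names are assumptions.

```python
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)
        if len(self._entries) == 0:
            raise ValueError("a vector must have at least one entry")


# Numeric strings are cast just like numbers.
from_strings = Vector(["1", "2.5"])

# Any iterable works, for example, a generator expression.
from_generator = Vector(x ** 2 for x in (1, 2, 3))

# Elements that float() cannot convert lead to a ValueError as well.
try:
    Vector(["abc"])
    conversion_failed = False
except ValueError:
    conversion_failed = True
```

Note that the last error is raised by `float()` itself and not by our own validation logic, so its message differs from the custom one.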
For all the built-in data types, an object's value is represented in a literal notation, implying that we can simply copy and paste the value into another code cell to create a new object with the same value.
The exact representation of the value does not have to be identical to the one used to create the object. For example, we can create a tuple
object without using parentheses and Python still outputs its value with (
and )
. That was an arbitrary design decision by the core development team.
x = 1, 2, 3
x
(1, 2, 3)
(1, 2, 3)
(1, 2, 3)
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)
        # ...

    def __repr__(self):
        args = ", ".join(repr(x) for x in self._entries)
        return f"Vector(({args}))"

    def __str__(self):
        first, last = self._entries[0], self._entries[-1]
        n_entries = len(self._entries)
        return f"Vector({first!r}, ..., {last!r})[{n_entries:d}]"
Now, when v
is evaluated in a code cell, we see the return value of the .__repr__()
method.
According to the specification, .__repr__()
should return a str
object that, when used as a literal, creates a new instance with the same state (i.e., their ._entries
attributes compare equal) as the original one. In other words, it should return a text representation of the object optimized for direct consumption by the Python interpreter. That is often useful when debugging or logging large applications.
Our implementation of .__repr__()
in the Vector
class uses a tuple
notation for the data
argument. So, even if we create v
from a list
object like [1, 2, 3]
and even though the _entries
are stored as a list
object internally, a Vector
instance's text representation "defaults" to ((
and ))
in the output. This decision is arbitrary and we could have used a list
notation for the data
argument as well.
v = Vector([1, 2, 3])
v
Vector((1.0, 2.0, 3.0))
If we copy and paste the value of the v
object into another code cell, we create a new Vector
instance with the same state as v
.
Vector((1.0, 2.0, 3.0))
Vector((1.0, 2.0, 3.0))
Alternatively, the built-in repr() function returns an object's value as a str
object (i.e., with the quotes '
).
repr(v)
'Vector((1.0, 2.0, 3.0))'
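The round trip described above can be checked programmatically with the built-in `eval()` function, which evaluates a `str` object as a Python expression. The sketch below does that for a shortened version of the class. It also probes one edge case that, as far as we can tell, the tuple notation does not cover: for a single entry, the text `Vector((1.0))` contains a parenthesized `float` and not a `tuple`. The variable names are assumptions for this illustration.

```python
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)

    def __repr__(self):
        args = ", ".join(repr(x) for x in self._entries)
        return f"Vector(({args}))"


v = Vector([1, 2, 3])

# Evaluating the text representation creates a new instance ...
w = eval(repr(v))

# ... with the same state as the original one.
same_state = v._entries == w._entries

# Edge case: (1.0) is a float, which cannot be looped over.
single = Vector([1])
try:
    eval(repr(single))
    single_round_trip = True
except TypeError:
    single_round_trip = False
```

A trailing comma after the last entry, as in `(1.0,)`, would be one way to make the single-entry case round-trip as well.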
On the contrary, the .__str__()
method should return a human-readable text representation of the object, and we use the built-in str() and print()
functions to obtain this representation explicitly.
For our Vector
class, this representation only shows a Vector
's first and last entries followed by the total number of entries in brackets. So, even for a Vector
containing millions of entries, we could easily make sense of the representation.
While str() returns the text representation as a
str
object, ...
str(v)
'Vector(1.0, ..., 3.0)[3]'
... print() does not show the enclosing quotes.
print(v)
Vector(1.0, ..., 3.0)[3]
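The two text representations also show up implicitly in other places. As a minimal sketch, the code below checks which one f-strings and container objects use; the variable names are chosen for illustration.

```python
class Vector:

    def __init__(self, data):
        self._entries = list(float(x) for x in data)

    def __repr__(self):
        args = ", ".join(repr(x) for x in self._entries)
        return f"Vector(({args}))"

    def __str__(self):
        first, last = self._entries[0], self._entries[-1]
        n_entries = len(self._entries)
        return f"Vector({first!r}, ..., {last!r})[{n_entries:d}]"


v = Vector([1, 2, 3])

# Without a conversion flag, f-strings fall back to .__str__() ...
as_str = f"{v}"

# ... whereas the !r flag requests .__repr__().
as_repr = f"{v!r}"

# Containers like list objects show their elements via .__repr__().
in_list = str([v])
```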
From a theoretical point of view, the text representation provided by .__repr__()
contains all the information (i.e., the 0s and 1s in memory) that is needed to model something in a computer. In a way, it is a natural extension from the binary (cf., Chapter 5 ), hexadecimal (cf., Chapter 5
), and
bytes
(cf., Chapter 6 ) representations of information. After all, just like Unicode characters are encoded in
bytes
, the more "complex" objects in this chapter are encoded in Unicode characters via their text representations.
Matrix Class

Below is a first implementation of the Matrix
class that stores the ._entries
internally as a list
of list
s.
The .__init__()
method ensures that all the rows come with the same number of columns. Again, we do not allow Matrix
instances without any entries.
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        for row in self._entries[1:]:
            if len(row) != len(self._entries[0]):
                raise ValueError("rows must have the same number of entries")
        if len(self._entries) == 0:
            raise ValueError("a matrix must have at least one entry")

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    def __str__(self):
        first, last = self._entries[0][0], self._entries[-1][-1]
        m, n = len(self._entries), len(self._entries[0])
        return f"Matrix(({first!r}, ...), ..., (..., {last!r}))[{m:d}x{n:d}]"
Matrix
is an object as well.
id(Matrix)
94113690738160
type(Matrix)
type
Matrix
__main__.Matrix
Let's create a new Matrix
instance from a list
of tuple
s.
m = Matrix([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
id(m)
140306180401856
type(m)
__main__.Matrix
The text representations work as above.
m
Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))
print(m)
Matrix((1.0, ...), ..., (..., 9.0))[3x3]
Passing an invalid data
argument when instantiating a Matrix
results in the documented exceptions.
Matrix(())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[45], line 1
----> 1 Matrix(())

Cell In[36], line 9, in Matrix.__init__(self, data)
      7                 raise ValueError("rows must have the same number of entries")
      8         if len(self._entries) == 0:
----> 9             raise ValueError("a matrix must have at least one entry")

ValueError: a matrix must have at least one entry
Matrix([(1, 2, 3), (4, 5)])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[46], line 1
----> 1 Matrix([(1, 2, 3), (4, 5)])

Cell In[36], line 7, in Matrix.__init__(self, data)
      5         for row in self._entries[1:]:
      6             if len(row) != len(self._entries[0]):
----> 7                 raise ValueError("rows must have the same number of entries")
      8         if len(self._entries) == 0:
      9             raise ValueError("a matrix must have at least one entry")

ValueError: rows must have the same number of entries
The methods we have seen so far are all instance methods. The characteristic idea behind an instance method is that the behavior it provides either depends on the state of a concrete instance or mutates it. In other words, an instance method always works with attributes on the self
argument. If a method does not need access to self
to do its job, it is conceptually not an instance method and we should probably convert it into another kind of method as shown below.
An example of an instance method from linear algebra is the .transpose()
method below that switches the rows and columns of an existing Matrix
instance and returns a new Matrix
instance based on that. It is implemented by passing the iterator created with the zip() built-in as the
data
argument to the Matrix
constructor: The expression zip(*self._entries)
may be a bit hard to understand because of the involved unpacking but simply flips a Matrix
's rows and columns. The built-in list() constructor within the
.__init__()
method then materializes the iterator into the ._entries
attribute. Without a concrete Matrix
's rows and columns, .transpose()
does not make sense, conceptually speaking.
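To see what the unpacking inside `zip(*self._entries)` does, it may help to look at it in isolation with a plain `list` of rows. The names `rows`, `columns`, and `spelled_out` below are assumptions for this sketch.

```python
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# The * operator unpacks the outer list, so each row becomes its own argument.
columns = list(zip(*rows))

# The same call written out by hand.
spelled_out = list(zip(rows[0], rows[1]))
```

`zip()` then pairs up the first elements of all rows, then the second elements, and so on. Each of the resulting `tuple` objects is one column of the original rows.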
Also, we see that it is ok to reference a class from within one of its methods. While this seems trivial to some readers, others may find this confusing. The final versions of the Vector
and Matrix
classes in the fourth part of this chapter show how this "hard coded" redundancy can be avoided.
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        # ...

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    def transpose(self):
        return Matrix(zip(*self._entries))
The .transpose()
method returns a new Matrix
instance where the rows and columns are flipped.
m = Matrix([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
m
Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))
m.transpose()
Matrix(((1.0, 4.0, 7.0,), (2.0, 5.0, 8.0,), (3.0, 6.0, 9.0,)))
Two invocations of .transpose()
may be chained, which negates its overall effect but still creates a new instance object (i.e., m is n
is False
).
n = m.transpose().transpose()
n
Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))
m is n
False
Unintuitively, the comparison operator ==
returns a wrong result as m
and n
have _entries
attributes that compare equal. We fix this in the "Operator Overloading" section later in this chapter.
m == n
False
Sometimes, it is useful to attach functionality to a class object that does not depend on the state of a concrete instance but on the class as a whole. Such methods are called class methods and can be created with the classmethod() built-in combined with the
@
decorator syntax. Then, Python adapts the binding process described above such that it implicitly inserts a reference to the class object itself instead of the instance when the method is invoked. By convention, we name this parameter cls
.
Class methods are often used to provide an alternative way to create instances, usually from a different kind of argument. As an example, .from_columns()
expects a sequence of columns instead of rows as its data
argument. It forwards the invocation to the .__init__()
method (i.e., what cls(data)
does; cls
references the same class object as Matrix
), then calls the .transpose()
method on the newly created instance, and lastly returns the instance created by .transpose()
. Again, we are intelligently reusing a lot of code.
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        # ...

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    def transpose(self):
        return Matrix(zip(*self._entries))

    @classmethod
    def from_columns(cls, data):
        return cls(data).transpose()
We use the alternative .from_columns()
constructor to create a Matrix
equivalent to m
above from a list
of columns instead of rows.
m = Matrix.from_columns([(1, 4, 7), (2, 5, 8), (3, 6, 9)])
m
Matrix(((1.0, 2.0, 3.0,), (4.0, 5.0, 6.0,), (7.0, 8.0, 9.0,)))
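A square matrix hides the effect of the transposition somewhat, so the sketch below repeats the idea with a non-square example, using a shortened version of the class. It also shows that a class method may be invoked on an instance, in which case Python still passes the class as `cls`. Peeking at `._entries` is done for illustration only, and the names `rows` and `n` are assumptions.

```python
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)

    def transpose(self):
        return Matrix(zip(*self._entries))

    @classmethod
    def from_columns(cls, data):
        return cls(data).transpose()


# Three columns with two entries each result in a 2x3 matrix.
m = Matrix.from_columns([(1, 4), (2, 5), (3, 6)])
rows = m._entries

# Invoked on an instance, cls still references the class and not the instance.
n = m.from_columns([(1, 4), (2, 5), (3, 6)])
```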
There is also a staticmethod() built-in to be used with the
@
syntax to define methods that are independent of both the class and instance objects but nevertheless related semantically to a class. In this case, the binding process is disabled and no argument is implicitly inserted upon a method's invocation. Such static methods are not really needed most of the time and we omit them here for brevity.
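For completeness, here is a minimal sketch of what a static method could look like. The helper `.is_rectangular()` is hypothetical and not part of the chapter's `Matrix` class; it merely illustrates that neither `self` nor `cls` is inserted when the method is invoked.

```python
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        # A static method can be reached via self like any other attribute.
        if not self.is_rectangular(self._entries):
            raise ValueError("rows must have the same number of entries")

    @staticmethod
    def is_rectangular(rows):
        """Hypothetical helper: check that all rows have the same length."""
        # A set keeps only the distinct row lengths.
        return len({len(row) for row in rows}) <= 1


# No instance is needed to use the helper.
ok = Matrix.is_rectangular([(1, 2), (3, 4)])
not_ok = Matrix.is_rectangular([(1, 2), (3,)])
```

The helper could just as well be a plain function defined outside the class. Placing it inside the class only signals that it belongs to the `Matrix` type conceptually.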
After creation, a Matrix
instance exhibits certain properties that depend only on the concrete data
encapsulated in it. For example, every Matrix
instance implicitly has two dimensions: These are commonly denoted as m and n in math and represent the number of rows and columns.
We would like our Matrix
instances to have two attributes, .n_rows
and .n_cols
, that provide the correct dimensions as int
objects. To achieve that, we implement two instance methods, .n_rows()
and .n_cols()
, and make them derived attributes by decorating them with the property() built-in. They work like methods except that they do not need to be invoked with the call operator
()
but can be accessed as if they were instance variables.
To reuse their code, we integrate the new properties already within the .__init__()
method.
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)
        for row in self._entries[1:]:
            if len(row) != self.n_cols:
                raise ValueError("rows must have the same number of entries")
        if self.n_rows == 0:
            raise ValueError("a matrix must have at least one entry")

    def __repr__(self):
        args = ", ".join("(" + ", ".join(repr(c) for c in r) + ",)" for r in self._entries)
        return f"Matrix(({args}))"

    @property
    def n_rows(self):
        return len(self._entries)

    @property
    def n_cols(self):
        return len(self._entries[0])
The revised m
models a 2×3 matrix.
m = Matrix([(1, 2, 3), (4, 5, 6)])
m.n_rows, m.n_cols
(2, 3)
In their basic form, properties are read-only attributes. This makes sense for Matrix instances where we cannot "set" how many rows and columns there are while keeping the ._entries unchanged.
m.n_rows = 3
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[61], line 1
----> 1 m.n_rows = 3

AttributeError: property 'n_rows' of 'Matrix' object has no setter
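The sketch below summarizes how a property behaves from the caller's point of view, using a shortened version of the class. Reading works without the call operator, adding the call operator fails because the attribute access already evaluates to an `int` object, and assigning fails as no setter is defined. The flag variables are assumptions for this illustration.

```python
class Matrix:

    def __init__(self, data):
        self._entries = list(list(float(x) for x in r) for r in data)

    @property
    def n_rows(self):
        return len(self._entries)

    @property
    def n_cols(self):
        return len(self._entries[0])


m = Matrix([(1, 2, 3), (4, 5, 6)])

# Accessed like instance variables, the properties are computed on the fly.
dimensions = (m.n_rows, m.n_cols)

# m.n_rows is already an int, which cannot be called.
try:
    m.n_rows()
    callable_like_method = True
except TypeError:
    callable_like_method = False

# Without a setter, assignment is rejected.
try:
    m.n_rows = 3
    assignable = True
except AttributeError:
    assignable = False
```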