Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
In the third part of the chapter, we proposed the idea that tuple objects are like "immutable lists." Often, however, we use tuple objects to represent a record of related fields. Then, each element has a semantic meaning (i.e., a descriptive name).
As an example, think of a spreadsheet with information on the students in a course. Each row represents a record and holds all the data associated with an individual student. The columns (e.g., matriculation number, first name, last name) are the fields, which may have different data types (e.g., int for the matriculation number and str for the names).
A simple way of modeling a single student is as a tuple object, for example, (123456, "John", "Doe"). A disadvantage of this approach is that we must remember the order and meaning of the elements/fields in the tuple object.
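A minimal sketch of that disadvantage, using the student record from above: because the fields are identified only by their position, the code has to hard-code indices, and nothing warns us if we mix them up. The variable names first_name, last_name, and wrong_first_name are illustrative only.

```python
# A student record: (matriculation number, first name, last name)
student = (123456, "John", "Doe")

# Access by position only; we must remember that index 1 is the first name
first_name = student[1]
last_name = student[2]
print(first_name, last_name)  # John Doe

# Using the wrong index raises no error; the result is simply wrong
wrong_first_name = student[2]
print(wrong_first_name)  # Doe
```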
An example from a different domain is the representation of (x, y)-points in the x-y-plane. Again, we could use a tuple object like current_position below to model the point (4, 2).
current_position = (4, 2)
We implicitly assume that the first element represents the x coordinate and the second the y coordinate. While that follows intuitively from mathematical convention, we should at least add comments somewhere in the code to document this assumption.
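A small sketch of what such documentation could look like with plain tuple objects: a comment states the assumed order, and tuple unpacking gives the elements descriptive names wherever we use them. This is only one possible convention, not the only one.

```python
# The point (4, 2), stored as (x, y): the first element is the x coordinate
current_position = (4, 2)

# Tuple unpacking makes the assumed order explicit at the point of use
x, y = current_position
print(x, y)  # 4 2
```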
namedtuple Type
A better way is to create a custom data type. While that is covered in depth in Chapter 11, the collections module in the standard library provides a namedtuple() factory function that creates "simple" custom data types on top of the standard tuple type.
from collections import namedtuple
namedtuple() takes two arguments. The first argument is the name of the data type. That could be different from the variable Point we use to refer to the new type, but in most cases it is best to keep them in sync. The second argument is a sequence with the field names as str objects. The names' order corresponds to the one assumed in current_position.
Point = namedtuple("Point", ["x", "y"])
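As a side note, namedtuple() also accepts the field names as a single string separated by whitespace and/or commas. The sketch below also uses a first argument that differs from the variable name (the names Pt and Vector are hypothetical) to show that it is the first argument, not the variable, that appears in the text representation.

```python
from collections import namedtuple

# The field names may also be given as one string, separated by spaces or commas
Point = namedtuple("Point", "x y")

# The first argument, not the variable name, shows up in the representation
Vector = namedtuple("Pt", ["x", "y"])

print(Point(4, 2))   # Point(x=4, y=2)
print(Vector(4, 2))  # Pt(x=4, y=2)
```

Keeping the variable and the first argument in sync avoids exactly this kind of confusion.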
The Point object is a so-called class. That is what it means for an object to be of type type. It can be used as a factory to create new tuple-like objects of type Point. In a way, namedtuple() gives us a way to create our own custom constructors.
id(Point)
94457911453856
type(Point)
type
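To double-check that Point is a class, we can also ask isinstance() directly. The class also keeps track of its field names in the documented _fields attribute, which is a tuple of str objects.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

# Point is a class, i.e., an object of type type
print(isinstance(Point, type))  # True

# The class remembers its field names in the _fields attribute
print(Point._fields)  # ('x', 'y')
```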
The text representation of Point is simply its name, qualified by the module in which it is defined (here, __main__).
Point
__main__.Point
We write Point(4, 2) to create a new object of type Point.
current_position = Point(4, 2)
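Because Point works like a constructor, we may also pass the coordinates as keyword arguments, and leaving out a field fails just like a regular function call with a missing argument. A short sketch (the names by_position and by_keyword are illustrative only):

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

# Positional and keyword arguments create equal objects
by_position = Point(4, 2)
by_keyword = Point(x=4, y=2)
print(by_position == by_keyword)  # True

# Omitting a field raises a TypeError
try:
    Point(4)
except TypeError as error:
    print("TypeError:", error)  # the exact wording varies by Python version
```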
Now, current_position has a somewhat nicer representation. In particular, the coordinates are named x and y.
current_position
Point(x=4, y=2)
Its type is no longer tuple but Point.
id(current_position)
140376178109184
type(current_position)
__main__.Point
We use the dot operator . to access the defined attributes.
current_position.x
4
current_position.y
2
As before, we get an AttributeError if we try to access an undefined attribute.
current_position.z
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 current_position.z

AttributeError: 'Point' object has no attribute 'z'
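current_position continues to work like a tuple object because Point is a subclass of tuple! That is why we can use namedtuple as a replacement for tuple. The underlying implementation is the same as well: as the field names are stored only once in the class and not in every instance, Point objects require no more memory than regular tuple objects, and indexing or looping over them works exactly as for tuple objects.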
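The following sketch checks these claims with the standard library: Point is a subclass of tuple, its instances have no per-instance dictionary, and sys.getsizeof() reports the same size as for the equivalent plain tuple object.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
current_position = Point(4, 2)

# Point is a subclass of tuple, so its instances are tuple objects, too
print(issubclass(Point, tuple))             # True
print(isinstance(current_position, tuple))  # True

# No per-instance dictionary: the field names live in the class only
print(hasattr(current_position, "__dict__"))  # False

# Hence, an instance takes up as much memory as the equivalent tuple
print(sys.getsizeof(current_position) == sys.getsizeof((4, 2)))  # True
```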
For example, we can index into or loop over current_position as it is still a sequence with the familiar four properties (i.e., it is a container, iterable, reversible, and sized).
current_position[0]
4
current_position[1]
2
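Other tuple behavior carries over as well. In the sketch below, comparisons work element by element, so a Point object equals the plain tuple object with the same elements, and slicing returns a plain tuple object rather than a Point object.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
current_position = Point(4, 2)

# Comparisons work element-wise, exactly as for tuple objects
print(current_position == (4, 2))  # True

# Slicing falls back to tuple behavior and returns a plain tuple
first_part = current_position[:1]
print(first_part, type(first_part))  # (4,) <class 'tuple'>
```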
for number in current_position:
    print(number)
4
2
for number in reversed(current_position):
    print(number)
2
4
len(current_position)
2
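Two more tuple features that Point objects keep are sketched below: tuple unpacking works in the order of the fields, and the objects are immutable, so neither the attributes nor the elements can be reassigned.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
current_position = Point(4, 2)

# Tuple unpacking still works, in the order of the fields
x, y = current_position
print(x, y)  # 4 2

# Like tuple objects, Point objects are immutable
try:
    current_position.x = 5
except AttributeError as error:
    print("AttributeError:", error)  # the exact wording varies by Python version

try:
    current_position[0] = 5
except TypeError as error:
    print("TypeError:", error)
```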