Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.

Chapter 7: Sequential Data (Appendix)

In the third part of the chapter, we proposed the idea that tuple objects are like "immutable lists." Often, however, we use tuple objects to represent a record of related fields. Then, each element has a semantic meaning (i.e., a descriptive name).

As an example, think of a spreadsheet with information on students in a course. Each row represents a record and holds all the data associated with an individual student. The columns (e.g., matriculation number, first name, last name) are the fields that may come as different data types (e.g., int for the matriculation number, str for the names).

A simple way of modeling a single student is as a tuple object, for example, (123456, "John", "Doe"). A disadvantage of this approach is that we must remember the order and meaning of the elements/fields in the tuple object.
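The disadvantage can be made concrete with a minimal sketch. The variable names below are chosen for illustration only and do not appear elsewhere in the chapter.

```python
# A student record as a plain tuple: (matriculation number, first name, last name)
student = (123456, "John", "Doe")

# Positional access works, but the index alone says nothing about the field.
matriculation_number = student[0]
last_name = student[2]

# Unpacking gives the fields names, but it silently depends on their order.
number, first_name, surname = student
```

If the order of the fields ever changes, every index and every unpacking statement in the code base has to be adjusted by hand.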

An example from a different domain is the representation of $(x, y)$-points in the $x$-$y$-plane. Again, we could use a tuple object like current_position below to model the point $(4, 2)$.

In [1]:

current_position = (4, 2)

We implicitly assume that the first element represents the $x$ and the second the $y$ coordinate. While that follows intuitively from convention in math, we should at least add comments somewhere in the code to document this assumption.

The `namedtuple` Type

A better way is to create a custom data type. While that is covered in depth in Chapter 11, the collections module in the standard library provides a namedtuple() factory function that creates "simple" custom data types on top of the standard tuple type.

In [2]:

from collections import namedtuple

namedtuple() takes two required arguments. The first argument is the name of the data type. That could be different from the variable Point we use to refer to the new type, but in most cases it is best to keep them in sync. The second argument is a sequence with the field names as str objects. The names' order corresponds to the one assumed in current_position.

In [3]:

Point = namedtuple("Point", ["x", "y"])
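As a side note, the field names may also be passed as a single str object in which the names are separated by whitespace. The sketch below compares both spellings via the _fields attribute that every namedtuple type provides. The name PointFromString is made up for this illustration.

```python
from collections import namedtuple

# Field names given as a list of str objects, as above.
Point = namedtuple("Point", ["x", "y"])

# The same field names given as one whitespace-separated str object.
PointFromString = namedtuple("Point", "x y")

# Both types expose their field names, in order, as a tuple of str objects.
fields = Point._fields
same_fields = PointFromString._fields == Point._fields
```

Which spelling to use is a matter of taste. The list version makes the individual names easier to spot.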

The Point object is a so-called class. That is what it means if an object is of type type. It can be used as a factory to create new tuple-like objects of type Point. In effect, namedtuple() lets us create our own custom constructors.

In [4]:

id(Point)

Out[4]:

94457911453856

In [5]:

type(Point)

Out[5]:

type

The value of Point is the class itself, shown by its qualified name.

In [6]:

Point

Out[6]:

__main__.Point

We write Point(4, 2) to create a new object of type Point.

In [7]:

current_position = Point(4, 2)
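Because the fields have names, we may also pass them as keyword arguments. The following sketch shows this and checks that leaving out a field is rejected. The names same_position and missing_argument_rejected are introduced here for illustration only.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

# Keyword arguments make the meaning of each value explicit.
same_position = Point(x=4, y=2)

# Every field must be provided; otherwise, a TypeError is raised.
try:
    Point(4)
except TypeError:
    missing_argument_rejected = True
else:
    missing_argument_rejected = False
```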

Now, current_position has a somewhat nicer representation. In particular, the coordinates are named x and y.

In [8]:

current_position

Out[8]:

Point(x=4, y=2)

Its type is no longer tuple but Point, a subclass of tuple created by namedtuple().

In [9]:

id(current_position)

Out[9]:

140376178109184

In [10]:

type(current_position)

Out[10]:

__main__.Point
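Even though type() reports Point, the object remains a tuple in the sense of isinstance(). A short check, repeating the definitions from above so that the cell runs on its own, may look as follows. The names is_tuple and equals_plain_tuple are illustrative.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
current_position = Point(4, 2)

# Point is derived from tuple, so both checks evaluate to True.
is_tuple = isinstance(current_position, tuple)
is_subclass = issubclass(Point, tuple)

# Comparison with a plain tuple object is element-wise, as usual.
equals_plain_tuple = current_position == (4, 2)
```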

We use the dot operator . to access the defined attributes.

In [11]:

current_position.x

Out[11]:

4

In [12]:

current_position.y

Out[12]:

2

As before, we get an AttributeError if we try to access an undefined attribute.

In [13]:

current_position.z

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 current_position.z

AttributeError: 'Point' object has no attribute 'z'

current_position continues to work like a tuple object! That is why we can use a namedtuple type as a replacement for tuple. The underlying implementations exhibit the same computational efficiency and memory usage.

For example, we can index into or loop over current_position as it is still a sequence with the familiar four properties.

In [14]:

current_position[0]

Out[14]:

4

In [15]:

current_position[1]

Out[15]:

2

In [16]:

for number in current_position:
    print(number)

4
2

In [17]:

for number in reversed(current_position):
    print(number)

2
4

In [18]:

len(current_position)

Out[18]:

2
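Two further tuple traits carry over as well: unpacking and immutability. The sketch below illustrates both and uses the _replace() and _asdict() methods that namedtuple types provide to derive new objects. The names next_position, as_dict, and mutation_failed are chosen for this illustration only.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
current_position = Point(4, 2)

# Unpacking works exactly as with a plain tuple object.
x, y = current_position

# Fields cannot be re-assigned; the attempt raises an AttributeError.
try:
    current_position.x = 5
except AttributeError:
    mutation_failed = True
else:
    mutation_failed = False

# _replace() returns a new Point and leaves the original untouched.
next_position = current_position._replace(x=5)

# _asdict() maps the field names to their values.
as_dict = current_position._asdict()
```

Since _replace() creates a new object, current_position still refers to the point $(4, 2)$ afterwards.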
