Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.

Chapter 9: Mappings & Sets (Appendix)

The collections module in the standard library provides specialized mapping types for common use cases.

The `defaultdict` Type

The defaultdict type allows us to define a factory function that creates a default value whenever we look up a key that does not yet exist. Ordinary dict objects raise a KeyError exception in such situations.
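
The difference is easiest to see side by side. The following is a minimal sketch; the names plain and with_default are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

plain = {}
with_default = defaultdict(int)  # int() returns 0, so 0 is the default value

try:
    plain["missing"]
except KeyError as error:
    print("dict raised KeyError for", error)

# The look-up calls int(), stores the result under the key, and returns it.
print(with_default["missing"])
print(dict(with_default))
```

Note that the look-up on the defaultdict object is not read-only: it leaves a new item behind.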

Let's say we have a list with records of goals scored during a soccer game. The records consist of the fields "Country," "Player," and the "Time" when a goal was scored. Our task is to group the goals by player and/or country.

In [1]:

goals = [
    ("Germany", "Müller", 11), ("Germany", "Klose", 23),
    ("Germany", "Kroos", 24), ("Germany", "Kroos", 26),
    ("Germany", "Khedira", 29), ("Germany", "Schürrle", 69),
    ("Germany", "Schürrle", 79), ("Brazil", "Oscar", 90),
]

Using a normal dict object, we have to tediously check whether a player has already scored a goal. If not, we must create a new list object holding the time of the player's first goal. Otherwise, we append the time to the already existing list object.

In [2]:

goals_by_player = {}

for _, player, minute in goals:
    if player not in goals_by_player:
        goals_by_player[player] = [minute]
    else:
        goals_by_player[player].append(minute)

goals_by_player

Out[2]:

{'Müller': [11],
 'Klose': [23],
 'Kroos': [24, 26],
 'Khedira': [29],
 'Schürrle': [69, 79],
 'Oscar': [90]}

With a defaultdict object, we can express the code fragment's intent more concisely. We pass a reference to the list() built-in to defaultdict.

In [3]:

from collections import defaultdict

In [4]:

goals_by_player = defaultdict(list)

for _, player, minute in goals:
    goals_by_player[player].append(minute)

goals_by_player

Out[4]:

defaultdict(list,
            {'Müller': [11],
             'Klose': [23],
             'Kroos': [24, 26],
             'Khedira': [29],
             'Schürrle': [69, 79],
             'Oscar': [90]})

In [5]:

type(goals_by_player)

Out[5]:

collections.defaultdict

A reference to the factory function is stored in the default_factory attribute.

In [6]:

goals_by_player.default_factory

Out[6]:

list
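
It is the look-up with the indexing operator that invokes the factory, and that look-up also inserts the new key. The sketch below illustrates this on a fresh, hypothetical demo object so that goals_by_player stays untouched.

```python
from collections import defaultdict

demo = defaultdict(list)

# .get() and the `in` operator do not call the factory.
print(demo.get("Lahm"))
print("Lahm" in demo)

# Indexing calls demo.default_factory() and stores the result.
demo["Lahm"]
print("Lahm" in demo)

# With default_factory set to None, missing keys raise a KeyError again.
demo.default_factory = None
try:
    demo["Neuer"]
except KeyError:
    print("KeyError, just like a normal dict")
```

So, to test for membership without growing the mapping, the in operator or the .get() method is the safer choice.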

If we want this code to produce a normal dict object, we pass goals_by_player to the dict() constructor.

In [7]:

dict(goals_by_player)

Out[7]:

{'Müller': [11],
 'Klose': [23],
 'Kroos': [24, 26],
 'Khedira': [29],
 'Schürrle': [69, 79],
 'Oscar': [90]}

To group on the country and the player level simultaneously, we use a factory function, created with a lambda expression, that returns another defaultdict with list() as its factory.

In [8]:

goals_by_country_and_player = defaultdict(lambda: defaultdict(list))

for country, player, minute in goals:
    goals_by_country_and_player[country][player].append(minute)

goals_by_country_and_player

Out[8]:

defaultdict(<function __main__.<lambda>()>,
            {'Germany': defaultdict(list,
                         {'Müller': [11],
                          'Klose': [23],
                          'Kroos': [24, 26],
                          'Khedira': [29],
                          'Schürrle': [69, 79]}),
             'Brazil': defaultdict(list, {'Oscar': [90]})})

Conversion into a normal and nested dict object is now a bit tricky but can be achieved in one line with a comprehension.

In [9]:

{country: dict(by_player) for country, by_player in goals_by_country_and_player.items()}

Out[9]:

{'Germany': {'Müller': [11],
  'Klose': [23],
  'Kroos': [24, 26],
  'Khedira': [29],
  'Schürrle': [69, 79]},
 'Brazil': {'Oscar': [90]}}
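
For deeper nesting, the one-line comprehension no longer suffices. One possible approach, sketched here under the assumption that only defaultdict objects need converting, is a small recursive helper. The name to_plain_dict and the "first_half" key are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def to_plain_dict(mapping):
    """Recursively convert nested defaultdict objects into normal dict objects."""
    return {
        key: to_plain_dict(value) if isinstance(value, defaultdict) else value
        for key, value in mapping.items()
    }


# Three levels of nesting: country, player, and a made-up period label.
nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
nested["Germany"]["Kroos"]["first_half"].append(24)
nested["Germany"]["Kroos"]["first_half"].append(26)

plain = to_plain_dict(nested)
print(plain)
```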

The `Counter` Type

A common task is to count the number of occurrences of elements in an iterable.

The Counter type provides an easy-to-use interface that can be called with any iterable and returns a dict-like object of type Counter that maps each unique element to the number of times it occurs.

To continue the previous example, let's create an overview that shows how many goals each player scored. We use a generator expression as the argument to Counter.

In [10]:

goals

Out[10]:

[('Germany', 'Müller', 11),
 ('Germany', 'Klose', 23),
 ('Germany', 'Kroos', 24),
 ('Germany', 'Kroos', 26),
 ('Germany', 'Khedira', 29),
 ('Germany', 'Schürrle', 69),
 ('Germany', 'Schürrle', 79),
 ('Brazil', 'Oscar', 90)]

In [11]:

from collections import Counter

In [12]:

scorers = Counter(x[1] for x in goals)

In [13]:

scorers

Out[13]:

Counter({'Kroos': 2,
         'Schürrle': 2,
         'Müller': 1,
         'Klose': 1,
         'Khedira': 1,
         'Oscar': 1})

In [14]:

type(scorers)

Out[14]:

collections.Counter

Now we can look up individual players. scorers behaves like a normal dictionary with regard to key look-ups.

In [15]:

scorers["Müller"]

Out[15]:

1

Unlike a normal dict object, it returns 0 if a key is not found. So, we do not have to handle a KeyError.

In [16]:

scorers["Lahm"]

Out[16]:

0

Counter objects have a .most_common() method that returns a list object of $2$-element tuple objects, where the first element is an element from the original iterable and the second is its number of occurrences. The list object is sorted in descending order of occurrences.

In [17]:

scorers.most_common(2)

Out[17]:

[('Kroos', 2), ('Schürrle', 2)]
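
Two details of .most_common() are worth a quick illustration: calling it without an argument, and the ordering of ties. The sketch below uses a hypothetical letters object built from a plain str object rather than the goal data.

```python
from collections import Counter

letters = Counter("mississippi")

# Without an argument, all elements are returned, most frequent first.
print(letters.most_common())

# Elements with equal counts keep the order in which they were first seen.
print(letters.most_common(2))
```

This explains why 'Kroos' is listed before 'Schürrle' above: both scored twice, but Kroos appears first in goals.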

We can increase the count of individual entries with the .update() method, which takes an iterable of the elements we want to count.

Imagine if Philipp Lahm had also scored against Brazil.

In [18]:

scorers.update(["Lahm"])

In [19]:

scorers

Out[19]:

Counter({'Kroos': 2,
         'Schürrle': 2,
         'Müller': 1,
         'Klose': 1,
         'Khedira': 1,
         'Oscar': 1,
         'Lahm': 1})

If we use a str object as the argument instead, each individual character is treated as an element to be updated. That is most likely not what we want.

In [20]:

scorers.update("Lahm")

In [21]:

scorers

Out[21]:

Counter({'Kroos': 2,
         'Schürrle': 2,
         'Müller': 1,
         'Klose': 1,
         'Khedira': 1,
         'Oscar': 1,
         'Lahm': 1,
         'L': 1,
         'a': 1,
         'h': 1,
         'm': 1})
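
Two ways to avoid this pitfall are sketched below on a hypothetical tally object with made-up counts: .update() also accepts a mapping of elements to counts, and a single count can be incremented with the indexing operator.

```python
from collections import Counter

tally = Counter(["Kroos", "Kroos", "Lahm"])

# A mapping adds the given counts instead of being iterated over as elements.
tally.update({"Lahm": 2})

# A single count can also be incremented directly, even for a new key.
tally["Klose"] += 1

print(tally)
```

The increment works for a new key because the look-up on the right-hand side yields 0 instead of raising a KeyError.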

The `ChainMap` Type

Consider to_words, more_words, and even_more_words below. Instead of merging the items of the three dict objects together into a new one, we want to create an object that behaves as if it contained all the unified items in it without materializing them in memory a second time.

In [22]:

to_words = {
    0: "zero",
    1: "one",
    2: "two",
}

In [23]:

more_words = {
    2: "TWO",  # to illustrate a point
    3: "three",
    4: "four",
}

In [24]:

even_more_words = {
    4: "FOUR",  # to illustrate a point
    5: "five",
    6: "six",
}

The ChainMap type allows us to do precisely that.

In [25]:

from collections import ChainMap

We simply pass all mappings as positional arguments to ChainMap and obtain a proxy object that occupies almost no memory but gives us access to the union of all the items.

In [26]:

chain = ChainMap(to_words, more_words, even_more_words)

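Let's loop over the items in chain and see what is "in" it. The order may look surprising: iteration starts with the keys of the last mapping and works backward to the first. All seven items we expect are there. For a key that occurs in several mappings, the value from the earliest mapping wins, so later mappings do not overwrite earlier ones.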

In [27]:

for number, word in chain.items():
    print(number, word)

4 four
5 five
6 six
2 two
3 three
0 zero
1 one
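
The proxy behavior can be made concrete with a smaller example. This is a minimal sketch; first, second, and small_chain are hypothetical names introduced so that the objects above are not modified.

```python
from collections import ChainMap

first = {2: "two"}
second = {2: "TWO", 3: "three"}
small_chain = ChainMap(first, second)

# Look-ups search the mappings from left to right; the first hit wins.
print(small_chain[2])

# The chain stores no copies, so changes to a mapping show up immediately.
second[4] = "four"
print(small_chain[4])

# Writes through the chain only ever touch the first mapping.
small_chain[3] = "THREE"
print(first)
print(second)
```

After the write, second still maps 3 to "three", but the chain now finds the new entry in first before it ever reaches second.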

When looking up a non-existent key, ChainMap objects raise a KeyError just like normal dict objects would.

In [28]:

chain[10]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[28], line 1
----> 1 chain[10]

File /usr/lib64/python3.12/collections/__init__.py:1014, in ChainMap.__getitem__(self, key)
   1012     except KeyError:
   1013         pass
-> 1014 return self.__missing__(key)

File /usr/lib64/python3.12/collections/__init__.py:1006, in ChainMap.__missing__(self, key)
   1005 def __missing__(self, key):
-> 1006     raise KeyError(key)

KeyError: 10
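
As with normal dict objects, the exception can be avoided with the .get() method or a membership test. A short sketch on a hypothetical numbers object:

```python
from collections import ChainMap

numbers = ChainMap({0: "zero"}, {1: "one"})

# .get() returns the fallback instead of raising a KeyError.
print(numbers.get(10, "unknown"))

# The `in` operator checks all underlying mappings.
print(1 in numbers)
print(10 in numbers)
```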
