Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud .
In this part of the chapter, we look at the set
type, a concrete example of the more abstract concept of sets as a specialization of collections.
set
Type¶Python's provides a built-in set
type that resembles mathematical sets : Each element may only be a member of a set once, and there is no predictable order regarding the elements (cf., documentation
).
To create a set, we use the literal notation, {..., ...}
, and list all the members. The redundant 7
s and 4
s below are discarded.
numbers = {7, 7, 7, 7, 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4, 4, 4, 4, 4}
set
objects are objects on their own.
id(numbers)
139739115782880
type(numbers)
set
numbers
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
To create an empty set, we must use the built-in set() constructor as empty curly brackets,
{}
, already create an empty dict
object.
empty_dict = {}
type(empty_dict)
dict
empty_set = set()
empty_set
set()
The set() constructor takes any
iterable
and only keeps unique elements.
For example, we obtain all unique letters of a long word like so.
set("abracadabra")
{'a', 'b', 'c', 'd', 'r'}
The curly brackets notation can be viewed as a hint that dict
objects are conceptually generalizations of set
objects, and we think of set
objects as a collection consisting of a dict
object's keys with all the mapped values removed.
Like dict
objects, set
objects are built on top of hash tables , and, thus, each element must be a hashable (i.e., immutable) object and can only be contained in a set once due to the buckets logic.
{0, [1, 2], 3}
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[10], line 1 ----> 1 {0, [1, 2], 3} TypeError: unhashable type: 'list'
len() tells us the number of elements in a
set
object.
len(numbers)
12
We may loop over the elements in a set
object, but we must keep in mind that there is no predictable order. In contrast to dict
objects, the iteration order is also not guaranteed to be the insertion order. Because of the special hash values for int
objects, numbers
seems to be "magically" sorted, which is not the case.
for number in numbers: # beware the non-order!
print(number, end=" ")
1 2 3 4 5 6 7 8 9 10 11 12
We confirm that set
objects are not Reversible
.
for number in reversed(numbers):
print(number, end=" ")
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[13], line 1 ----> 1 for number in reversed(numbers): 2 print(number, end=" ") TypeError: 'set' object is not reversible
The boolean in
operator checks if a given and immutable object compares equal to an element in a set
object. As with dict
objects, membership testing is an extremely fast operation. Conceptually, it has the same result as conducting a linear search with the ==
operator behind the scenes without doing it.
0 in numbers
False
1 in numbers
True
2.0 in numbers # 2.0 is not a member itself but compares equal to a member
True
There is are Set
ABC in the collections.abc module that formalizes, in particular, the operators supported by
set
objects (cf., the "Set Operations" section below). Further, the MutableSet
ABC standardizes all the methods that mutate a set
object in place (cf., the "Mutability & Set Methods" section below).
import collections.abc as abc
isinstance(numbers, abc.Set)
True
isinstance(numbers, abc.MutableSet)
True
As set
objects come without a predictable order, indexing and slicing are not supported and result in a TypeError
. In particular, as there are no values to be looked up, these operations are not semantically meaningful. Instead, we check membership via the in
operator, as shown in the previous sub-section.
numbers
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
numbers[0]
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[21], line 1 ----> 1 numbers[0] TypeError: 'set' object is not subscriptable
numbers[:]
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[22], line 1 ----> 1 numbers[:] TypeError: 'set' object is not subscriptable
set
Methods¶Because the []
operator does not work for set
objects, they are mutated mainly via methods (cf., documentation ).
We may add new elements to an existing set
object with the .add() method.
numbers.add(999)
numbers
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 999}
The .update() method takes an iterable and adds all its elements to a
set
object if they are not already contained in it.
numbers.update(range(5))
numbers
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 999}
To remove an element by value, we use the .remove() or .discard()
methods. If the element to be removed is not in the
set
object, .remove() raises a loud
KeyError
while .discard() stays silent.
numbers.remove(999)
numbers.remove(999)
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[28], line 1 ----> 1 numbers.remove(999) KeyError: 999
numbers.discard(0)
numbers.discard(0)
numbers
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
The .pop() method removes an arbitrary element. As not even the insertion order is tracked, that removes a "random" element.
numbers.pop()
1
numbers
{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
The .clear() method discards all elements but keeps the
set
object alive.
numbers.clear()
numbers
set()
id(numbers) # same memory location as before
139739115782880
set
Operations¶The arithmetic and relational operators are overloaded with typical set operations from math. The operators have methods that do the equivalent. We omit them for brevity in this section and only show them as comments in the code cells. Both the operators and the methods return new set
objects without modifying the operands.
We showcase the set operations with easy math examples.
numbers = set(range(1, 13))
zero = {0}
evens = set(range(2, 13, 2))
numbers
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
zero
{0}
evens
{2, 4, 6, 8, 10, 12}
The bitwise OR operator |
returns the union of two set
objects.
zero | numbers # zero.union(numbers)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Of course, the operators may be chained.
zero | numbers | evens # zero.union(numbers).union(evens)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
To obtain an intersection of two or more set
objects, we use the bitwise AND operator &
.
zero & numbers # zero.intersection(numbers)
set()
numbers & evens # numbers.intersection(evens)
{2, 4, 6, 8, 10, 12}
To calculate a set
object containing all elements that are in one but not the other set
object, we use the minus operator -
. This operation is not symmetric.
numbers - evens # numbers.difference(evens)
{1, 3, 5, 7, 9, 11}
evens - numbers # evens.difference(numbers)
set()
The symmetric difference is defined as the set
object containing all elements that are in one but not both set
objects. It is calculated with the bitwise XOR operator ^
.
numbers ^ evens # numbers.symmetric_difference(evens)
{1, 3, 5, 7, 9, 11}
evens ^ numbers # evens.symmetric_difference(numbers)
{1, 3, 5, 7, 9, 11}
The augmented versions of the four operators (e.g., |
becomes |=
) are also defined: They mutate the left operand in place.
set
Comprehensions¶Python provides a literal notation for set
comprehensions that works exactly like the one for dict
comprehensions described in the first part of this chapter except that they use a single loop variable instead of a key-value pair.
For example, let's create a new set
object that consists of the squares of all the elements of numbers
.
{x ** 2 for x in numbers}
{1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144}
As before, we may have multiple for
- and/or if
-clauses.
For example, let's only keep the squares if they turn out to be an even number, or ...
{x ** 2 for x in numbers if (x ** 2) % 2 == 0}
{4, 16, 36, 64, 100, 144}
... create a set
object with all products obtained from the Cartesian product of numbers
with itself as long as the products are greater than 80
.
{x * y for x in numbers for y in numbers if x * y > 80}
{81, 84, 88, 90, 96, 99, 100, 108, 110, 120, 121, 132, 144}
frozenset
Type¶As set
objects are mutable, they may not be used, for example, as keys in a dict
object. Similar to how we replace list
with tuple
objects, we may often use a frozenset
object instead of an ordinary one. The frozenset
type is a built-in, and as frozenset
objects are immutable, the only limitation is that we must specify all elements upon creation (cf., documentation ).
frozenset
objects are created by passing an iterable to the built-in frozenset() constructor. Even though
frozenset
objects are hashable, their elements are not ordered.
frozenset([7, 7, 7, 7, 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4, 4, 4, 4, 4])
frozenset({1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12})