Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud .

Chapter 5: Numbers & Bits¶

After learning about the basic building blocks of expressing and structuring the business logic in programs, we focus our attention on the data types Python offers us, both built-in and available via the standard library or third-party packages.

We start with the "simple" ones: Numeric types in this chapter and textual data in Chapter 6 . An important fact that holds for all objects of these types is that they are immutable. To reuse the bag analogy from Chapter 1 , this means that the $0$ s and $1$ s making up an object's value cannot be changed once the bag is created in memory, implying that any operation with or method on the object creates a new object in a different memory location.

Chapter 7 , Chapter 8 , Chapter 9 , Chapter 10 then cover the more "complex" data types, including, for example, the list type. Finally, Chapter 11 completes the picture by introducing language constructs to create custom types.

We have already seen many hints indicating that numbers are not as trivial to work with as it seems at first sight:

Chapter 1 reveals that numbers may come in different data types (i.e., int vs. float so far),
Chapter 3 raises questions regarding the limited precision of float numbers (e.g., 42 == 42.000000000000001 evaluates to True), and
Chapter 4 shows that sometimes a float "walks" and "quacks" like an int, whereas the reverse is true in other cases.

This chapter introduces all the built-in numeric types : int, float, and complex. To mitigate the limited precision of floating-point numbers, we also add an Appendix where we look at two replacements for the float type in the standard library , namely the Decimal type in the decimals and the Fraction type in the fractions module.

The `int` Type¶

The simplest numeric type is the int type: It behaves like an integer in ordinary math (i.e., the set $\mathbb{Z}$ ) and supports operators in the way we saw in the section on arithmetic operators in Chapter 1 .

One way to create int objects is by simply writing its value as a literal with the digits 0 to 9.

In [1]:

a = 42

Just like any other object, the 42 has an identity, a type, and a value.

In [2]:

id(a)

Out[2]:

94857615440928

In [3]:

type(a)

Out[3]:

int

In [4]:

Out[4]:

A nice feature in newer Python versions is using underscores _ as (thousands) separators in numeric literals. For example, 1_000_000 evaluates to 1000000 in memory; the _s are ignored by the interpreter.

In [5]:

1_000_000

Out[5]:

We may place the _s anywhere we want.

In [6]:

1_2_3_4_5_6_7_8_9

Out[6]:

123456789

It is syntactically invalid to write out leading 0s in numeric literals. The reason for that will become apparent in the next section.

In [7]:

  Cell In[7], line 1
    042
    ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

Another way to create int objects is with the int() built-in that casts float or properly formatted str objects as integers. Then, decimals are truncated (i.e., "cut off").

In [8]:

int(42.11)

Out[8]:

In [9]:

int(42.87)

Out[9]:

Whereas the integer division operator // "rounds" towards negative infinity (cf., the "(Arithmetic) Operators" section in Chapter 1 ), the int() built-in rounds towards 0.

In [10]:

int(-42.87)

Out[10]:

-42

When casting str objects as ints, the int() built-in is less forgiving. We must not include any decimals as shown by the ValueError. Yet, leading and trailing whitespace is gracefully ignored.

In [11]:

int("42")

Out[11]:

In [12]:

int("42.0")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 int("42.0")

ValueError: invalid literal for int() with base 10: '42.0'

In [13]:

int(" 42 ")

Out[13]:

The int type follows all rules we know from math, apart from one exception: Whereas mathematicians to this day argue what the term $0^0$ means (cf., this article ), programmers are pragmatic about this and simply define $0^0 = 1$ .

In [14]:

0 ** 0

Out[14]:

Binary Representations¶

As computers can only store $0$ s and $1$ s, int objects are nothing but that in memory as well. Consequently, computer scientists and engineers developed conventions as to how $0$ s and $1$ s are "translated" into integers, and one such convention is the binary representation of non-negative integers. Consider the integers from $0$ through $255$ that are encoded into $0$ s and $1$ s with the help of this table:

Bit $i$	7	6	5	4	3	2	1	0
Digit	$2^7$	$2^6$	$2^5$	$2^4$	$2^3$	$2^2$	$2^1$	$2^0$
$=$	$128$	$64$	$32$	$16$	$8$	$4$	$2$	$1$

A number consists of exactly eight $0$ s and $1$ s that are read from right to left and referred to as the bits of the number. Each bit represents a distinct multiple of $2$ , the digit. For sure, we start counting at $0$ again.

To encode the integer $3$ , for example, we need to find a combination of $0$ s and $1$ s such that the sum of digits marked with a $1$ is equal to the number we want to encode. In the example, we set all bits to $0$ except for the first ( $i=0$ ) and second ( $i=1$ ) as $2^0 + 2^1 = 1 + 2 = 3$ . So the binary representation of $3$ is $00~00~00~11$ . To borrow some terminology from linear algebra, the $3$ is a linear combination of the digits where the coefficients are either $0$ or $1$ : $3 = 0*128 + 0*64 + 0*32 + 0*16 + 0*8 + 0*4 + 1*2 + 1*1$ . It is guaranteed that there is exactly one such combination for each number between $0$ and $255$ .

As each bit in the binary representation is one of two values, we say that this representation has a base of $2$ . Often, the base is indicated with a subscript to avoid confusion. For example, we write $3_{10} = 00000011_2$ or $3_{10} = 11_2$ for short omitting leading $0$ s. A subscript of $10$ implies a decimal number as we know it from elementary school.

We use the built-in bin() function to obtain an int object's binary representation: It returns a str object starting with "0b" indicating the binary format and as many $0$ s and $1$ s as are necessary to encode the integer omitting leading $0$ s.

In [15]:

bin(3)

Out[15]:

'0b11'

We may pass a str object formatted this way as the argument to the int() built-in, together with base=2, to create an int object, for example, with the value of 3.

In [16]:

int("0b11", base=2)

Out[16]:

Moreover, we may also use the contents of the returned str object as a literal instead: Just like we type, for example, 3 without quotes (i.e., "literally") into a code cell to create the int object 3, we may type 0b11 to obtain an int object with the same value.

In [17]:

0b11

Out[17]:

Another example is the integer 123 that is the sum of $64 + 32 + 16 + 8 + 2 + 1$ : Thus, its binary representation is the sequence of bits $01~11~10~11$ , or to use our new notation, $123_{10} = 1111011_2$ .

In [18]:

bin(123)

Out[18]:

'0b1111011'

Analogous to typing 123 into a code cell, we may write 0b1111011, or 0b_111_1011 to make use of the underscores, and create an int object with the value 123.

In [19]:

0b_111_1011

Out[19]:

0 and 255 are the edge cases where we set all the bits to either $0$ or $1$ .

In [20]:

bin(0)

Out[20]:

'0b0'

In [21]:

bin(1)

Out[21]:

'0b1'

In [22]:

bin(2)

Out[22]:

'0b10'

In [23]:

bin(255)

Out[23]:

'0b11111111'

Groups of eight bits are also called a byte. As a byte can only represent non-negative integers up to $255$ , the table above is extended conceptually with greater digits to the left to model integers beyond $255$ . The memory management needed to implement this is built into Python, and we do not need to worry about it.

For example, 789 is encoded with ten bits and $789_{10} = 1100010101_2$ .

In [24]:

bin(789)

Out[24]:

'0b1100010101'

To contrast this bits encoding with the familiar decimal system, we show an equivalent table with powers of $10$ as the digits:

Decimal	3	2	1	0
Digit	$10^3$	$10^2$	$10^1$	$10^0$
$=$	$1000$	$100$	$10$	$1$

Now, an integer is a linear combination of the digits where the coefficients are one of ten values, and the base is now $10$ . For example, the number $123$ can be expressed as $0*1000 + 1*100 + 2*10 + 3*1$ . So, the binary representation follows the same logic as the decimal system taught in elementary school. The decimal system is intuitive to us humans, mostly as we learn to count with our ten fingers. The $0$ s and $1$ s in a computer's memory are therefore no rocket science; they only feel unintuitive for a beginner.

Arithmetic with Bits¶

Adding two numbers in their binary representations is straightforward and works just like we all learned addition in elementary school. Going from right to left, we add the individual digits, and ...

In [25]:

1 + 2

Out[25]:

In [26]:

bin(1) + " + " + bin(2) + " = " + bin(3)

Out[26]:

'0b1 + 0b10 = 0b11'

... if any two digits add up to $2$ , the resulting digit is $0$ and a $1$ carries over.

In [27]:

1 + 3

Out[27]:

In [28]:

bin(1) + " + " + bin(3) + " = " + bin(4)

Out[28]:

'0b1 + 0b11 = 0b100'

Multiplication is also quite easy. All we need to do is to multiply the left operand by all digits of the right operand separately and then add up the individual products, just like in elementary school.

In [29]:

4 * 3

Out[29]:

In [30]:

bin(4) + " * " + bin(3) + " = " + bin(12)

Out[30]:

'0b100 * 0b11 = 0b1100'

In [31]:

bin(4) + " * " + bin(1) + " = " + bin(4)  # multiply with first digit

Out[31]:

'0b100 * 0b1 = 0b100'

In [32]:

bin(4) + " * " + bin(2) + " = " + bin(8)  # multiply with second digit

Out[32]:

'0b100 * 0b10 = 0b1000'

The Further Resources section at the end of this chapter provides video tutorials on addition and multiplication in binary. Subtraction and division are a bit more involved but essentially also easy to understand.

Hexadecimal Representations¶

While in the binary and decimal systems there are two and ten distinct coefficients per digit, another convenient representation, the hexadecimal representation, uses a base of $16$ . It is convenient as one digit stores the same amount of information as four bits and the binary representation quickly becomes unreadable for larger numbers. The letters "a" through "f" are used as digits "10" through "15".

The following table summarizes the relationship between the three systems:

Decimal	Hexadecimal	Binary	$~~~~~~$	Decimal	Hexadecimal	Binary	$~~~~~~$	Decimal	Hexadecimal	Binary	$~~~~~~$	...
0	0	0000	$~~~~~~$	16	10	10000	$~~~~~~$	32	20	100000	$~~~~~~$	...
1	1	0001	$~~~~~~$	17	11	10001	$~~~~~~$	33	21	100001	$~~~~~~$	...
2	2	0010	$~~~~~~$	18	12	10010	$~~~~~~$	34	22	100010	$~~~~~~$	...
3	3	0011	$~~~~~~$	19	13	10011	$~~~~~~$	35	23	100011	$~~~~~~$	...
4	4	0100	$~~~~~~$	20	14	10100	$~~~~~~$	36	24	100100	$~~~~~~$	...
5	5	0101	$~~~~~~$	21	15	10101	$~~~~~~$	37	25	100101	$~~~~~~$	...
6	6	0110	$~~~~~~$	22	16	10110	$~~~~~~$	38	26	100110	$~~~~~~$	...
7	7	0111	$~~~~~~$	23	17	10111	$~~~~~~$	39	27	100111	$~~~~~~$	...
8	8	1000	$~~~~~~$	24	18	11000	$~~~~~~$	40	28	101000	$~~~~~~$	...
9	9	1001	$~~~~~~$	25	19	11001	$~~~~~~$	41	29	101001	$~~~~~~$	...
10	a	1010	$~~~~~~$	26	1a	11010	$~~~~~~$	42	2a	101010	$~~~~~~$	...
11	b	1011	$~~~~~~$	27	1b	11011	$~~~~~~$	43	2b	101011	$~~~~~~$	...
12	c	1100	$~~~~~~$	28	1c	11100	$~~~~~~$	44	2c	101100	$~~~~~~$	...
13	d	1101	$~~~~~~$	29	1d	11101	$~~~~~~$	45	2d	101101	$~~~~~~$	...
14	e	1110	$~~~~~~$	30	1e	11110	$~~~~~~$	46	2e	101110	$~~~~~~$	...
15	f	1111	$~~~~~~$	31	1f	11111	$~~~~~~$	47	2f	101111	$~~~~~~$	...

To show more examples of the above subscript convention, we pick three random entries from the table:

$11_{10} = \text{b}_{16} = 1011_2$

$25_{10} = 19_{16} = 11001_2$

$46_{10} = 2\text{e}_{16} = 101110_2$

The built-in hex() function creates a str object starting with "0x" representing an int object's hexadecimal representation. The length depends on how many groups of four bits are implied by the corresponding binary representation.

For 0 and 1, the hexadecimal representation is similar to the binary one.

In [33]:

hex(0)

Out[33]:

'0x0'

In [34]:

hex(1)

Out[34]:

'0x1'

Whereas bin(3) already requires two digits, one is enough for hex(3).

In [35]:

hex(3)  # bin(3) => "0b11"; two digits needed

Out[35]:

'0x3'

For 10 and 15, we see the letter digits for the first time.

In [36]:

hex(10)

Out[36]:

'0xa'

In [37]:

hex(15)

Out[37]:

'0xf'

The binary representation of 123, 0b_111_1011, can be viewed as two groups of four bits, $0111$ and $1011$ , that are encoded as $7$ and $\text{b}$ in hexadecimal (cf., table above).

In [38]:

bin(123)

Out[38]:

'0b1111011'

In [39]:

hex(123)

Out[39]:

'0x7b'

To obtain a new int object with the value 123, we call the int() built-in with a properly formatted str object and base=16 as arguments.

In [40]:

int("0x7b", base=16)

Out[40]:

Alternatively, we could use a literal notation instead.

In [41]:

0x7b

Out[41]:

Hexadecimals between $00_{16}$ and $\text{ff}_{16}$ (i.e., $0_{10}$ and $255_{10}$ ) are commonly used to describe colors, for example, in web development but also graphics editors. See this online tool for some more background.

In [42]:

hex(0)

Out[42]:

'0x0'

In [43]:

hex(255)

Out[43]:

'0xff'

Just like binary representations, the hexadecimals extend to the left for larger numbers like 789.

In [44]:

hex(789)

Out[44]:

'0x315'

For completeness sake, we mention that there is also the oct() built-in to obtain an integer's octal representation. The logic is the same as for the hexadecimal representation, and we use eight instead of sixteen digits. That is the equivalent of viewing the binary representations in groups of three bits. As of today, octal representations have become less important, and the data science practitioner may probably live without them quite well.

Negative Values¶

While there are conventions that model negative integers with $0$ s and $1$ s in memory (cf., Two's Complement ), Python manages that for us, and we do not look into the theory here for brevity. We have learned all that a practitioner needs to know about how integers are modeled in a computer. The Further Resources section at the end of this chapter provides a video tutorial on how the Two's Complement idea works.

The binary and hexadecimal representations of negative integers are identical to their positive counterparts except that they start with a minus sign -. However, as the video tutorial at the end of the chapter reveals, that is not how the bits are organized in memory.

In [45]:

bin(-3)

Out[45]:

'-0b11'

In [46]:

hex(-3)

Out[46]:

'-0x3'

In [47]:

bin(-255)

Out[47]:

'-0b11111111'

In [48]:

hex(-255)

Out[48]:

'-0xff'

Bitwise Operators¶

Now that we know how integers are represented with $0$ s and $1$ s, we look at ways of working with the individual bits, in particular with the so-called bitwise operators: As the name suggests, the operators perform some operation on a bit by bit basis. They only work with and always return int objects.

We keep this overview rather short as such "low-level" operations are not needed by the data science practitioner regularly. Yet, it is worthwhile to have heard about them as they form the basis of all of arithmetic in computers.

The first operator is the bitwise AND operator &: It looks at the bits of its two operands, 11 and 13 in the example, in a pairwise fashion and if both operands have a $1$ in the same position, the resulting integer will have a $1$ in this position as well. Otherwise, the resulting integer will have a $0$ in this position. The binary representations of 11 and 13 both have $1$ s in their respective first and fourth bits, which is why bin(11 & 13) evaluates to Ob_1001 or 9.

In [49]:

11 & 13

Out[49]:

In [50]:

bin(11) + " & " + bin(13)  # to show the operands' bits

Out[50]:

'0b1011 & 0b1101'

In [51]:

bin(11 & 13)

Out[51]:

'0b1001'

0b_1001 is the binary representation of 9.

In [52]:

0b_1001

Out[52]:

The bitwise OR operator | evaluates to an int object whose bits are set to $1$ if the corresponding bits of either one or both operands are $1$ . So in the example 9 | 13 only the second bit is $0$ for both operands, which is why the expression evaluates to 0b_1101 or 13.

In [53]:

9 | 13

Out[53]:

In [54]:

bin(9) + " | " + bin(13)  # to show the operands' bits

Out[54]:

'0b1001 | 0b1101'

In [55]:

bin(9 | 13)

Out[55]:

'0b1101'

0b_1101 evaluates to an int object with the value 13.

In [56]:

0b_1101

Out[56]:

The bitwise XOR operator ^ is a special case of the | operator in that it evaluates to an int object whose bits are set to $1$ if the corresponding bit of exactly one of the two operands is $1$ . Colloquially, the "X" stands for "exclusive." The ^ operator must not be confused with the exponentiation operator **! In the example, 9 ^ 13, only the third bit differs between the two operands, which is why it evaluates to 0b_100 omitting the leading $0$ .

In [57]:

9 ^ 13

Out[57]:

In [58]:

bin(9) + " ^ " + bin(13)  # to show the operands' bits

Out[58]:

'0b1001 ^ 0b1101'

In [59]:

bin(9 ^ 13)

Out[59]:

'0b100'

0b_100 evaluates to an int object with the value 4.

In [60]:

0b_100

Out[60]:

The bitwise NOT operator ~, sometimes also called inversion operator, is said to "flip" the $0$ s into $1$ s and the $1$ s into $0$ s. However, it is based on the aforementioned Two's Complement convention and ~x = -(x + 1) by definition (cf., the reference ). The full logic behind this, while actually quite simple, is considered out of scope in this book.

We can at least verify the definition by comparing the binary representations of 7 and -8: They are indeed the same.

In [61]:

~7

Out[61]:

-8

In [62]:

~7 == -(7 + 1)  # = Two's Complement

Out[62]:

True

In [63]:

bin(~7)

Out[63]:

'-0b1000'

In [64]:

bin(-(7 + 1))

Out[64]:

'-0b1000'

~x = -(x + 1) can be reformulated as ~x + x = -1, which is slightly easier to check.

In [65]:

~7 + 7

Out[65]:

-1

Lastly, the bitwise left and right shift operators, << and >>, shift all the bits either to the left or to the right. This corresponds to multiplying or dividing an integer by powers of 2.

When shifting left, $0$ s are filled in.

In [66]:

7 << 2

Out[66]:

In [67]:

bin(7)

Out[67]:

'0b111'

In [68]:

bin(7 << 2)

Out[68]:

'0b11100'

In [69]:

0b_1_1100

Out[69]:

When shifting right, some bits are always lost.

In [70]:

7 >> 1

Out[70]:

In [71]:

bin(7)

Out[71]:

'0b111'

In [72]:

bin(7 >> 1)

Out[72]:

'0b11'

In [73]:

0b_11

Out[73]: