Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
After learning about the basic building blocks of expressing and structuring the business logic in programs, we focus our attention on the data types Python offers us, both built-in and available via the standard library or third-party packages.
We start with the "simple" ones: Numeric types in this chapter and textual data in Chapter 6 . An important fact that holds for all objects of these types is that they are immutable. To reuse the bag analogy from Chapter 1
, this means that the 0s and 1s making up an object's value cannot be changed once the bag is created in memory, implying that any operation with or method on the object creates a new object in a different memory location.
Chapter 7, Chapter 8, Chapter 9, and Chapter 10 then cover the more "complex" data types, including, for example, the
list
type. Finally, Chapter 11 completes the picture by introducing language constructs to create custom types.
We have already seen many hints indicating that numbers are not as trivial to work with as they seem at first sight: Numbers come in different data types (i.e., int vs. float so far), float numbers have limited precision (e.g., 42 == 42.000000000000001 evaluates to True), and sometimes a float "walks" and "quacks" like an int, whereas the reverse is true in other cases.
This chapter introduces all the built-in numeric types:
int
, float
, and complex
. To mitigate the limited precision of floating-point numbers, we also add an Appendix where we look at two replacements for the
float
type in the standard library , namely the
Decimal
type in the decimal module and the
Fraction
type in the fractions module.
The int Type
The simplest numeric type is the int
type: It behaves like an integer in ordinary math (i.e., the set $\mathbb{Z}$) and supports operators in the way we saw in the section on arithmetic operators in Chapter 1
.
One way to create int
objects is to write their values as literals with the digits 0
to 9
.
a = 42
Just like any other object, the 42
has an identity, a type, and a value.
id(a)
94857615440928
type(a)
int
a
42
A nice feature available since Python 3.6 is using underscores _
as (thousands) separators in numeric literals. For example, 1_000_000
evaluates to 1000000
in memory; the _
s are ignored by the interpreter.
1_000_000
1000000
We may place a single _
between any two digits.
1_2_3_4_5_6_7_8_9
123456789
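The placement of the underscores is not entirely free, though. The following is a minimal sketch to check this, assuming Python 3.6 or later, where the int() built-in applies the same underscore rules to str arguments as the interpreter applies to literals. The helper name parses is made up for this illustration.

```python
def parses(text):
    """Return True if int() accepts the text (hypothetical helper)."""
    try:
        int(text)
    except ValueError:
        return False
    return True


print(parses("1_000_000"))  # single underscores between digits
print(parses("1__000"))     # two underscores in a row
print(parses("_1000"))      # leading underscore
print(parses("1000_"))      # trailing underscore
```

Only the first call prints True; the other three placements are rejected.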
It is syntactically invalid to write out leading 0
s in non-zero decimal integer literals. The reason for that will become apparent in the next section.
042
  Cell In[7], line 1
    042
    ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
Another way to create int
objects is with the int() built-in that casts
float
or properly formatted str
objects as integers. For float objects, the decimals are truncated (i.e., "cut off") and not rounded.
int(42.11)
42
int(42.87)
42
int(-42.87)
-42
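To see that truncation differs from rounding, in particular for negative numbers, we may compare int() with math.floor() from the standard library and the round() built-in. This is only a small side-by-side sketch; the variable names are chosen for illustration.

```python
import math

x = -42.87

truncated = int(x)       # cuts off the decimals, i.e., rounds toward zero
floored = math.floor(x)  # rounds toward negative infinity
rounded = round(x)       # rounds to the nearest integer

print(truncated, floored, rounded)  # -42 -43 -43
```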
When casting str
objects as int
s, the int() built-in is less forgiving. We must not include any decimals as shown by the
ValueError
. Yet, leading and trailing whitespace is gracefully ignored.
int("42")
42
int("42.0")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 int("42.0")

ValueError: invalid literal for int() with base 10: '42.0'
int(" 42 ")
42
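If textual input may contain decimals, one possible workaround is to go through the float() built-in first. The helper to_int below is hypothetical and only a sketch: it assumes that truncating the decimals is acceptable, and very large numbers may lose precision on the detour via float.

```python
def to_int(text):
    """Convert text to an int, truncating decimals (hypothetical helper)."""
    try:
        return int(text)
    except ValueError:
        # Fall back to float() first; this still raises a ValueError
        # for text that does not represent a number at all.
        return int(float(text))


print(to_int("42"))       # 42
print(to_int("42.0"))     # 42
print(to_int(" 42.87 "))  # 42
```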
The int
type follows all rules we know from math, apart from one exception: Whereas mathematicians to this day argue what the term $0^0$ means (cf., this article), programmers are pragmatic about this and simply define $0^0 = 1$.
0 ** 0
1
As computers can only store 0s and 1s, int
objects are nothing but that in memory as well. Consequently, computer scientists and engineers developed conventions as to how 0s and 1s are "translated" into integers, and one such convention is the binary representation of non-negative integers. Consider the integers from 0 through 255 that are encoded into 0s and 1s with the help of this table:
Bit i | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
Digit | $2^7$ | $2^6$ | $2^5$ | $2^4$ | $2^3$ | $2^2$ | $2^1$ | $2^0$ |
= | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
A number consists of exactly eight 0s and 1s that are read from right to left and referred to as the bits of the number. Each bit represents a distinct power of 2, the digit. As usual, we start counting at 0.
To encode the integer 3, for example, we need to find a combination of 0s and 1s such that the sum of the digits marked with a 1 is equal to the number we want to encode. In the example, we set all bits to 0 except for the first ($i=0$) and second ($i=1$) as $2^0 + 2^1 = 1 + 2 = 3$. So the binary representation of 3 is $0000~0011$. To borrow some terminology from linear algebra, the 3 is a linear combination of the digits where the coefficients are either 0 or 1: $3 = 0 \cdot 128 + 0 \cdot 64 + 0 \cdot 32 + 0 \cdot 16 + 0 \cdot 8 + 0 \cdot 4 + 1 \cdot 2 + 1 \cdot 1$. It is guaranteed that there is exactly one such combination for each number between 0 and 255.
As each bit in the binary representation is one of two values, we say that this representation has a base of 2. Often, the base is indicated with a subscript to avoid confusion. For example, we write $3_{10} = 00000011_2$ or $3_{10} = 11_2$ for short omitting leading 0s. A subscript of 10 implies a decimal number as we know it from elementary school.
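The encoding with the table of digits can be sketched in a few lines of code. The functions to_bits and from_bits are hypothetical names used for illustration only; they mimic the manual procedure for non-negative integers that fit into a given number of bits and are not how Python stores integers internally.

```python
def to_bits(number, width=8):
    """Encode a non-negative integer as a string of 0s and 1s."""
    if not 0 <= number < 2 ** width:
        raise ValueError("number does not fit into the given width")
    bits = ""
    for i in reversed(range(width)):  # from the leftmost bit down to bit 0
        digit = 2 ** i
        if number >= digit:  # the digit is part of the sum
            bits += "1"
            number -= digit
        else:
            bits += "0"
    return bits


def from_bits(bits):
    """Sum up the digits (i.e., powers of 2) that are marked with a 1."""
    return sum(2 ** i for i, bit in enumerate(reversed(bits)) if bit == "1")


print(to_bits(3))             # 00000011
print(from_bits("00000011"))  # 3
```

Going through the digits from the largest to the smallest and subtracting each one that still fits yields the one combination of 0s and 1s that encodes the number.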
We use the built-in bin() function to obtain an
int
object's binary representation: It returns a str
object starting with "0b"
indicating the binary format and as many 0s and 1s as are necessary to encode the integer omitting leading 0s.
bin(3)
'0b11'
We may pass a str
object formatted this way as the argument to the int() built-in, together with
base=2
, to create an int
object, for example, with the value of 3
.
int("0b11", base=2)
3
Moreover, we may also use the contents of the returned str
object as a literal instead: Just like we type, for example, 3
without quotes (i.e., "literally") into a code cell to create the int
object 3
, we may type 0b11
to obtain an int
object with the same value.
0b11
3
Another example is the integer 123
that is the sum of $64 + 32 + 16 + 8 + 2 + 1$: Thus, its binary representation is the sequence of bits $0111~1011$, or to use our new notation, $123_{10} = 1111011_2$.
bin(123)
'0b1111011'
Analogous to typing 123
into a code cell, we may write 0b1111011
, or 0b_111_1011
to make use of the underscores, and create an int
object with the value 123
.
0b_111_1011
123
0
and 255
are the edge cases where we set all the bits to either 0 or 1.
bin(0)
'0b0'
bin(1)
'0b1'
bin(2)
'0b10'
bin(255)
'0b11111111'
A group of eight bits is also called a byte. As a byte can only represent non-negative integers up to 255, the table above is extended conceptually with greater digits to the left to model integers beyond 255. The memory management needed to implement this is built into Python, and we do not need to worry about it.
For example, 789
is encoded with ten bits and $789_{10} = 1100010101_2$.
bin(789)
'0b1100010101'
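To find out how many bits an integer needs without counting the characters by hand, we may use the .bit_length() method that every int object comes with. The short sketch below collects the results in a dictionary named lengths, a name chosen only for this illustration.

```python
numbers = [0, 1, 255, 256, 789]

# Map each number to the count of bits needed to encode it.
lengths = {number: number.bit_length() for number in numbers}

for number, length in lengths.items():
    print(number, bin(number), length)
```

255 is the largest integer that fits into eight bits, 256 already needs nine, and 789 needs ten. Note that 0 has a bit length of 0 even though bin(0) shows one digit.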
To contrast this bit encoding with the familiar decimal system, we show an equivalent table with powers of 10 as the digits:
Decimal | 3 | 2 | 1 | 0 |
---|---|---|---|---|
Digit | $10^3$ | $10^2$ | $10^1$ | $10^0$ |
= | 1000 | 100 | 10 | 1 |
Now, an integer is a linear combination of the digits where the coefficients are one of ten values, and the base is 10. For example, the number 123 can be expressed as $0 \cdot 1000 + 1 \cdot 100 + 2 \cdot 10 + 3 \cdot 1$. So, the binary representation follows the same logic as the decimal system taught in elementary school. The decimal system is intuitive to us humans, mostly because we learn to count with our ten fingers. The 0s and 1s in a computer's memory are therefore no rocket science; they only feel unintuitive to a beginner.
Adding two numbers in their binary representations is straightforward and works just like we all learned addition in elementary school. Going from right to left, we add the individual digits, and ...
1 + 2
3
bin(1) + " + " + bin(2) + " = " + bin(3)
'0b1 + 0b10 = 0b11'
... if any two digits add up to 2, the resulting digit is 0 and a 1 carries over.
1 + 3
4
bin(1) + " + " + bin(3) + " = " + bin(4)
'0b1 + 0b11 = 0b100'
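The carry logic can be spelled out as a small function that works on the bits as text. This is a sketch for illustration only: add_binary is a hypothetical name, it expects two strings of 0s and 1s without the "0b" prefix, and Python itself does not add integers this way.

```python
def add_binary(left, right):
    """Add two bit strings digit by digit, going from right to left."""
    width = max(len(left), len(right))
    left, right = left.zfill(width), right.zfill(width)  # pad with leading 0s
    result, carry = "", 0
    for a, b in zip(reversed(left), reversed(right)):
        total = int(a) + int(b) + carry
        result = str(total % 2) + result  # the resulting digit
        carry = total // 2  # 1 if the digits and the carry add up to 2 or 3
    if carry:
        result = "1" + result
    return result


print(add_binary("1", "10"))  # 11
print(add_binary("1", "11"))  # 100
```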
Multiplication is also quite easy. All we need to do is to multiply the left operand by all digits of the right operand separately and then add up the individual products, just like in elementary school.
4 * 3
12
bin(4) + " * " + bin(3) + " = " + bin(12)
'0b100 * 0b11 = 0b1100'
bin(4) + " * " + bin(1) + " = " + bin(4) # multiply with first digit
'0b100 * 0b1 = 0b100'
bin(4) + " * " + bin(2) + " = " + bin(8) # multiply with second digit
'0b100 * 0b10 = 0b1000'
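The two partial products above can be generalized into a loop over the bits of the right operand. The function multiply_binary below is a hypothetical sketch for non-negative integers that mirrors the manual procedure; it is not meant as a replacement for the * operator.

```python
def multiply_binary(left, right):
    """Multiply by adding up one partial product per 1 bit in right."""
    product = 0
    for i, bit in enumerate(reversed(bin(right)[2:])):  # bit 0 comes first
        if bit == "1":
            product += left * 2 ** i  # left operand times the i-th digit
    return product


print(bin(multiply_binary(4, 3)))  # 0b1100
```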
The Further Resources section at the end of this chapter provides video tutorials on addition and multiplication in binary. Subtraction and division are a bit more involved but essentially also easy to understand.
While in the binary and decimal systems there are two and ten distinct coefficients per digit, another convenient representation, the hexadecimal representation, uses a base of 16. It is convenient as one digit stores the same amount of information as four bits and the binary representation quickly becomes unreadable for larger numbers. The letters "a" through "f" are used as digits "10" through "15".
The following table summarizes the relationship between the three systems:
Decimal | Hexadecimal | Binary | Decimal | Hexadecimal | Binary | Decimal | Hexadecimal | Binary | ... |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0000 | 16 | 10 | 10000 | 32 | 20 | 100000 | ... |
1 | 1 | 0001 | 17 | 11 | 10001 | 33 | 21 | 100001 | ... |
2 | 2 | 0010 | 18 | 12 | 10010 | 34 | 22 | 100010 | ... |
3 | 3 | 0011 | 19 | 13 | 10011 | 35 | 23 | 100011 | ... |
4 | 4 | 0100 | 20 | 14 | 10100 | 36 | 24 | 100100 | ... |
5 | 5 | 0101 | 21 | 15 | 10101 | 37 | 25 | 100101 | ... |
6 | 6 | 0110 | 22 | 16 | 10110 | 38 | 26 | 100110 | ... |
7 | 7 | 0111 | 23 | 17 | 10111 | 39 | 27 | 100111 | ... |
8 | 8 | 1000 | 24 | 18 | 11000 | 40 | 28 | 101000 | ... |
9 | 9 | 1001 | 25 | 19 | 11001 | 41 | 29 | 101001 | ... |
10 | a | 1010 | 26 | 1a | 11010 | 42 | 2a | 101010 | ... |
11 | b | 1011 | 27 | 1b | 11011 | 43 | 2b | 101011 | ... |
12 | c | 1100 | 28 | 1c | 11100 | 44 | 2c | 101100 | ... |
13 | d | 1101 | 29 | 1d | 11101 | 45 | 2d | 101101 | ... |
14 | e | 1110 | 30 | 1e | 11110 | 46 | 2e | 101110 | ... |
15 | f | 1111 | 31 | 1f | 11111 | 47 | 2f | 101111 | ... |
To show more examples of the above subscript convention, we pick three random entries from the table:
$11_{10} = b_{16} = 1011_2$
$25_{10} = 19_{16} = 11001_2$
$46_{10} = 2e_{16} = 101110_2$
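The three entries can be double-checked with the int() built-in and its base argument. The list named entries is made up for this check; the format specifiers x and b in the f-string produce the hexadecimal and binary digits without a prefix.

```python
entries = [(11, "b", "1011"), (25, "19", "11001"), (46, "2e", "101110")]

for decimal, hexadecimal, binary in entries:
    # Both representations must convert back to the same decimal number.
    assert int(hexadecimal, base=16) == decimal
    assert int(binary, base=2) == decimal
    print(f"{decimal} = {decimal:x} (base 16) = {decimal:b} (base 2)")
```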
The built-in hex() function creates a
str
object starting with "0x"
representing an int
object's hexadecimal representation. The length depends on how many groups of four bits are implied by the corresponding binary representation.
For 0
and 1
, the hexadecimal representation is similar to the binary one.
hex(0)
'0x0'
hex(1)
'0x1'
Whereas bin(3)
already requires two digits, one is enough for hex(3)
.
hex(3) # bin(3) => "0b11"; two digits needed
'0x3'
For 10
and 15
, we see the letter digits for the first time.
hex(10)
'0xa'
hex(15)
'0xf'
The binary representation of 123
, 0b_111_1011
, can be viewed as two groups of four bits, 0111 and 1011, that are encoded as 7 and b in hexadecimal (cf., table above).
bin(123)
'0b1111011'
hex(123)
'0x7b'
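The grouping into four bits can be mimicked in code as well. The function hex_from_bits is a hypothetical helper for illustration: it takes the bits as text without the "0b" prefix, pads them with leading 0s to a multiple of four, and looks up one hexadecimal digit per group.

```python
def hex_from_bits(bits):
    """Translate a bit string into hexadecimal digits, four bits at a time."""
    digits = "0123456789abcdef"
    # Round the length up to the next multiple of four and pad with 0s.
    width = -(-len(bits) // 4) * 4
    bits = bits.zfill(width)
    groups = [bits[i:i + 4] for i in range(0, width, 4)]
    return "".join(digits[int(group, base=2)] for group in groups)


print(hex_from_bits("1111011"))  # 7b
```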
To obtain a new int
object with the value 123
, we call the int() built-in with a properly formatted
str
object and base=16
as arguments.
int("0x7b", base=16)
123
Alternatively, we could use a literal notation instead.
0x7b
123
Hexadecimals between $00_{16}$ and $ff_{16}$ (i.e., $0_{10}$ and $255_{10}$) are commonly used to describe colors, for example, in web development but also graphics editors. See this online tool for some more background.
hex(0)
'0x0'
hex(255)
'0xff'
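As an illustration of the color use case, the sketch below splits a color code into its red, green, and blue parts. It assumes the common "#rrggbb" notation from web development, which is not covered in this chapter, and parse_color is a hypothetical name.

```python
def parse_color(code):
    """Split a color code like "#ff8000" into red, green, and blue parts."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair of hexadecimal digits encodes one integer from 0 to 255.
    return tuple(int(code[i:i + 2], base=16) for i in (0, 2, 4))


print(parse_color("#ff8000"))  # (255, 128, 0)
```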
Just like binary representations, the hexadecimals extend to the left for larger numbers like 789
.
hex(789)
'0x315'
For completeness' sake, we mention that there is also the oct() built-in to obtain an integer's octal representation. The logic is the same as for the hexadecimal representation, and we use eight instead of sixteen digits. That is the equivalent of viewing the binary representations in groups of three bits. Nowadays, octal representations have become less important, and the data science practitioner can probably live without them quite well.
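For a brief illustration, the octal counterparts of the calls shown for the hexadecimal representation look as follows. The variable names are chosen only for this sketch; the "0o" prefix is the same one that the SyntaxError for leading 0s hints at.

```python
octal = oct(123)               # the bits 1 111 011 read in groups of three
parsed = int("0o173", base=8)  # back from text to an int object
literal = 0o173                # the same value written as an octal literal

print(octal, parsed, literal)  # 0o173 123 123
```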
While there are conventions that model negative integers with 0s and 1s in memory (cf., Two's Complement ), Python manages that for us, and we do not look into the theory here for brevity. We have learned all that a practitioner needs to know about how integers are modeled in a computer. The Further Resources
section at the end of this chapter provides a video tutorial on how the Two's Complement
idea works.
The binary and hexadecimal representations of negative integers are identical to their positive counterparts except that they start with a minus sign -
. However, as the video tutorial at the end of the chapter reveals, that is not how the bits are organized in memory.
bin(-3)
'-0b11'
hex(-3)
'-0x3'
bin(-255)
'-0b11111111'
hex(-255)
'-0xff'
Now that we know how integers are represented with 0s and 1s, we look at ways of working with the individual bits, in particular with the so-called bitwise operators: As the name suggests, the operators perform some operation on a bit-by-bit basis. They only work with and always return int
objects.
We keep this overview rather short as such "low-level" operations are not needed by the data science practitioner regularly. Yet, it is worthwhile to have heard about them as they form the basis of all of arithmetic in computers.
The first operator is the bitwise AND operator &
: It looks at the bits of its two operands, 11
and 13
in the example, in a pairwise fashion and if both operands have a 1 in the same position, the resulting integer will have a 1 in this position as well. Otherwise, the resulting integer will have a 0 in this position. The binary representations of 11
and 13
both have 1s in their respective first and fourth bits, which is why 11 & 13
evaluates to 0b_1001
or 9
.
11 & 13
9
bin(11) + " & " + bin(13) # to show the operands' bits
'0b1011 & 0b1101'
bin(11 & 13)
'0b1001'
0b_1001
is the binary representation of 9
.
0b_1001
9
The bitwise OR operator |
evaluates to an int
object whose bits are set to 1 if the corresponding bits of either one or both operands are 1. So in the example 9 | 13
only the second bit is 0 for both operands, which is why the expression evaluates to 0b_1101
or 13
.
9 | 13
13
bin(9) + " | " + bin(13) # to show the operands' bits
'0b1001 | 0b1101'
bin(9 | 13)
'0b1101'
0b_1101
evaluates to an int
object with the value 13
.
0b_1101
13
The bitwise XOR operator ^
is a variant of the |
operator in that it evaluates to an int
object whose bits are set to 1 if the corresponding bit of exactly one of the two operands is 1. Colloquially, the "X" stands for "exclusive." The ^
operator must not be confused with the exponentiation operator **
! In the example, 9 ^ 13
, only the third bit differs between the two operands, which is why it evaluates to 0b_100
omitting the leading 0.
9 ^ 13
4
bin(9) + " ^ " + bin(13) # to show the operands' bits
'0b1001 ^ 0b1101'
bin(9 ^ 13)
'0b100'
0b_100
evaluates to an int
object with the value 4
.
0b_100
4
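To make the pairwise logic behind the &, |, and ^ operators explicit, we may re-create them on the textual binary representations. The function bitwise below is a hypothetical sketch that works for non-negative integers only; its rule argument decides for each pair of bits whether the resulting bit is a 1.

```python
def bitwise(left, right, rule):
    """Combine two non-negative integers bit by bit according to a rule."""
    left_bits, right_bits = bin(left)[2:], bin(right)[2:]
    width = max(len(left_bits), len(right_bits))
    pairs = zip(left_bits.zfill(width), right_bits.zfill(width))
    result = "".join("1" if rule(a == "1", b == "1") else "0" for a, b in pairs)
    return int(result, base=2)


both = bitwise(11, 13, lambda a, b: a and b)       # like 11 & 13
either = bitwise(9, 13, lambda a, b: a or b)       # like 9 | 13
exactly_one = bitwise(9, 13, lambda a, b: a != b)  # like 9 ^ 13

print(both, either, exactly_one)  # 9 13 4
```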
The bitwise NOT operator ~
, sometimes also called inversion operator, is said to "flip" the 0s into 1s and the 1s into 0s. However, it is based on the aforementioned Two's Complement convention and
~x = -(x + 1)
by definition (cf., the reference ). The full logic behind this, while actually quite simple, is considered out of scope in this book.
We can at least verify the definition by comparing the binary representations of ~7
and -(7 + 1)
: They are indeed the same.
~7
-8
~7 == -(7 + 1) # = Two's Complement
True
bin(~7)
'-0b1000'
bin(-(7 + 1))
'-0b1000'
~x = -(x + 1)
can be reformulated as ~x + x = -1
, which is slightly easier to check.
~7 + 7
-1
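The "flipping" becomes visible if we restrict our view to a fixed number of bits. The sketch below assumes a width of eight bits, chosen purely for illustration as int objects have no fixed width in Python, and uses the & operator with a mask of eight 1s to cut off everything to the left.

```python
mask = 0b_1111_1111  # keeps only the lowest eight bits

original = format(7, "08b")         # eight bits with leading 0s
flipped = format(~7 & mask, "08b")  # the same eight bits after the inversion

print(original)  # 00000111
print(flipped)   # 11111000
```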
Lastly, the bitwise left and right shift operators, <<
and >>
, shift all the bits either to the left or to the right. This corresponds to multiplying an integer by a power of 2 or to dividing it by a power of 2 and discarding the remainder (i.e., floor division)
.
When shifting left, 0s are filled in.
7 << 2
28
bin(7)
'0b111'
bin(7 << 2)
'0b11100'
0b_1_1100
28
When shifting right, the rightmost bits are dropped.
7 >> 1
3
bin(7)
'0b111'
bin(7 >> 1)
'0b11'
0b_11
3
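The correspondence between shifting and powers of 2 can be checked for a handful of numbers. In the sketch below, the loop compares the shift operators with the * and // operators, and the results are stored under names chosen for illustration.

```python
for x in [7, 28, 789]:
    for n in [1, 2, 3]:
        assert x << n == x * 2 ** n   # left shift multiplies by 2 ** n
        assert x >> n == x // 2 ** n  # right shift is a floor division

positive = (7 >> 1, 7 // 2)
negative = (-7 >> 1, -7 // 2)

print(positive)  # (3, 3)
print(negative)  # (-4, -4)
```

The negative example shows that a right shift rounds toward negative infinity, just like the // operator.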