Note: Click on "Kernel" > "Restart Kernel and Clear All Outputs" in JupyterLab before reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it in the cloud.
In this second part of Chapter 1, we look a bit closer into how the memory works and introduce a couple of "theoretical" terms.
Variables are created with the assignment statement =, which is not an operator because of its side effect of making a name reference an object in memory.
We read the terms variable, name, and identifier used interchangeably in many Python-related texts. In this book, we adopt the following convention: First, we treat name and identifier as perfect synonyms but only use the term name in the text for clarity. Second, whereas name only refers to a string of letters, numbers, and some other symbols, a variable means the combination of a name and a reference to an object in memory.
variable = 20.0
When used in an expression, a variable evaluates to the value of the object it references. Colloquially, we could say that variable evaluates to 20.0, but this would not be an accurate description of what is going on in memory. We see some more colloquialisms in this section but should always relate them to what Python actually does in memory.
variable
20.0
A variable may be re-assigned as often as we wish. In doing so, we may also assign an object of a different type. Because this is allowed, Python is said to be a dynamically typed language. In contrast, a statically typed language like C also allows re-assignment but only with objects of the same type. This subtle distinction is one reason why Python is slower at execution than C: As it runs a program, it needs to figure out an object's type each time it is referenced.
variable = 20
variable
20
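As a minimal sketch of dynamic typing, the snippet below re-assigns one name to objects of three different types and records what type() reports after each assignment. The names value and observed_types are chosen for illustration only.

```python
# The same name may reference objects of different types over time.
value = 20.0
observed_types = [type(value)]

value = 20  # re-assign the name to an int object
observed_types.append(type(value))

value = "twenty"  # re-assign the name to a str object
observed_types.append(type(value))

print(observed_types)
```

The type belongs to the object, not to the name, which is why none of the re-assignments raises an error.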
If we want to re-assign a variable while referencing its "old" (i.e., current) object, we may also update it using a so-called augmented assignment statement (i.e., not an operator), as introduced with PEP 203: The currently mapped object is implicitly inserted as the first operand on the right-hand side.
variable *= 4  # same as variable = variable * 4
variable
80
variable //= 2  # same as variable = variable // 2; "//" to retain the integer type
variable
40
variable += 2  # same as variable = variable + 2
variable
42
Variables are dereferenced (i.e., "deleted") with the del statement. This does not delete the object a variable references but merely removes the variable's name from the "global list of all names."
variable
42
del variable
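The following sketch illustrates that del removes only a name and leaves the object itself alone. The names first, second, and names_after are hypothetical, and a list serves merely as an example object.

```python
first = [1, 2, 3]
second = first  # a second name for the very same object

del first  # removes only the name "first"

# The object is still reachable through the other name.
print(second)

# The deleted name no longer shows up among the defined names.
names_after = dir()
print("first" in names_after, "second" in names_after)
```

As long as at least one reference to an object remains, it stays available in memory.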
If we refer to an unknown name, a runtime error occurs, namely a NameError. The Name in NameError gives a hint why we choose the term name over identifier above: Python uses it more often in its error messages.
variable
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 variable

NameError: name 'variable' is not defined
Some variables magically exist when a Python process is started or are added by Jupyter. We may safely ignore the former until Chapter 11 and the latter for good.
__name__
'__main__'
To see all defined names, the built-in function dir() is helpful.
dir()
['In', 'Out', 'Path', '_', '_10', '_11', '_14', '_2', '_4', '_6', '_8', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__session__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i11', '_i12', '_i13', '_i14', '_i15', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_i8', '_i9', '_ih', '_ii', '_iii', '_oh', 'atexit', 'exit', 'get_ipython', 'history', 'history_path', 'open', 'os', 'quit', 'readline', 'state_home', 'write_history']
Phil Karlton famously noted during his time at Netscape:
"There are two hard problems in computer science: naming things and cache invalidation ... and off-by-one errors."
Variable names may contain upper and lower case letters, digits, and underscores (i.e., _) and be as long as we want them to be. However, they must not begin with a digit. Also, they must not be any of Python's built-in keywords like for or if.
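These rules can be checked programmatically. The sketch below combines the str.isidentifier() method with keyword.iskeyword() from the standard library; the helper is_valid_name() and the list of candidates are made up for illustration. Note that str.isidentifier() also accepts many non-ASCII letters, so it is slightly more permissive than the rule of thumb stated above.

```python
import keyword


def is_valid_name(candidate):
    """Return True if candidate could be used as a variable name."""
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


candidates = ["work_address", "answer_42", "42_answers", "address@work", "for", "type_"]
results = {candidate: is_valid_name(candidate) for candidate in candidates}
print(results)
```

A name passing this check is merely legal; whether it is a good name is a matter of style.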
Variable names should be chosen such that they do not need any more documentation and are self-explanatory. A widespread convention is to use so-called snake_case: Keep everything lowercase and use underscores to separate words.
See this link for a comparison of different naming conventions.
pi = 3.14
answer_to_everything = 42
my_name = "Alexander"
work_address = "WHU, Burgplatz 2, Vallendar"
PI = 3.14  # unless used as a "global" constant
answerToEverything = 42  # this is a style used in languages like Java
name = "Alexander"  # name of what?
address@work = "WHU, Burgplatz 2, Vallendar"
  Cell In[23], line 1
    address@work = "WHU, Burgplatz 2, Vallendar"
    ^
SyntaxError: cannot assign to expression here. Maybe you meant '==' instead of '='?
If a variable name collides with a built-in name, we add a trailing underscore.
type_ = "student"
Variables with leading and trailing double underscores, referred to as dunder in Python jargon, are used for built-in functionalities and to implement object-oriented features as we see in Chapter 11. We must not use this style for variables!
__name__
'__main__'
It is crucial to understand that several variables may reference the same object in memory. Not having this in mind may lead to many hard-to-track-down bugs.
Let's make b reference whatever object a is referencing.
a = 42
b = a
b
42
For "simple" types like int or float, this never causes trouble.
Let's "change the value" of a. To be precise, let's create a new 87 object and make a reference it.
a = 87
a
87
b "is still the same" as before. To be precise, b still references the same object as before.
b
42
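The identity operator is makes this observable. The sketch below repeats the steps from above and stores the results of the comparisons in the illustrative names same_before and same_after.

```python
a = 42
b = a
same_before = a is b  # both names reference one and the same object

a = 87  # makes "a" reference another object; "b" is untouched
same_after = a is b

print(same_before, same_after, b)
```

After the re-assignment, the two names reference two different objects, and b still evaluates to 42.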
However, if a variable references an object of a more "complex" type (e.g., list), predicting the outcome of a code snippet may be unintuitive for a beginner.
x = [1, 2, 3]
type(x)
list
y = x
y
[1, 2, 3]
Let's change the first element of x.
Chapter 7 discusses lists in more depth. For now, let's view a list object as some sort of container that holds an arbitrary number of references to other objects and treat the brackets [] attached to it as yet another operator, namely the indexing operator. So, x[0] instructs Python to first follow the reference from the global list of all names to the x object. Then, it follows the first reference it finds there to the 1 object we put in the list. The indexing operator must be an operator as we merely read the first element and do not change anything in memory permanently.
Python begins counting at 0. This is not the case for many other languages, for example, MATLAB, R, or Stata. To understand why this makes sense, see this short note by one of the all-time greats in computer science, the late Edsger Dijkstra.
x[0]
1
To change the first entry in the list, we use the assignment statement = again. Here, this does not create a new variable, nor overwrite an existing one, but only changes the object referenced as the first element in x. As we only change parts of the x object, we say that we mutate its state. To use the bag analogy from above, we keep the same bag but "flip" some of the 0s into 1s and some of the 1s into 0s.
x[0] = 99
x
[99, 2, 3]
The changes made to the object that x is referencing can also be seen through the y variable!
y
[99, 2, 3]
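A short check with the identity operator is confirms that no copy was ever made. The snippet below is a self-contained repetition of the steps above.

```python
x = [1, 2, 3]
y = x  # no copy is made; "y" is a second name for the same list

x[0] = 99  # mutate the one list object both names reference

print(y is x)
print(y)
```

Because x and y reference the same object, there is only one list in memory that could have changed.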
The difference in behavior illustrated in this sub-section has to do with the fact that int and float are immutable types while list is a mutable type.
In the first case, an object cannot be changed "in place" once it is created in memory. When we assigned 87 to the already existing a, we did not change the 0s and 1s in the object a referenced before the assignment. Instead, we created a new int object and made the name a reference it, while the b variable was not affected.
In the second case, x[0] = 99 creates a new int object 99 and merely changes the first reference in the x list.
In general, the assignment statement creates a new name (or re-assigns an existing one) and makes it reference whatever object is on the right-hand side iff the left-hand side is a pure name (i.e., it contains no operators like the indexing operator in the example). Otherwise, it mutates an already existing object. And we must always expect that the latter may have more than one variable referencing it.
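The rule can be made visible with the built-in id() function, which returns an object's identity. In the sketch below, the names numbers, alias, and id_before are illustrative; alias keeps the original list alive so that the two identities can be compared safely.

```python
numbers = [1, 2, 3]
alias = numbers
id_before = id(numbers)

numbers[0] = 99  # left-hand side contains the indexing operator: mutation
mutated_in_place = id(numbers) == id_before

numbers = [4, 5, 6]  # left-hand side is a pure name: re-assignment
rebound = id(numbers) != id_before

print(mutated_in_place, rebound, alias)
```

The mutation is visible through alias, whereas the re-assignment only changes what the name numbers references.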
Visualizing what is going on in memory with a tool like PythonTutor may be helpful for a beginner.
As we saw in the previous list example, it is important to understand in what order Python executes the "commands" (= not an officially used term) we give it. In this last section of the chapter, we introduce a classification scheme and look at its implications.
An expression is any syntactically correct combination of variables and literals with operators that evaluates to (i.e., "becomes") an object. That object may already exist before the expression is parsed or be created as a result thereof. The point is that after the expression is parsed, Python returns a reference to this object that may be used for further processing.
In simple words, anything that may be used on the right-hand side of an assignment statement without creating a SyntaxError is an expression.
What we have said about individual operators before, namely that they have no permanent side effects in memory, actually belongs here, to begin with: The absence of any permanent side effects is the characteristic property of expressions, and all the code cells in the "(Arithmetic) Operators" section in the first part of this chapter are examples of expressions.
The simplest possible expressions contain only one variable or literal. The output below a code cell is Jupyter's way of returning the reference to the object to us!
Whereas a evaluates to the existing 87 object, ...
a
87
... parsing the literal 42 creates a new int object and returns a reference to it (Note: for optimization reasons, the CPython implementation may already have a 42 object in memory).
42
42
For sure, we need to include operators to achieve something useful. Here, Python takes the existing a object, creates a new 42 object, creates the resulting 45 object, and returns a reference to that.
a - 42
45
The definition of an expression is recursive. So, the sub-expression a - 42 is combined with the literal 9 by the operator // to form the full expression (a - 42) // 9.
(a - 42) // 9
5
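The recursive structure can be inspected with the ast module in the standard library, which parses source code into a tree without executing it. The snippet below is only a sketch for illustration; the names tree, outer, and inner are our own.

```python
import ast

# Parse the expression only; nothing is evaluated, so "a" need not exist.
tree = ast.parse("(a - 42) // 9", mode="eval")

outer = tree.body   # the full expression: <sub-expression> // 9
inner = outer.left  # the sub-expression: a - 42

print(type(outer.op).__name__, type(inner.op).__name__)
print(ast.dump(outer))
```

The outer binary operation holds the inner one as its left operand, which mirrors how the full expression is built from a sub-expression.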
Below, the variable x is combined with the literal 2 by the indexing operator []. The resulting expression evaluates to the third element in the x list.
x[2]
3
When not used as a delimiter, parentheses also constitute an operator, namely the call operator (). We saw this syntax before when we called built-in functions and methods.
sum(x)
104
A statement is any single command Python understands.
So, what we mean by statement is a strict superset of what we mean by expression: All expressions are statements, but some statements are not expressions. That implies that all code cells in the previous "Expressions" section are also examples of statements.
Unfortunately, many texts on Python use the two terms as mutually exclusive concepts by saying that a statement (always) changes the state of a program with a permanent side effect in memory (e.g., by creating a variable) whereas an expression does not. That is not an accurate way of distinguishing the two in all cases!
So, often, the term statement is used in the meaning of "a statement that is not an expression." We can identify such statements easily in Jupyter as the corresponding code cells have no output below them.
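One way to tell the two apart programmatically is to ask Python to compile a piece of source code in "eval" mode, which accepts a single expression only. The helper is_expression() below is a hypothetical name and a rough sketch, not an official classification tool. The snippets are only compiled and never run, so the names a and x need not exist.

```python
def is_expression(source):
    """Return True if source compiles as a single expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


snippets = ["a - 42", "x[2]", "sum(x)", "a = 21 * 2", "del a"]
classification = {snippet: is_expression(snippet) for snippet in snippets}
print(classification)
```

The first three snippets are expressions (and, therefore, also statements), while the last two are statements that are not expressions.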
While many statements, for example = and del, indeed have permanent side effects, ...
a = 21 * 2
del a
... calling the built-in print() function neither changes the memory nor evaluates to an object (disregarding the None object explained in Chapter 2). We could view changing the computer's screen as a side effect, but this is outside of Python's memory!
Also, the cell below has no output! It only looks like it does as Jupyter redirects whatever print() writes to the "screen" to below a cell. We see a difference to the expressions above in that there are no brackets [...] next to the output showing the execution count number.
print("Don't I change the state of the computer's display?")
Don't I change the state of the computer's display?
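As a small sketch of the None object mentioned above, we may assign the result of calling print() to a name. The name returned is chosen for illustration only.

```python
# The text still appears on the "screen" as a side effect of the call.
returned = print("Don't I change the state of the computer's display?")

# The call itself evaluates to the None object.
print(returned is None)
```

So, strictly speaking, the call is an expression whose value is None, which Jupyter does not display as a cell's output.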
How many lines of code does the next cell constitute?
result = 21 * 2; print("The answer is:", result)
The answer is: 42
The answer depends on how we are counting. If we count the number of lines of written source code, the answer is one. This gives us the number of physical lines. In contrast, if we count the number of statements separated by the ;, the answer is two. This gives us the number of logical lines.
While physical lines are what we, the humans, see, logical lines are how computers read. To align the two ways, it is a best practice to not write more than one logical line on a single physical one. So, from Python's point of view, the cells above and below are the same. More importantly, the one above is not somehow "magically" faster.
result = 21 * 2
print("The answer is:", result)
The answer is: 42
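The two ways of counting can be sketched with the ast module from the standard library. The helper count_lines() is a hypothetical name, and counting the top-level statements is used here as a stand-in for counting logical lines, which works for simple statements like the ones in this example.

```python
import ast

one_physical_line = 'result = 21 * 2; print("The answer is:", result)'
two_physical_lines = 'result = 21 * 2\nprint("The answer is:", result)'


def count_lines(source):
    """Return the number of physical lines and top-level statements."""
    physical = len(source.splitlines())
    statements = len(ast.parse(source).body)  # parsed only, not executed
    return physical, statements


print(count_lines(one_physical_line))
print(count_lines(two_physical_lines))
```

Both versions consist of two statements; only the number of physical lines differs.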
It is also possible to write a single logical line over several physical ones. The cell below is yet another physical representation of the same two logical lines. Any pair of delimiters, like ( and ) below, can be used to "format" the code in between with whitespace. The style guides mentioned before should still be taken into account (i.e., indent with 4 spaces).
result = 21 * 2
print(
    "The answer is:",
    result
)
The answer is: 42
Another situation in which several physical lines are treated as a logical one is with so-called compound statements. In contrast to simple statements like = and del above, the purpose of compound statements is to group other statements.
We have seen two examples of compound statements already: for and if. They both consist of a header line ending with a : and an indented (code) block spanning an arbitrary number of logical lines.
In the example, the first logical line spans the entire cell, the second logical line contains all physical lines except the first, and the third and fourth logical lines are just the third and fourth physical lines, respectively.
for number in [1, 2, 3]:
    if number % 2 == 0:
        double = 2 * number
        print(number, "->", double)
2 -> 4
A comment is a physical line that is not a logical line.
We use the # symbol to write comments in plain text right into the code. Anything after the # until the end of the physical line is ignored by Python.
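That comments are ignored can be sketched by parsing a line of code with and without a comment and comparing the results. The snippet uses ast.parse() and ast.dump() from the standard library; the names with_comment, without_comment, and same_tree are our own.

```python
import ast

with_comment = "distance = 891  # in meters"
without_comment = "distance = 891"

# ast.dump() shows the parsed structure without positional information.
same_tree = ast.dump(ast.parse(with_comment)) == ast.dump(ast.parse(without_comment))
print(same_tree)
```

From Python's point of view, both lines contain the very same assignment statement.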
As a good practice, comments should not describe what happens. This should be evident by reading the code. Otherwise, it is most likely badly written code. Rather, comments should describe why something happens.
Comments may be added either at the end of a line of code, by convention separated with two spaces, or on a line on their own.
distance = 891  # in meters
elapsed_time = 93  # in seconds
# Calculate the speed in km/h.
speed = 3.6 * distance / elapsed_time
But let's think carefully about whether we need a comment at all. The second cell is a lot more Pythonic.
seconds = 365 * 24 * 60 * 60  # = seconds in the year
seconds_per_year = 365 * 24 * 60 * 60