#!/usr/bin/env python
# coding: utf-8

# ## A crash course on Python

# > **This is a sample chapter from [Learning IPython for Interactive Computing and Data Visualization, second edition](http://ipython-books.github.io/minibook/).**

# If you don't know Python, read this section to learn the fundamentals. Python is a very accessible language and is even taught to school children. If you have ever programmed, it will only take you a few minutes to learn the basics.

# ### Hello world

# Open a new notebook and type the following in the first cell:

# In[1]:

print("Hello world!")

# > TIP (Prompt string): Note that the convention chosen in this book is to show Python code (also called the `input`) prefixed with `In [x]: ` (which shouldn't be typed). This is the standard IPython prompt. Here, you should just type `print("Hello world!")` and then press `Shift`-`Enter`.

# Congratulations! You are now a Python programmer.

# ### Variables

# Let's use Python as a calculator.

# In[2]:

2 * 2

# Here, `2 * 2` is an _expression statement_. This operation is performed, the result is returned, and IPython displays it in the notebook cell's output.

# > TIP (Division): In Python 3, `3 / 2` returns `1.5` (floating-point division), whereas it returns `1` in Python 2 (integer division). This can be a source of errors when porting Python 2 code to Python 3. It is recommended to always use the explicit `3.0 / 2.0` for floating-point division (by using floating-point numbers) and `3 // 2` for integer division. Both syntaxes work in Python 2 and Python 3. See http://python3porting.com/differences.html#integer-division for more details.

# Other built-in mathematical operators include `+`, `-`, `%` (modulo), and `**` (exponentiation). You will find more details at https://docs.python.org/3/reference/expressions.html#the-power-operator.

# **Variables** form a fundamental concept of any programming language. A variable has a name and a value.
# Here is how to create a new variable in Python:

# In[3]:

a = 2

# And here is how to use an existing variable:

# In[4]:

a * 3

# Several variables can be defined at once (this is called **unpacking**):

# In[5]:

a, b = 2, 6

# There are different types of variables. Here, we have used a number (more precisely, an **integer**). Other important types include **floating-point numbers** to represent real numbers, **strings** to represent text, and **booleans** to represent `True/False` values. Here are a few examples:

# In[6]:

somefloat = 3.1415
sometext = 'pi is about'  # You can also use double quotes.
print(sometext, somefloat)  # Display several variables.

# Note how we used the `#` character to write **comments**. Python discards comments completely, but adding comments to your code is important when the code is to be read by other humans (including yourself in the future).

# ### String escaping

# String escaping refers to the ability to insert special characters in a string. For example, how can you insert `'` and `"`, given that these characters are used to delimit a string in Python code? The backslash `\` is the go-to escape character in Python (and in many other languages too). Here are a few examples:

# In[7]:

print("Hello \"world\"")
print("A list:\n* item 1\n* item 2")
print("C:\\path\\on\\windows")
print(r"C:\path\on\windows")

# The special character `\n` is the **new line** (or line feed) character. To insert a backslash, you need to escape it, which explains why it needs to be doubled as `\\`.

# You can also disable escaping by using **raw literals** with an `r` prefix before the string, like in the last example above. In this case, backslashes are considered as normal characters.

# This is convenient when writing Windows paths, since Windows uses backslash separators instead of forward slashes like on Unix systems. **A very common error on Windows is forgetting to escape backslashes in paths**: writing `"C:\path"` may lead to subtle errors, because sequences such as `\n` or `\t` inside a path are silently interpreted as special characters.
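# The following cell is a small illustrative sketch of why unescaped backslashes are risky. The path `C:\new_folder` is a made-up example. In a regular string, `\n` is silently turned into a newline character, so the string no longer contains the path you typed. Doubling the backslash or using a raw literal both produce the intended path.

# In[ ]:

bad = "C:\new_folder"    # `\n` becomes a newline character here!
good = "C:\\new_folder"  # escaped backslash
raw = r"C:\new_folder"   # raw literal: backslashes are kept as-is

print(bad)               # prints "C:" and "ew_folder" on two separate lines
print(good == raw)       # True: both spellings give the same string
print(len(bad), len(good))  # 12 13: the newline replaced two characters

# Sequences that are not valid escapes, such as `\p` in `"C:\path"`, are kept as-is for now. However, recent Python versions emit a warning for them, so raw literals remain the safer habit for Windows paths.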
# You will find the list of special characters in Python at https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals.

# ### Lists

# A list contains a sequence of items. You can concisely instruct Python to perform repeated actions on the elements of a list. Let's first create a list of numbers:

# In[8]:

items = [1, 3, 0, 4, 1]

# Note the syntax we used to create the list: square brackets `[]`, and commas `,` to separate the items.

# The *built-in* function `len()` returns the number of elements in a list:

# In[9]:

len(items)

# > INFO (Built-in functions): Python comes with a set of built-in functions, including `print()`, `len()`, `max()`, functional routines like `filter()` and `map()`, and container-related routines like `all()`, `any()`, `range()`, and `sorted()`. You will find the full list of built-in functions at https://docs.python.org/3.4/library/functions.html.

# Now, let's compute the sum of all elements in the list. Python provides a _built-in_ function for this:

# In[10]:

sum(items)

# We can also access individual elements in the list, using the following syntax:

# In[11]:

items[0]

# In[12]:

items[-1]

# Note that indexing starts at `0` in Python: the first element of the list is indexed by `0`, the second by `1`, and so on. Also, `-1` refers to the last element, `-2` to the penultimate element, and so on.

# The same syntax can be used to alter elements in the list:

# In[13]:

items[1] = 9
items

# We can access sublists with the following syntax:

# In[14]:

items[1:3]

# Here, `1:3` represents a **slice** going from element `1` _included_ (this is the second element of the list) to element `3` _excluded_. Thus, we get a sublist with the second and third elements of the original list. The first-included/last-excluded asymmetry makes consecutive slices fit together: for example, `items[0:1]` and `items[1:3]` neither overlap nor leave a gap.
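# As a sketch of the first-included/last-excluded convention, the cell below splits a list at an arbitrary index `k`, a hypothetical name chosen for illustration. Omitting the start of a slice means "from the beginning", and omitting the end means "up to the end". The two halves fit together exactly, and a slice `[i:j]` contains `j - i` elements. A separate `numbers` list is used so that `items` is left untouched.

# In[ ]:

numbers = [1, 9, 0, 4, 1]
k = 2

left = numbers[:k]   # elements 0 to k - 1
right = numbers[k:]  # elements k to the end
print(left, right)              # [1, 9] [0, 4, 1]
print(left + right == numbers)  # True: no overlap, no gap

# The length of a slice is the difference of its bounds.
print(len(numbers[1:3]))  # 2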
# Also, note that slicing a list returns a new list (a shallow copy), not a view of the original list: changing elements in the sublist does not change them in the original list.

# Python provides several other types of containers:

# **Tuples** are immutable and contain a fixed number of elements:

# In[15]:

my_tuple = (1, 2, 3)
my_tuple[1]

# **Dictionaries** contain key-value pairs. They are extremely useful and common:

# In[16]:

my_dict = {'a': 1, 'b': 2, 'c': 3}
print('a:', my_dict['a'])

# In[17]:

print(my_dict.keys())

# Since Python 3.7, dictionaries are guaranteed to preserve the insertion order of their keys. In earlier versions, there was no notion of order in a dictionary. However, the native **collections** module provides an `OrderedDict` structure that keeps the insertion order in all versions (see https://docs.python.org/3.4/library/collections.html).

# **Sets**, like mathematical sets, contain distinct elements:

# In[18]:

my_set = set([1, 2, 3, 2, 1])
my_set

# > INFO (Mutable and immutable objects): A Python object is **mutable** if its value can change after it has been created. Otherwise, it is **immutable**. For example, a string is immutable; to change it, a new string needs to be created. A list, a dictionary, or a set is mutable; elements can be added or removed. By contrast, a tuple is immutable, and it is not possible to change the elements it contains without recreating the tuple. See https://docs.python.org/3.4/reference/datamodel.html for more details.

# ### Loops

# We can run through all elements of a list using a `for` loop:

# In[19]:

for item in items:
    print(item)

# There are several things to note here:
# * The `for item in items` syntax means that, at every iteration, a variable named `item` is assigned the next element of the list, one at a time.
# * Note the colon `:` at the end of the `for` statement. Forgetting it will lead to a syntax error!
# * The statement `print(item)` will be executed for all items in the list.
# * Note the four spaces before `print`: this is called the **indentation**.
#   You will find more details about indentation in the next subsection.

# Python supports a concise syntax to perform a given operation on all elements of a list:

# In[20]:

squares = [item * item for item in items]
squares

# This is called a **list comprehension**. A new list is created here; it contains the squares of all numbers in the list. This concise syntax leads to highly readable and *Pythonic* code.

# ### Indentation

# Indentation refers to the spaces that may appear at the beginning of some lines of code. This is a particular aspect of Python's syntax.

# In most programming languages, indentation is optional and is generally used to make the code visually clearer. But in Python, indentation also has a syntactic meaning. Particular indentation rules need to be followed for Python code to be correct.

# In general, there are two ways to indent some text: by inserting a *tab character* (also referred to as `\t`), or by inserting a number of spaces (typically, four). It is recommended to use spaces instead of tab characters. Your text editor should be configured such that the *Tab* key on the keyboard inserts four spaces instead of a tab character.

# In the Notebook, indentation is automatically configured properly, so you don't need to worry about this issue. The question only arises if you use another text editor for your Python code.

# Finally, what is the meaning of indentation? In Python, indentation delimits coherent blocks of code, for example, the contents of a loop, a conditional branch, a function, or a class. Where other languages such as C or JavaScript use curly braces to delimit such blocks, Python uses indentation.

# ### Conditional branches

# Sometimes, you need to perform different operations on your data depending on some condition.
# For example, let's display all even numbers in our list:

# In[21]:

for item in items:
    if item % 2 == 0:
        print(item)

# Again, here are several things to note:
# * An `if` statement is followed by a boolean expression.
# * If `a` and `b` are two integers, the **modulo** operator `a % b` returns the remainder from the division of `a` by `b`. Here, `item % 2` is 0 for even numbers, and 1 for odd numbers.
# * Equality is represented by a double equal sign `==` to avoid confusion with the _assignment_ operator `=` that we use when we create variables.
# * Like the `for` loop, the `if` statement ends with a colon `:`.
# * The part of the code that is executed when the condition is satisfied follows the `if` statement. It is indented. Indentation is cumulative: since this `if` is inside a `for` loop, there are eight spaces before the `print(item)` statement.

# Python supports a concise syntax to select all elements in a list that satisfy certain properties. Here is how to create a sublist with only even numbers:

# In[22]:

even = [item for item in items if item % 2 == 0]
even

# This is also a form of list comprehension.

# ### Functions

# Code is typically organized into functions. A **function** encapsulates part of your code. Functions allow you to reuse bits of functionality without copy-pasting the code. Here is a function that tells whether an integer number is even or not:

# In[23]:

def is_even(number):
    """Return whether an integer is even or not."""
    return number % 2 == 0

# There are several things to note here:
# * A function is defined with the `def` keyword.
# * After `def` comes the function name. A general convention in Python is to only use lowercase characters, and separate words with an underscore `_`. A function name generally starts with a verb.
# * The function name is followed by parentheses, with zero, one, or several variable names called the **arguments**. These are the **inputs** of the function. There is a single argument here, named `number`.
# * No type is specified for the argument. This is because Python is **dynamically typed**; you could pass a variable of any type. This function would work fine with floating-point numbers, for example (the modulo operation works with floating-point numbers in addition to integers).
# * The body of the function is indented (and note the colon `:` at the end of the `def` statement).
# * There is a **docstring** wrapped by triple quotes `"""`. This is a string placed right after the `def` line that explains what the function does. Unlike a comment, it is kept at runtime and displayed by `help()`. It is not mandatory, but it is strongly recommended to write docstrings for the functions exposed to the user.
# * The `return` keyword in the body of the function specifies the **output** of the function. Here, the output is a Boolean, obtained from the expression `number % 2 == 0`. It is possible to return several values; just use a comma to separate them (in this case, a tuple is returned).

# Once a function is defined, it can be called like this:

# In[24]:

is_even(3)

# In[25]:

is_even(4)

# Here, 3 and 4 are successively passed as arguments to the function.

# ### Positional and keyword arguments

# A Python function can accept an arbitrary number of arguments, called **positional arguments**. It can also accept optional named arguments, called **keyword arguments**. Here is an example:

# In[26]:

def remainder(number, divisor=2):
    return number % divisor

# The second argument of this function, `divisor`, is optional. If it is not provided by the caller, it defaults to the number 2, as shown here:

# In[27]:

remainder(5)

# There are two equivalent ways of specifying a keyword argument when calling a function:

# In[28]:

remainder(5, 3)

# In[29]:

remainder(5, divisor=3)

# In the first case, `3` is understood as the second argument, `divisor`. In the second case, the name of the argument is given explicitly by the caller. This second syntax is clearer and less error-prone than the first one.
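# Here is a short sketch of what naming the arguments buys you: with keyword arguments, the order in which they are passed no longer matters. The cell restates the `remainder()` function defined above so that it runs on its own.

# In[ ]:

def remainder(number, divisor=2):
    return number % divisor

print(remainder(5, divisor=3))         # 2
print(remainder(divisor=3, number=5))  # 2: same call, arguments in a different order
print(remainder(number=5))             # 1: `divisor` falls back to its default value

# Note that positional arguments must come before keyword arguments: `remainder(divisor=3, 5)` is a syntax error.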
# Functions can also accept arbitrary numbers of positional and keyword arguments, using the following syntax:

# In[30]:

def f(*args, **kwargs):
    print("Positional arguments:", args)
    print("Keyword arguments:", kwargs)

# In[31]:

f(1, 2, c=3, d=4)

# Inside the function, `args` is a tuple containing positional arguments, and `kwargs` is a dictionary containing keyword arguments.

# ### Pass by assignment

# When passing a parameter to a Python function, a *reference* to the object is actually passed (**pass by assignment**):
# * If the passed object is mutable, it can be modified by the function.
# * If the passed object is immutable, it cannot be modified by the function.
# * In both cases, assigning a new object to the parameter inside the function (for example, `some_list = []`) only rebinds the local name and does not affect the caller's variable.

# Here is an example:

# In[32]:

my_list = [1, 2]

def add(some_list, value):
    some_list.append(value)

add(my_list, 3)
my_list

# The function `add()` modifies an object defined outside it (in this case, the object `my_list`); we say this function has **side-effects**. A function with no side-effects is called a **pure function**: it doesn't modify anything in the outer context, and it deterministically returns the same result for any given set of inputs. Pure functions should generally be preferred over functions with side-effects.

# Knowing this can help you spot subtle bugs. There are further related concepts that are useful to know, including function scopes, naming, binding, and more. Here are a couple of links:
# * Pass by reference at https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
# * Naming, binding, and scope at https://docs.python.org/3.4/reference/executionmodel.html

# ### Errors

# Let's discuss errors in Python. As you learn, you will inevitably come across errors and exceptions. Most of the time, the Python interpreter tells you what the problem is and where it occurred. It is important to understand the vocabulary used by Python so that you can find and correct your errors more quickly.
# Let's see an example:

# In[33]:

def divide(a, b):
    return a / b

# In[34]:

divide(1, 0)

# Here, we defined a `divide()` function, and called it to divide 1 by 0. Dividing a number by 0 is an error in Python. Here, a `ZeroDivisionError` **exception** was raised. An exception is a particular type of error that can be raised at any point in a program. It is propagated from the innards of the code up to the command that launched the code. It can be caught and processed at any point. You will find more details about exceptions at https://docs.python.org/3/tutorial/errors.html, and common exception types at https://docs.python.org/3/library/exceptions.html#bltin-exceptions.

# The error message you see contains the **stack trace** and the exception's type and message. The stack trace shows all function calls between the raised exception and the script calling point.

# The top frame, indicated by the first arrow `---->`, shows the entry point of the code execution. Here, it is `divide(1, 0)`, which was called directly in the Notebook. The error occurred while this function was called.

# The next and last frame is indicated by the second arrow. It corresponds to line 2 in our function `divide(a, b)`. It is the last frame in the stack trace: this means that the error occurred there.

# We will see later in this chapter how to **debug** such errors interactively in IPython and in the Jupyter Notebook. Knowing how to navigate up and down in the stack trace is critical when debugging complex Python code.

# ### Object-oriented programming

# **Object-oriented programming** (or OOP) is a relatively advanced topic. Although we won't use it much in this book, it is useful to know the basics. Also, mastering OOP is often essential when you start to have a large code base.

# In Python, everything is an **object**: numbers, strings, and functions are all objects. An object is an instance of a **type** (also known as *class*).
# An object has **attributes** and **methods**, as specified by its type. An attribute is a variable bound to an object, giving some information about it. A method is a function that applies to the object.

# For example, the object `'hello'` is an instance of the built-in `str` type (string). The `type()` function returns the type of an object, as shown here:

# In[35]:

type('hello')

# There are native types, like `str` or `int` (integer), and custom types, also called classes, that can be created by the user.

# In IPython, you can discover the attributes and methods of any object with the dot syntax and tab completion. For example, typing `'hello'.u` and pressing *Tab* automatically shows us the existence of the `upper()` method:

# In[36]:

'hello'.upper()

# Here, `upper()` is a method available to all `str` objects; it returns an uppercase copy of a string.

# A useful string method is `format()`. This simple and convenient templating system lets you generate strings dynamically:

# In[37]:

'Hello {0:s}!'.format('Python')

# The `{0:s}` syntax means "replace this with the first argument of `format()`, which should be a string". The format specification after the colon is especially useful for numbers, where you can specify how to display the number (for example, `.3f` to display three decimals). The `0` refers to the position of the argument, which makes it possible to insert a given value several times in a given string. You can also use a name instead of a position, for example `'Hello {name}!'.format(name='Python')`.

# Some methods are prefixed with an underscore `_`; by convention, they are private and are generally not meant to be used directly. IPython's tab completion won't show you these private attributes and methods unless you explicitly type `_` before pressing *Tab*.

# In practice, the most important thing to remember is that appending a dot `.` to any Python object and pressing *Tab* in IPython will show you a lot of functionality pertaining to that object.
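# The text mentions custom types (classes) without showing one. The cell below is a minimal sketch of a hypothetical `Point` class, written only for illustration and not part of any library. `x` and `y` are attributes set on each instance, while `norm()` and `describe()` are methods. `describe()` also uses the `.3f` format specification mentioned above, and the last line shows positional reuse and named fields in `format()`.

# In[ ]:

class Point:
    """A hypothetical 2D point, used only for illustration."""

    def __init__(self, x, y):
        # Attributes: variables bound to each Point object.
        self.x = x
        self.y = y

    def norm(self):
        # Method: the distance from the origin.
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def describe(self):
        # `.3f` displays a floating-point number with three decimals.
        return 'Point ({0}, {1}) at distance {2:.3f}'.format(self.x, self.y, self.norm())


p = Point(1, 2)
print(type(p).__name__)  # Point
print(p.x, p.y)          # 1 2
print(p.describe())      # Point (1, 2) at distance 2.236

# `{0}` used twice repeats the first argument; `{name}` is filled by keyword.
print('{0}-{0}!'.format('ha'), 'Hello {name}!'.format(name='Python'))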
# ### Functional programming

# Python is a multi-paradigm language; it notably supports imperative, object-oriented, and functional programming models. Python functions are objects and can be handled like other objects. In particular, they can be passed as arguments to other functions, which are then called **higher-order functions**. This is the essence of **functional programming**.

# **Decorators** provide a convenient syntax construct to define higher-order functions. Here is an example using the `is_even()` function from the previous **Functions** section:

# In[38]:

def show_output(func):
    def wrapped(*args, **kwargs):
        output = func(*args, **kwargs)
        print("The result is:", output)
        return output  # Also return the result, so the wrapped function remains usable.
    return wrapped

# The `show_output()` function transforms an arbitrary function `func()` into a new function, named `wrapped()`, that displays the result of the function before returning it:

# In[39]:

f = show_output(is_even)
f(3)

# Equivalently, this higher-order function can be applied with the decorator syntax `@`:

# In[40]:

@show_output
def square(x):
    return x * x

# In[41]:

square(3)

# You can find more information about Python decorators at https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Decorators and at http://thecodeship.com/patterns/guide-to-python-function-decorators/.

# ### Python 2 and 3

# Let's finish this section with a few notes about Python 2 and Python 3 compatibility issues.

# Although Python 2 reached its end of life in January 2020, some Python 2 code and libraries are still not compatible with Python 3. Therefore, it is sometimes useful to be aware of the differences between the two versions. One of the most obvious differences is that `print` is a statement in Python 2, whereas it is a function in Python 3. Therefore, `print "Hello"` (without parentheses) works in Python 2 but not in Python 3, while `print("Hello")` works in both Python 2 and Python 3.
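# As a minimal sketch of version-portable code (not taken from any particular library), the cell below only uses constructs that behave the same way in Python 2 and Python 3. These are `print()` called with a single argument and the explicit division operators from the TIP on division above. The `PY2` flag is an illustrative name, similar in spirit to the `PY2` constant that the six library exposes.

# In[ ]:

import sys

# True when running under Python 2, False under Python 3.
PY2 = sys.version_info[0] == 2

# A single argument in parentheses prints the same text in both versions.
print("Running on Python %d" % sys.version_info[0])

print(3.0 / 2.0)  # floating-point division: 1.5 in both versions
print(3 // 2)     # integer (floor) division: 1 in both versions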
# There are several non-mutually exclusive options to write portable code that works with both versions:
# * **`__future__`**: a built-in module that enables some Python 3 features (such as the `print()` function or true division) in Python 2
# * **2to3**: a tool shipped with Python to port Python 2 code to Python 3 (deprecated in Python 3.11 and removed in Python 3.13)
# * **six**: an external lightweight library for writing compatible code

# Here are a few references:
# * Official Python 2/3 wiki page at https://wiki.python.org/moin/Python2orPython3
# * *Porting to Python 3* book at http://python3porting.com/bookindex.html
# * 2to3 at https://docs.python.org/3.4/library/2to3.html
# * six at https://pythonhosted.org/six/
# * `__future__` at https://docs.python.org/3.4/library/__future__.html
# * The IPython Cookbook contains an in-depth recipe about choosing between Python 2 and 3, and how to support both.

# ### Going beyond the basics

# You now know the fundamentals of Python, the bare minimum that you will need in this book. As you can imagine, there is much more to say about Python.

# There are a few further basic concepts that are often useful and that we cannot cover here, unfortunately.
# You are highly encouraged to have a look at them in the references given at the end of this section:
# * `range` and `enumerate`
# * `pass`, `break`, and `continue`, to be used in loops
# * working with files
# * creating and importing modules
# * the Python standard library, which provides a wide range of functionality (OS, network, file systems, compression, mathematics, and more)

# Here are some slightly more advanced concepts that you might find useful if you want to strengthen your Python skills:
# * regular expressions for advanced string processing
# * lambda functions for defining small anonymous functions
# * generators for controlling custom loops
# * exceptions for handling errors
# * `with` statements for safely handling contexts
# * advanced object-oriented programming
# * metaprogramming for modifying Python code dynamically
# * the `pickle` module for persisting Python objects on disk and exchanging them across a network

# Finally, here are a few references:
# * Getting started with Python: https://www.python.org/about/gettingstarted/
# * A Python tutorial: https://docs.python.org/3/tutorial/index.html
# * The Python Standard Library: https://docs.python.org/3/library/index.html
# * Interactive tutorial: http://www.learnpython.org/
# * Codecademy Python course: http://www.codecademy.com/tracks/python
# * Language reference (expert level): https://docs.python.org/3/reference/index.html
# * Python Cookbook, by David Beazley and Brian K. Jones, O'Reilly Media (advanced level, highly recommended if you want to become a Python expert).