Jupyter Notebooks & Python Tutorial¶

Shawn Rhoads, Georgetown University (NSCI 526)

This tutorial has been adapted from the following resources on Jupyter Notebooks and Python:

Luke Chang's DartBrains
Adam Pritchard's Markdown Cheatsheet
Eshin Jolly's Jupyter Tutorial
NumPy Quickstart Tutorial

Interactive Coding!¶

We will use Jupyter Notebooks to interface with Python. Rather than keeping code and its execution separate, Jupyter integrates both using the concept of cells. The two main types of cells are code cells and markdown cells. Cells pair "snippets" of code with the output obtained from running them, and can display plots and figures inline alongside the code required to generate them.

Code cells contain actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar in your Jupyter notebook. Otherwise, you can hit esc and then y (denoted "esc, y") while a cell is selected to specify that it is a code cell. Note that you will have to hit enter after doing this to start editing it. If you want to execute the code in a code cell, hit "shift + enter." Note that code cells are executed in the order you execute them, not in the order they appear in the notebook. That is to say, the ordering of the cells for which you hit "shift + enter" is the order in which the code is executed. If you did not explicitly execute a cell early in the document, its results are not known to the Python interpreter.

Markdown cells contain text. The text is written in markdown, a lightweight markup language. You can read about its syntax here. Note that you can also insert HTML into markdown cells, and this will be rendered properly. As you are typing the contents of these cells, the results appear as text. Hitting "shift + enter" renders the text in the formatting you specify. You can specify a cell as being a markdown cell in the Jupyter toolbar, or by hitting "esc, m" in the cell. Again, you have to hit enter after using the quick keys to bring the cell into edit mode.

In general, when you want to add a new cell, you can use the "Insert" pulldown menu from the Jupyter toolbar. The shortcut to insert a cell below is "esc, b" and to insert a cell above is "esc, a." Alternatively, you can execute a cell and automatically add a new one below it by hitting "alt + enter."

In [1]:

# This is a code cell
#Put some code here and get some output below!
x = 10
print(x)

This is a markdown cell!

Notebooks of the Future¶

Not just for code:

Markdown, HTML, LaTeX integration
Slide shows (like this one!)
Keep your notes alongside your analysis routines
Embed images, videos, anything (it's all just HTML + javascript)

HTML Integration:¶

Var1	Var2
Cell 1	Cell 2
Cell 3	Cell 4

LaTeX Integration:¶

Compute the following:
$$\sum_{n=1}^{5}n$$
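As a quick check, the sum above can also be evaluated in a code cell. This is only a minimal sketch; the variable name total is chosen here for illustration.

```python
# Evaluate the sum of n for n = 1, ..., 5
# range(1, 6) yields 1, 2, 3, 4, 5 (the stop value is excluded)
total = sum(range(1, 6))
print(total)  # 15
```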

Inserting images from web:¶

In [2]:

from IPython.display import display, Image

display(Image('https://raw.githubusercontent.com/Summer-MIND/mind_2018/master/tutorials/jupyter/demo.gif'))

<IPython.core.display.Image object>

Multilingual!¶

Python (multiple versions together)
R
Matlab
Javascript
etc (https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)

Coding with R:¶

!conda install rpy2

In [3]:

import rpy2.rinterface
%load_ext rpy2.ipython

In [4]:

%%R

y <- c(1,2,3,4,5)
yy <- c(1,2,4,8,16)

plot(y,
     yy,
     main='R plot in Python')

We will import x = 10 from above, define a variable z <- 10 in R, store their product in XZ, and then output XZ back to Python

In [5]:

%%R -i x -o XZ

z <- 10
XZ <- x * z

The cell below will return XZ in Python! Cool, huh?!

In [6]:

print(XZ)

[1] 100

You can do this with other languages as well! e.g., MATLAB (because we like paying for licenses)

Save notebooks in other formats and put them online¶

GitHub automatically renders Jupyter notebooks
Online notebook viewer
Output to PDF, HTML (including javascript)
Jupyterhub: code with friends!

Customize your notebook experience with extensions¶

Table of Contents
Execution time/Profiling
Scratch Space
Code/Section folding
Look and feel (CSS+Javascript)
Other notebook extensions

For more...¶

Types of Variables¶

Numeric types:
- int, float, complex (Python 3 has no separate long type; int has arbitrary precision)
string
boolean
- True / False

Use the type() function to find the type for a value or variable

In [7]:

# Integer
a = 1
print(type(a))

# Float
b = 1.0
print(type(b))

# String
c = 'hello'
print(type(c))

# Boolean
d = True
print(type(d))

# None
e = None
print(type(e))

# Cast integer to string
print(type(str(a)))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'NoneType'>
<class 'str'>

Math Operators¶

+, -, *, and /
Exponentiation **
Modulo %

In [8]:

# Addition
a = 2 + 7
print(a)

# Subtraction
b = a - 5
print(b)

# Multiplication
print(b*2)

# Exponentiation
print(b**2)

# Modulo
print(4%9)

# Division
print(4/9)

9
4
8
16
4
0.4444444444444444

String Operators¶

Some of the arithmetic operators also have meaning for strings. For example, use the + sign for string concatenation

String repetition: Use * sign with a number of repetitions

In [9]:

# Combine string
a = 'Hello'
b = 'World'
print(a + b)

# Repeat String
print(a*5)

HelloWorld
HelloHelloHelloHelloHello

Logical Operators¶

Comparison operators perform a logical comparison and return a Boolean value

x == y # x is equal to y
x != y # x is not equal to y
x > y # x is greater than y
x < y # x is less than y
x >= y # x is greater than or equal to y 
x <= y # x is less than or equal to y

In [10]:

# Works for string
a = 'hello'
b = 'world'
c = 'Hello'
print("a==b: " + str(a==b))
print("a==c: " + str(a==c))
print("a!=b: " + str(a!=b))

# Works for numeric
d = 5
e = 8
print("d < e: " + str(d < e))

a==b: False
a==c: False
a!=b: True
d < e: True
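Comparisons like these can be combined with Python's Boolean operators and, or, and not. The following is a minimal sketch that redefines d and e so it runs on its own; the names both, either, and neither are chosen here for illustration.

```python
d = 5
e = 8

# and: True only if both comparisons are True
both = (d < e) and (d > 0)

# or: True if at least one comparison is True
either = (d > e) or (d == 5)

# not: inverts a Boolean value
neither = not (d < e)

print(both, either, neither)  # True True False
```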

Conditional Logic (if…)¶

Unlike most other languages, Python uses indentation to delimit the body of a conditional statement rather than a closing keyword (e.g., end)

Syntax:

if condition: 
    do something

If condition is of a type other than bool, its value is implicitly converted with bool(), so all of the following forms work:

if condition:
    do_something
elif condition:
    do_alternative1
else:
    do_otherwise # often reserved to report an error
                 # after a long list of options

In [11]:

n = 1

if n:
    print("n is non-0")

if n is None:
    print("n is None")
    
if n is not None:
    print("n is not None")

n is non-0
n is not None
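To make the implicit bool() conversion concrete, here is a small sketch showing which common values count as False or True in a condition, followed by one if / elif / else chain. The variable names falsy, truthy, and sign are illustrative only.

```python
# Values Python treats as False when used as a condition
falsy = [0, 0.0, '', [], {}, None]
print([bool(v) for v in falsy])   # every entry is False

# Most other values are treated as True
truthy = [1, -2.5, 'hello', [0], {'a': 1}]
print([bool(v) for v in truthy])  # every entry is True

# if / elif / else runs the first branch whose condition is True
n = -3
if n == 0:
    sign = 'zero'
elif n > 0:
    sign = 'positive'
else:
    sign = 'negative'
print(sign)  # negative
```

Note that a non-empty list such as [0] is True even though its only element is 0, because the test is on the container being non-empty.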

Loops¶

The for loop is probably the most popular loop construct in Python:

for target in sequence:
    do_statements

In [12]:

string = "Python is going to make conducting research easier"
for c in string:
    print(c)

P
y
t
h
o
n
 
i
s
 
g
o
i
n
g
 
t
o
 
m
a
k
e
 
c
o
n
d
u
c
t
i
n
g
 
r
e
s
e
a
r
c
h
 
e
a
s
i
e
r

It’s also possible to use a while loop to repeat statements while condition remains True:

while condition:
    do_statements

In [13]:

x = 0
end = 10

csum = 0
while x < end:
    csum += x
    print(x, csum)
    x += 1
print("Exited with x==%d" % x )

0 0
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
Exited with x==10
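The same running total can be written with a for loop over range, which removes the need to update the counter by hand. This is a sketch of an equivalent loop, not a replacement for the cell above; the word list passed to enumerate is made up for illustration.

```python
end = 10
csum = 0

# range(end) yields 0, 1, ..., end - 1, so no manual counter is needed
for x in range(end):
    csum += x
print(csum)  # 45, the same total the while loop reaches

# enumerate pairs each element of a sequence with its index
for i, word in enumerate(['Python', 'is', 'fun']):
    print(i, word)
```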

Functions¶

A function is a named sequence of statements that performs a computation. You define the function by giving it a name, specify a sequence of statements, and optionally values to return. Later, you can “call” the function by name.

In [14]:

def make_upper_case(text):
    return (text.upper())

The expression in the parentheses is the argument.
It is common to say that a function "takes" an argument and "returns" a result.
The result is called the return value.

The first line of the function definition is called the header; the rest is called the body.

The header has to end with a colon and the body has to be indented. It is a common practice to use 4 spaces for indentation, and to avoid mixing with tabs.

A function body in Python ends whenever a statement begins at the original level of indentation. There is no end, fed, or any other keyword to signal the end of a function. Indentation is part of the language syntax in Python, making it more readable and less cluttered.

In [15]:

string = "Python is going to make conducting research easier"

print(make_upper_case(string))

PYTHON IS GOING TO MAKE CONDUCTING RESEARCH EASIER
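Functions can also take more than one argument and give some of them default values. The sketch below uses a hypothetical function named shout (not part of the tutorial) to show a header, an indented body with a docstring, and a return value.

```python
def shout(text, times=1):
    """Return text in upper case, repeated `times` times."""
    # The body is everything indented under the header line
    return text.upper() * times

# Back at the original indentation level, the function body has ended
print(shout("hi"))           # HI
print(shout("hi", times=3))  # HIHIHI
```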

Python Containers¶

There are 4 main types of builtin containers for storing data in Python:

list
tuple
dict
set

Lists¶

In Python, a list is a mutable sequence of values. Mutable means that we can change separate entries within a list.

Each value in the list is an element or item
Elements can be any Python data type
Lists can mix data types
Lists are initialized with [] or list()

l = [1,2,3]

Elements within a list are indexed (starting with 0)

l[0]

Elements can be nested lists

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Lists can be sliced.

l[start:stop:stride]

Like all python containers, lists have many useful methods that can be applied

a.insert(index, new_element)  # insert new_element before position index
a.append(new_element)         # add new_element at the end
len(a)                        # number of elements in a

List comprehension is a very powerful technique allowing for efficient construction of new lists.

[a for a in l]

In [16]:

# Indexing and Slicing
a = ['lists','are','arrays']
print(a[0])
print(a[1:3])

# List methods
a.insert(2,'python')
a.append('.')
print(a)
print(len(a))

# List Comprehension
print([x.upper() for x in a])

lists
['are', 'arrays']
['lists', 'are', 'python', 'arrays', '.']
5
['LISTS', 'ARE', 'PYTHON', 'ARRAYS', '.']
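The cell above does not exercise the stride in l[start:stop:stride], nested indexing, or mutability. A minimal sketch of those three ideas follows; the list contents are arbitrary.

```python
l = list(range(10))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# start:stop:stride, where stop is excluded
print(l[2:8:2])       # [2, 4, 6]

# Negative indices count from the end; a stride of -1 reverses the list
print(l[-1])          # 9
print(l[::-1][:3])    # [9, 8, 7]

# Index a nested list one level at a time: row 1, then column 2
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(nested[1][2])   # 6

# Lists are mutable, so a single entry can be reassigned
l[0] = 'zero'
print(l[:3])          # ['zero', 1, 2]

# A list comprehension can also filter while it builds a new list
evens = [n for n in range(10) if n % 2 == 0]
print(evens)          # [0, 2, 4, 6, 8]
```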

Dictionaries¶

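In Python, a dictionary (or dict) is a mapping between a set of indices (keys) and a set of values
The items in a dictionary are key-value pairs
Keys can be any hashable Python data type (e.g., strings, numbers, and tuples of these, but not lists)
Dictionaries preserve insertion order as of Python 3.7; earlier versions did not guarantee any order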

In [17]:

# Dictionaries
eng2sp = {}
eng2sp['one'] = 'uno'
print(eng2sp)

eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
print(eng2sp)

print(eng2sp.keys())
print(eng2sp.values())

{'one': 'uno'}
{'one': 'uno', 'two': 'dos', 'three': 'tres'}
dict_keys(['one', 'two', 'three'])
dict_values(['uno', 'dos', 'tres'])
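Looking up values, testing for a key, and looping over key-value pairs are the most common dictionary operations. The sketch below reuses the eng2sp dictionary from above; the key 'four' and the default string 'unknown' are illustrative.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Look up a value by its key
print(eng2sp['two'])                  # dos

# get() returns a default instead of raising KeyError for a missing key
print(eng2sp.get('four', 'unknown'))  # unknown

# The in operator tests for keys, not values
print('one' in eng2sp)                # True
print('uno' in eng2sp)                # False

# items() yields (key, value) pairs
for english, spanish in eng2sp.items():
    print(english, '->', spanish)

# A list cannot be a key because it is not hashable
try:
    bad = {[1, 2]: 'list key'}
except TypeError as err:
    print('TypeError:', err)
```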

Tuples¶

In Python, a tuple is an immutable sequence of values, meaning they can’t be changed

Each value in the tuple is an element or item
Elements can be any Python data type
Tuples can mix data types
Elements can be nested tuples
Essentially tuples are immutable lists

In [18]:

numbers = (1, 2, 3, 4)
print(numbers)

t2 = 1, 2
print(t2)

(1, 2, 3, 4)
(1, 2)
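Immutability is easiest to see by trying to change an element. In this sketch the attempted assignment raises a TypeError, which is caught so the cell keeps running; the names changed, first, and rest are illustrative.

```python
numbers = (1, 2, 3, 4)

# Assigning to an element of a tuple is not allowed
try:
    numbers[0] = 99
    changed = True
except TypeError as err:
    changed = False
    print('TypeError:', err)

print(numbers)  # (1, 2, 3, 4), unchanged

# Tuples can be unpacked into separate variables
first, *rest = numbers
print(first)    # 1
print(rest)     # [2, 3, 4]
```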

Sets¶

In Python, a set provides efficient storage for “membership” checking

A set is like a dict but with only keys and no values
A set can also perform set operations (e.g., union, intersection)

In [19]:

# Union
print({1, 2, 3, 'mom', 'dad'} | {2, 3, 10})

# Intersection
print({1, 2, 3, 'mom', 'dad'} & {2, 3, 10})

# Difference
print({1, 2, 3, 'mom', 'dad'} - {2, 3, 10})

{1, 2, 3, 'mom', 10, 'dad'}
{2, 3}
{1, 'mom', 'dad'}
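Membership checking itself uses the in operator, and building a set from a list drops duplicates. A small sketch with a made-up word list:

```python
words = ['mom', 'dad', 'mom', 'sis', 'dad']

# Converting to a set keeps one copy of each distinct element
unique = set(words)
print(len(unique))      # 3

# Membership checks with the in operator
print('mom' in unique)  # True
print('bro' in unique)  # False
```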

Modules¶

A module is a Python file that contains a collection of related definitions. Python has hundreds of standard modules. These are organized into what is known as the Python Standard Library. You can also create and use your own modules. To use functionality from a module, you first have to import the entire module or parts of it into your namespace.

To import the entire module: import module_name

You can also import a module using a specific name: import module_name as new_module_name

To import specific definitions (e.g. functions, variables, etc) from the module into your local namespace: from module_name import name1, name2
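The three import forms can be tried with the standard library's math module. This is a minimal sketch; the alias m is arbitrary.

```python
# Import the entire module and access names through it
import math
print(math.sqrt(16))  # 4.0

# Import the module under a shorter name
import math as m
print(m.floor(2.7))   # 2

# Import specific definitions into the local namespace
from math import sqrt, pi
print(sqrt(9))        # 3.0
print(round(pi, 2))   # 3.14
```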

os and glob¶

In [20]:

import os
from glob import glob

To print the current directory, you can use: os.path.abspath(os.path.curdir)

Let’s use glob, a pattern matching function, to list all of the ipynb files in the current folder.

In [21]:

data_file_list = glob(os.path.join(os.path.curdir,'*ipynb'))
print(data_file_list)

['.\\NSCI 526 Tutorial 1.1 (Intro to Python).ipynb', '.\\NSCI 526 Tutorial 1.2 (Introduction to Artificial Neural Networks).ipynb', '.\\NSCI 526 Tutorial 1.3 (Emotional Faces Classifier).ipynb']

This gives us a list of the files including the relative path from the current directory. What if we wanted just the filenames? There are several different ways to do this. First, we can use the os.path.basename function. We loop over every file, grab the base file name and then append it to a new list.

In [22]:

file_list = []
for f in data_file_list:
    file_list.append(os.path.basename(f))

print(file_list)

['NSCI 526 Tutorial 1.1 (Intro to Python).ipynb', 'NSCI 526 Tutorial 1.2 (Introduction to Artificial Neural Networks).ipynb', 'NSCI 526 Tutorial 1.3 (Emotional Faces Classifier).ipynb']

It is often cleaner to do this as a list comprehension:

In [23]:

[os.path.basename(x) for x in data_file_list]

Out[23]:

['NSCI 526 Tutorial 1.1 (Intro to Python).ipynb',
 'NSCI 526 Tutorial 1.2 (Introduction to Artificial Neural Networks).ipynb',
 'NSCI 526 Tutorial 1.3 (Emotional Faces Classifier).ipynb']
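The same os.path helpers can be tried without any files on disk. The sketch below builds paths from hypothetical file names (intro.ipynb and numpy_demo.ipynb are made up) and also strips the extension with os.path.splitext.

```python
import os

# Hypothetical notebook names, joined to the current directory
paths = [os.path.join(os.path.curdir, name)
         for name in ['intro.ipynb', 'numpy_demo.ipynb']]

# basename drops the directory part of each path
names = [os.path.basename(p) for p in paths]
print(names)  # ['intro.ipynb', 'numpy_demo.ipynb']

# splitext separates the extension from the rest of the name
stems = [os.path.splitext(n)[0] for n in names]
print(stems)  # ['intro', 'numpy_demo']
```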

NumPy¶

NumPy is the fundamental package for scientific computing with Python.

In [24]:

import numpy as np

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.

NumPy’s array class is called ndarray. It is also known by the alias array. The more important attributes of an ndarray object are:

ndarray.ndim: the number of axes (dimensions) of the array.
ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.
ndarray.dtype: an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
ndarray.data: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

In [25]:

a = np.arange(15) #array of numbers 0 to 14

print(a)
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.itemsize)
print(a.size)
print(type(a))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
(15,)
1
int32
4
15
<class 'numpy.ndarray'>
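The attributes are easier to tell apart on a two-dimensional array. In this sketch the dtype is set explicitly to float64 so the output does not depend on the platform's default integer size; the name a2 is illustrative.

```python
import numpy as np

# 15 elements arranged as 3 rows and 5 columns
a2 = np.arange(15, dtype=np.float64).reshape(3, 5)

print(a2.shape)       # (3, 5)
print(a2.ndim)        # 2, the length of the shape tuple
print(a2.size)        # 15, the product of the entries of shape
print(a2.dtype.name)  # float64
print(a2.itemsize)    # 8 bytes per element (= 64 / 8)
```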

Creating arrays¶

You can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.

A frequent error consists in calling array with multiple numeric arguments, rather than providing a single list of numbers as an argument.

a = np.array(1,2,3,4)    # WRONG
a = np.array([1,2,3,4])  # RIGHT

In [26]:

b = np.array([6, 7, 8])
print(b)
print(type(b))

[6 7 8]
<class 'numpy.ndarray'>

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

In [27]:

c = np.array([(1.5, 2 ,3), (4, 5, 6), (7.1, 7.2, 7.3)])
print(c)

[[1.5 2.  3. ]
 [4.  5.  6. ]
 [7.1 7.2 7.3]]

The function zeros creates an array full of zeros, the function ones creates an array full of ones, the function random.rand creates an array of random floats from a uniform distribution over [0, 1), and the function empty creates an array whose initial content is uninitialized and depends on the state of the memory. By default, the dtype of the created array is float64.

In [28]:

np.zeros((3,4))

Out[28]:

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [29]:

np.ones((2,3,4), dtype=np.int16)

Out[29]:

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [30]:

np.random.rand(3,2)

Out[30]:

array([[0.0688618 , 0.3021982 ],
       [0.83227422, 0.58830925],
       [0.67215902, 0.04218746]])

In [31]:

np.empty((2,3)) # uninitialized, output may vary

Out[31]:

array([[0.0688618 , 0.3021982 , 0.83227422],
       [0.58830925, 0.67215902, 0.04218746]])

To create sequences of numbers, NumPy provides a function analogous to range that returns arrays instead of lists.

In [32]:

np.arange( 10, 30, 5 ) # array from 10 up to (but not including) 30 in increments of 5

Out[32]:

array([10, 15, 20, 25])

Shape Manipulation¶

Three main functions include:

ravel() flattens an array
reshape() changes the shape of arrays
transpose() transposes the array

In [33]:

example = np.floor(10*np.random.random((4,4)))
example

Out[33]:

array([[9., 3., 2., 9.],
       [9., 9., 9., 5.],
       [8., 2., 2., 3.],
       [6., 8., 8., 5.]])

In [34]:

example.ravel()  # returns the array, flattened

Out[34]:

array([9., 3., 2., 9., 9., 9., 9., 5., 8., 2., 2., 3., 6., 8., 8., 5.])

In [35]:

example.reshape(2,8) # returns the array with a modified shape 2x8

Out[35]:

array([[9., 3., 2., 9., 9., 9., 9., 5.],
       [8., 2., 2., 3., 6., 8., 8., 5.]])

In [36]:

example.transpose()

Out[36]:

array([[9., 9., 8., 6.],
       [3., 9., 2., 8.],
       [2., 9., 2., 8.],
       [9., 5., 3., 5.]])

The reshape function returns its argument with a modified shape, whereas the resize method modifies the array itself:

In [37]:

example.shape

Out[37]:

(4, 4)

In [38]:

example.resize(2,8)
example.shape

Out[38]:

(2, 8)

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

In [39]:

example.reshape(4,-1)

Out[39]:

array([[9., 3., 2., 9.],
       [9., 9., 9., 5.],
       [8., 2., 2., 3.],
       [6., 8., 8., 5.]])
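On a non-square array the effect of each shape operation is easier to see, because the shapes differ. A minimal sketch using a 12-element array (the names arr and mat are illustrative):

```python
import numpy as np

arr = np.arange(12)

# -1 lets NumPy infer the missing dimension: 12 / 3 = 4 columns
mat = arr.reshape(3, -1)
print(mat.shape)              # (3, 4)

# reshape returned a new array object; arr keeps its original shape
print(arr.shape)              # (12,)

# transpose swaps the axes
print(mat.transpose().shape)  # (4, 3)

# ravel flattens back to one dimension
print(mat.ravel().shape)      # (12,)
```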

Linear Algebra¶

NumPy provides the functionality needed for linear algebra, both as top-level functions and in the numpy.linalg module. Some of the important functions are:

dot: Dot product of the two arrays
vdot: Dot product of the two vectors
inner: Inner product of the two arrays
linalg.solve: Solves the linear matrix equation
linalg.inv: Finds the multiplicative inverse of the matrix

In [40]:

M = np.array([[3, 0 ,2],
              [2, 0, -2],
              [0, 1, 1]])

v = np.array([1, 2, 3])

print(M.dot(v))

[ 9 -4  5]

Other functions:

multiply(): Element-wise product of the two arrays (use matmul() or the @ operator for the matrix product)
eye(): Returns a 2-D array with ones on the diagonal and zeros elsewhere
linalg.eig(): Returns the eigenvalues and eigenvectors of the array

In [41]:

print(np.eye(3))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
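The remaining functions can be tried on the same M and v used earlier. This is a sketch rather than a full tour; the small matrices A and B and the diagonal matrix passed to eig are made up for illustration.

```python
import numpy as np

M = np.array([[3, 0, 2],
              [2, 0, -2],
              [0, 1, 1]])
v = np.array([1, 2, 3])

# Solve M x = v for x, then check the solution
x = np.linalg.solve(M, v)
print(np.allclose(M.dot(x), v))              # True

# The inverse times M gives the identity matrix (up to rounding)
M_inv = np.linalg.inv(M)
print(np.allclose(M_inv.dot(M), np.eye(3)))  # True

# multiply is element-wise; matmul is the matrix product
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.multiply(A, B))  # [[ 5 12] [21 32]]
print(np.matmul(A, B))    # [[19 22] [43 50]]

# Eigenvalues of a diagonal matrix are its diagonal entries
eigenvalues, eigenvectors = np.linalg.eig(np.diag([2.0, 3.0]))
print(sorted(eigenvalues))
```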

Throw your notebook into slides:¶

!jupyter nbconvert "Intro to Python.ipynb" --to slides --post serve