Shawn Rhoads, Georgetown University (NSCI 526)
This tutorial has been adapted from the following resources on Jupyter Notebooks and Python:
We will use Jupyter Notebooks to interface with Python. Rather than keeping code and its execution separate, Jupyter integrates both using the concept of cells. The two main types of cells are code cells and markdown cells. Cells pair "snippets" of code with the output obtained from running them and can contain plots/figures inline alongside the code required to generate them.
Code cells contain actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar in your Jupyter notebook. Otherwise, you can hit esc and then y (denoted "esc, y") while a cell is selected to specify that it is a code cell. Note that you will have to hit enter after doing this to start editing it. If you want to execute the code in a code cell, hit "shift + enter." Note that code cells are executed in the order you execute them. That is to say, the ordering of the cells for which you hit "shift + enter" is the order in which the code is executed. If you did not explicitly execute a cell early in the document, its results are not known to the Python interpreter.
Markdown cells contain text. The text is written in markdown, a lightweight markup language. You can read about its syntax here. Note that you can also insert HTML into markdown cells, and this will be rendered properly. As you are typing the contents of these cells, the results appear as text. Hitting "shift + enter" renders the text in the formatting you specify. You can specify a cell as being a markdown cell in the Jupyter toolbar, or by hitting "esc, m" in the cell. Again, you have to hit enter after using the quick keys to bring the cell into edit mode.
In general, when you want to add a new cell, you can use the "Insert" pulldown menu from the Jupyter toolbar. The shortcut to insert a cell below is "esc, b" and to insert a cell above is "esc, a." Alternatively, you can execute a cell and automatically add a new one below it by hitting "alt + enter."
# This is a code cell
#Put some code here and get some output below!
x = 10
print(x)
10
This is a markdown cell!
Not just for code:
Compute the following:
$$\sum_{n=1}^{5}n$$
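As a quick check, the sum above can be computed in a code cell. This is a minimal sketch using only built-ins; the variable name `total` is chosen just for illustration.

```python
# Sum the integers n = 1, ..., 5 (range excludes its stop value, hence 6)
total = sum(range(1, 6))
print(total)  # 15
```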
from IPython.display import display, Image
display(Image('https://raw.githubusercontent.com/Summer-MIND/mind_2018/master/tutorials/jupyter/demo.gif'))
<IPython.core.display.Image object>
!conda install rpy2
import rpy2.rinterface
%load_ext rpy2.ipython
%%R
y <- c(1,2,3,4,5)
yy <- c(1,2,4,8,16)
plot(y,
yy,
main='R plot in Python')
We will import x=10 from above, define a variable z<-10 in R, store their product into XZ, and then output the variable XZ.
%%R -i x -o XZ
z <- 10
XZ <- x * z
The cell below will return XZ in Python! Cool, huh?!
print(XZ)
[1] 100
You can do this with other languages as well! e.g., MATLAB (because we like paying for licenses)
Use the type() function to find the type of a value or variable.
# Integer
a = 1
print(type(a))
# Float
b = 1.0
print(type(b))
# String
c = 'hello'
print(type(c))
# Boolean
d = True
print(type(d))
# None
e = None
print(type(e))
# Cast integer to string
print(type(str(a)))
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'NoneType'>
<class 'str'>
# Addition
a = 2 + 7
print(a)
# Subtraction
b = a - 5
print(b)
# Multiplication
print(b*2)
# Exponentiation
print(b**2)
# Modulo
print(4%9)
# Division
print(4/9)
9
4
8
16
4
0.4444444444444444
Some of the arithmetic operators also have meaning for strings.
String concatenation: Use the + sign
String repetition: Use the * sign with a number of repetitions
# Combine string
a = 'Hello'
b = 'World'
print(a + b)
# Repeat String
print(a*5)
HelloWorld
HelloHelloHelloHelloHello
Comparison operators perform a logical comparison and return a Boolean value:
x == y # x is equal to y
x != y # x is not equal to y
x > y # x is greater than y
x < y # x is less than y
x >= y # x is greater than or equal to y
x <= y # x is less than or equal to y
# Works for string
a = 'hello'
b = 'world'
c = 'Hello'
print("a==b: " + str(a==b))
print("a==c: " + str(a==c))
print("a!=b: " + str(a!=b))
# Works for numeric
d = 5
e = 8
print("d < e: " + str(d < e))
a==b: False
a==c: False
a!=b: True
d < e: True
Unlike most other languages, Python uses indentation to delimit blocks rather than closing conditional statements with a keyword (e.g., end).
Syntax:
if condition:
    do something
Implicit conversion of the value to bool() happens if condition is of a different type than bool, thus all of the following should work:
if condition:
    do_something
elif condition:
    do_alternative1
else:
    do_otherwise  # often reserved to report an error
                  # after a long list of options
n = 1
if n:
    print("n is non-0")
if n is None:
    print("n is None")
if n is not None:
    print("n is not None")
n is non-0
n is not None
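The implicit conversion to bool described above can be illustrated with a few values. This is a small sketch, not part of the original tutorial; the list `candidates` is a hypothetical set of test values.

```python
# Values that Python treats as False in a condition include
# 0, 0.0, '', [], {} and None; most other values are treated as True.
# `candidates` is a hypothetical list made up for this illustration.
candidates = [0, 1, '', 'text', [], [0], None]
truthy = [value for value in candidates if value]
print(truthy)  # [1, 'text', [0]]

# `is None` is stricter than truthiness: 0 is falsy, but it is not None
n = 0
print(bool(n), n is None)  # False False
```

This is why `if n:` and `if n is not None:` are not interchangeable when 0 or an empty container is a legitimate value.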
The for loop is probably the most popular loop construct in Python:
for target in sequence:
    do_statements
string = "Python is going to make conducting research easier"
for c in string:
    print(c)
P y t h o n i s g o i n g t o m a k e c o n d u c t i n g r e s e a r c h e a s i e r
It’s also possible to use a while loop to repeat statements while a condition remains True:
while condition:
    do_statements
x = 0
end = 10
csum = 0
while x < end:
    csum += x
    print(x, csum)
    x += 1
print("Exited with x==%d" % x)
0 0
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
Exited with x==10
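The same running sum can also be written with a for loop over range, which removes the need to update the counter by hand. This is a sketch for comparison only; the list `history` is a hypothetical helper that records each step.

```python
csum = 0
history = []  # hypothetical list collecting (x, csum) pairs

for x in range(10):  # x takes the values 0, 1, ..., 9
    csum += x
    history.append((x, csum))

print(history[-1])  # (9, 45)
```

A while loop is the better fit when the number of iterations is not known in advance.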
A function is a named sequence of statements that performs a computation. You define the function by giving it a name, specifying a sequence of statements, and optionally specifying values to return. Later, you can “call” the function by name.
def make_upper_case(text):
    return text.upper()
The first line of the function definition is called the header; the rest is called the body.
The header has to end with a colon and the body has to be indented. It is a common practice to use 4 spaces for indentation, and to avoid mixing with tabs.
The function body in Python ends whenever a statement begins at the original level of indentation. There is no end or fed or any other identifier to signal the end of the function. Indentation is part of the language syntax in Python, making it more readable and less cluttered.
string = "Python is going to make conducting research easier"
print(make_upper_case(string))
PYTHON IS GOING TO MAKE CONDUCTING RESEARCH EASIER
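Functions can also take default arguments and return more than one value. The function below, `describe_text`, is a hypothetical example written for this illustration and is not part of the original tutorial.

```python
def describe_text(text, uppercase=True):
    """Return the (optionally upper-cased) text and its word count."""
    result = text.upper() if uppercase else text
    # Returning two values packs them into a tuple
    return result, len(text.split())


# The returned tuple can be unpacked into separate names
shouted, n_words = describe_text("Python is fun")
print(shouted, n_words)  # PYTHON IS FUN 3
```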
There are 4 main types of builtin containers for storing data in Python: lists, dictionaries, tuples, and sets.
In Python, a list is a mutable sequence of values. Mutable means that we can change separate entries within a list.
Each value in the list is an element or item
Elements can be any Python data type
Lists can mix data types
Lists are initialized with [] or list()
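l = [1,2,3]                                  # initialize a list
l[0]                                         # index the first element
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # lists can be nested
l[start:stop:stride]                         # slice a list
a.insert(index, new_element)                 # insert an element at a given index
a.append(element)                            # add an element at the end
len(a)                                       # number of elements
[a for a in l]                               # list comprehension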
# Indexing and Slicing
a = ['lists','are','arrays']
print(a[0])
print(a[1:3])
# List methods
a.insert(2,'python')
a.append('.')
print(a)
print(len(a))
# List Comprehension
print([x.upper() for x in a])
lists
['are', 'arrays']
['lists', 'are', 'python', 'arrays', '.']
5
['LISTS', 'ARE', 'PYTHON', 'ARRAYS', '.']
# Dictionaries
eng2sp = {}
eng2sp['one'] = 'uno'
print(eng2sp)
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
print(eng2sp)
print(eng2sp.keys())
print(eng2sp.values())
{'one': 'uno'}
{'one': 'uno', 'two': 'dos', 'three': 'tres'}
dict_keys(['one', 'two', 'three'])
dict_values(['uno', 'dos', 'tres'])
In Python, a tuple is an immutable sequence of values, meaning its elements can’t be changed after it is created.
numbers = (1, 2, 3, 4)
print(numbers)
t2 = 1, 2
print(t2)
(1, 2, 3, 4)
(1, 2)
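Immutability can be seen directly by trying to assign to an element. This is a short sketch; the names `error_message` and `updated` are introduced only for this example.

```python
numbers = (1, 2, 3, 4)

try:
    numbers[0] = 10  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# To "change" a tuple, build a new one instead
updated = (10,) + numbers[1:]
print(updated)  # (10, 2, 3, 4)
```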
In Python, a set is an unordered collection of unique values that provides efficient “membership” checking.
# Union
print({1, 2, 3, 'mom', 'dad'} | {2, 3, 10})
# Intersection
print({1, 2, 3, 'mom', 'dad'} & {2, 3, 10})
# Difference
print({1, 2, 3, 'mom', 'dad'} - {2, 3, 10})
{1, 2, 3, 'mom', 10, 'dad'}
{2, 3}
{1, 'mom', 'dad'}
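Membership checking itself uses the in operator. The set `family` and the list of ids below are hypothetical values chosen for this sketch.

```python
family = {'mom', 'dad', 'sister'}  # hypothetical example set

print('mom' in family)    # True
print('uncle' in family)  # False

# Converting a list to a set also removes duplicates
unique_ids = set([3, 1, 3, 2, 1])
print(sorted(unique_ids))  # [1, 2, 3]
```

Because sets are unordered, the example sorts the result before printing so the output is predictable.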
A module is a Python file that contains a collection of related definitions. Python has hundreds of standard modules. These are organized into what is known as the Python Standard Library. You can also create and use your own modules. To use functionality from a module, you first have to import the entire module or parts of it into your namespace.
To import the entire module:
import module_name
You can also import a module using a specific name:
import module_name as new_module_name
To import specific definitions (e.g. functions, variables, etc) from the module into your local namespace:
from module_name import name1, name2
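The three import forms can be tried with the standard library's math module. This is a small sketch; the alias `m` is arbitrary and chosen only for illustration.

```python
import math                # import the entire module
import math as m           # import the module under a different name
from math import sqrt, pi  # import specific definitions

print(math.sqrt(16))  # 4.0
print(m.floor(2.7))   # 2

# All three forms refer to the same underlying functions and values
print(sqrt(pi) == math.sqrt(math.pi))  # True
```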
import os
from glob import glob
To print the current directory, you can use: os.path.abspath(os.path.curdir)
Let’s use glob, a pattern matching function, to list all of the ipynb files in the current folder.
data_file_list = glob(os.path.join(os.path.curdir,'*ipynb'))
print(data_file_list)
['.\\NSCI 526 Tutorial 1.1 (Intro to Python).ipynb', '.\\NSCI 526 Tutorial 1.2 (Introduction to Artificial Neural Networks).ipynb', '.\\NSCI 526 Tutorial 1.3 (Emotional Faces Classifier).ipynb']
This gives us a list of the files including the relative path from the current directory. What if we wanted just the filenames? There are several different ways to do this. First, we can use the os.path.basename function. We loop over every file, grab the base file name and then append it to a new list.
file_list = []
for f in data_file_list:
file_list.append(os.path.basename(f))
print(file_list)
['NSCI 526 Tutorial 1.1 (Intro to Python).ipynb', 'NSCI 526 Tutorial 1.2 (Introduction to Artificial Neural Networks).ipynb', 'NSCI 526 Tutorial 1.3 (Emotional Faces Classifier).ipynb']
It is often cleaner to do this as a list comprehension:
[os.path.basename(x) for x in data_file_list]
['NSCI 526 Tutorial 1.1 (Intro to Python).ipynb', 'NSCI 526 Tutorial 1.2 (Introduction to Artificial Neural Networks).ipynb', 'NSCI 526 Tutorial 1.3 (Emotional Faces Classifier).ipynb']
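If the file extension is not wanted either, os.path.splitext can be combined with os.path.basename. The paths below are hypothetical and stand in for the output of glob.

```python
import os

# Hypothetical relative paths, standing in for a list returned by glob
paths = ['./tutorial_1.ipynb', './tutorial_2.ipynb']

# basename drops the directory; splitext separates the name from '.ipynb'
names = [os.path.splitext(os.path.basename(p))[0] for p in paths]
print(names)  # ['tutorial_1', 'tutorial_2']
```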
NumPy is the fundamental package for scientific computing with Python.
import numpy as np
NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
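NumPy’s array class is called ndarray. It is also known by the alias array. The more important attributes of an ndarray object are:
ndarray.ndim: the number of axes (dimensions) of the array.
ndarray.shape: the dimensions of the array, given as a tuple of integers. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size: the total number of elements of the array, equal to the product of the elements of shape.
ndarray.dtype: an object describing the type of the elements in the array.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
a = np.arange(15) #array of numbers 0 to 14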
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.itemsize)
print(a.size)
print(type(a))
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
(15,)
1
int32
4
15
<class 'numpy.ndarray'>
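The same attributes are easier to interpret on a two-dimensional array. This is a minimal sketch; the name `grid` is chosen for illustration, and the default integer dtype (and hence itemsize) depends on the platform, so it is compared rather than printed.

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)  # 3 rows, 5 columns

print(grid.shape)  # (3, 5)
print(grid.ndim)   # 2, the length of the shape tuple
print(grid.size)   # 15, the product of the entries of shape

# itemsize is the same value reported by the dtype object
print(grid.itemsize == grid.dtype.itemsize)  # True
```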
You can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.
A frequent error consists in calling array with multiple numeric arguments, rather than providing a single list of numbers as an argument.
a = np.array(1,2,3,4) # WRONG
a = np.array([1,2,3,4]) # RIGHT
b = np.array([6, 7, 8])
print(b)
print(type(b))
[6 7 8] <class 'numpy.ndarray'>
array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.
c = np.array([(1.5, 2 ,3), (4, 5, 6), (7.1, 7.2, 7.3)])
print(c)
[[1.5 2. 3. ] [4. 5. 6. ] [7.1 7.2 7.3]]
The function zeros creates an array full of zeros, the function ones creates an array full of ones, the function random.rand creates an array of random floats from a uniform distribution over [0, 1), and the function empty creates an array whose initial content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is float64.
np.zeros((3,4))
array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])
np.ones((2,3,4), dtype=np.int16)
array([[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]], [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]], dtype=int16)
np.random.rand(3,2)
array([[0.0688618 , 0.3021982 ], [0.83227422, 0.58830925], [0.67215902, 0.04218746]])
np.empty((2,3)) # uninitialized, output may vary
array([[0.0688618 , 0.3021982 , 0.83227422], [0.58830925, 0.67215902, 0.04218746]])
To create sequences of numbers, NumPy provides a function analogous to range that returns arrays instead of lists.
np.arange( 10, 30, 5 ) # array from 10 up to (but not including) 30 in increments of 5
array([10, 15, 20, 25])
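Unlike range, arange also accepts non-integer steps. When the number of points matters more than the step size, np.linspace is a common alternative; it is not mentioned in the original text and is shown here only as a related option.

```python
import numpy as np

# range and arange agree for integer arguments; the stop value is excluded
print(list(range(10, 30, 5)))  # [10, 15, 20, 25]
print(np.arange(10, 30, 5))    # [10 15 20 25]

# arange with a float step: values 0, 0.25, 0.5, 0.75
steps = np.arange(0, 1, 0.25)

# linspace takes a number of points and includes the endpoint:
# values 0, 0.25, 0.5, 0.75, 1
points = np.linspace(0, 1, 5)
print(len(steps), len(points))  # 4 5
```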
Three main functions include:
ravel() flattens an array
reshape() changes the shape of arrays
transpose() transposes the array
example = np.floor(10*np.random.random((4,4)))
example
array([[9., 3., 2., 9.], [9., 9., 9., 5.], [8., 2., 2., 3.], [6., 8., 8., 5.]])
example.ravel() # returns the array, flattened
array([9., 3., 2., 9., 9., 9., 9., 5., 8., 2., 2., 3., 6., 8., 8., 5.])
example.reshape(2,8) # returns the array with a modified shape 2x8
array([[9., 3., 2., 9., 9., 9., 9., 5.], [8., 2., 2., 3., 6., 8., 8., 5.]])
example.transpose()
array([[9., 9., 8., 6.], [3., 9., 2., 8.], [2., 9., 2., 8.], [9., 5., 3., 5.]])
The reshape function returns its argument with a modified shape, whereas the resize method modifies the array itself:
example.shape
(4, 4)
example.resize(2,8)
example.shape
(2, 8)
If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:
example.reshape(4,-1)
array([[9., 3., 2., 9.], [9., 9., 9., 5.], [8., 2., 2., 3.], [6., 8., 8., 5.]])
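The difference between reshape and resize, and the use of -1, can be checked on a small deterministic array. This is a sketch with hypothetical variable names; it uses np.arange instead of random values so the shapes are easy to verify.

```python
import numpy as np

original = np.arange(12)

# reshape returns an array with the new shape; `original` keeps its shape
reshaped = original.reshape(3, 4)
print(original.shape, reshaped.shape)  # (12,) (3, 4)

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = original.reshape(2, -1)
print(inferred.shape)  # (2, 6)

# resize changes the array in place and returns None
in_place = np.arange(12)
result = in_place.resize(3, 4)
print(in_place.shape, result)  # (3, 4) None
```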
The NumPy package contains the numpy.linalg module, which provides functionality required for linear algebra. Some important functions, from numpy.linalg and from the top-level numpy namespace, are:
dot: Dot product of two arrays
vdot: Dot product of two vectors
inner: Inner product of two arrays
linalg.solve: Solves a linear matrix equation
linalg.inv: Finds the multiplicative inverse of a matrix
M = np.array([[3, 0 ,2],
[2, 0, -2],
[0, 1, 1]])
v = np.array([1, 2, 3])
print(M.dot(v))
[ 9 -4 5]
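Going the other way, linalg.solve recovers v from the product, and linalg.inv gives the inverse of M. This is a minimal sketch that reuses the matrix from the example above; results are compared with np.allclose because floating-point output is not exact.

```python
import numpy as np

M = np.array([[3, 0, 2],
              [2, 0, -2],
              [0, 1, 1]])
b = np.array([9, -4, 5])  # the product M.dot(v) for v = [1, 2, 3]

# Solve M x = b for x
x = np.linalg.solve(M, b)
print(np.allclose(x, [1, 2, 3]))  # True

# The inverse times the original matrix is (numerically) the identity
M_inv = np.linalg.inv(M)
print(np.allclose(M_inv @ M, np.eye(3)))  # True
```

For solving a linear system, linalg.solve is generally preferred over computing the inverse explicitly.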
Other functions:
multiply(): Element-wise product of two arrays
eye(): Returns a 2-D array with ones on the diagonal and zeros elsewhere
linalg.eig(): Returns the eigenvalues and eigenvectors of a square array
print(np.eye(3))
[[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]]
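The element-wise product and the matrix product are easy to confuse, so a short comparison may help. The arrays `A` and `B` and the diagonal matrix `D` below are hypothetical values chosen so the results can be checked by hand.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# multiply works element by element; dot performs matrix multiplication
print(np.multiply(A, B).tolist())  # [[5, 12], [21, 32]]
print(np.dot(A, B).tolist())       # [[19, 22], [43, 50]]

# The eigenvalues of a diagonal matrix are its diagonal entries
D = np.diag([2.0, 3.0])
eigenvalues, eigenvectors = np.linalg.eig(D)
print(sorted(eigenvalues.tolist()))  # [2.0, 3.0]
```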
!jupyter nbconvert "Intro to Python.ipynb" --to slides --post serve