#!/usr/bin/env python
# coding: utf-8

# # Jupyter Notebooks & Python Tutorial
# Shawn Rhoads, Georgetown University (NSCI 526)
#
# This tutorial has been adapted from the following resources on Jupyter Notebooks and Python:
# - Luke Chang's [DartBrains](https://dartbrains.org)
# - Adam Pritchard's [Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
# - Eshin Jolly's [Jupyter Tutorial](https://github.com/Summer-MIND/mind_2018/tree/master/tutorials/jupyter)
# - [NumPy Quickstart Tutorial](https://numpy.org/devdocs/user/quickstart.html)

# ## Interactive Coding!
# We will use Jupyter Notebooks to interface with Python. Rather than keeping code and its execution separate, Jupyter integrates both using the concept of cells. The two main types of cells are **code cells** and **markdown cells**. Cells pair "snippets" of code with the output obtained from running them, and can contain plots/figures inline alongside the code required to generate them.
#
# **Code cells** contain actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar of your Jupyter notebook. Alternatively, you can hit esc and then y (denoted "esc, y") while a cell is selected to specify that it is a code cell. Note that you will have to hit enter after doing this to start editing it. If you want to execute the code in a code cell, hit "shift + enter." Note that code cells are executed in the order you run them, not the order in which they appear in the notebook. That is to say, the order in which you hit "shift + enter" is the order in which the code is executed. If you did not explicitly execute a cell early in the document, its results are not known to the Python interpreter.
#
# **Markdown cells** contain text. The text is written in markdown, a lightweight markup language. You can read about its syntax [here](https://daringfireball.net/projects/markdown/syntax).
# Note that you can also insert HTML into markdown cells, and this will be rendered properly. As you are typing the contents of these cells, the results appear as plain text. Hitting "shift + enter" renders the text in the formatting you specify. You can specify a cell as being a markdown cell in the Jupyter toolbar, or by hitting "esc, m" in the cell. Again, you have to hit enter after using the quick keys to bring the cell into edit mode.
#
# In general, when you want to add a new cell, you can use the "Insert" pulldown menu from the Jupyter toolbar. The shortcut to insert a cell below is "esc, b" and to insert a cell above is "esc, a." Alternatively, you can execute a cell and automatically add a new one below it by hitting "alt + enter."

# In[1]:

# This is a code cell
# Put some code here and get some output below!
x = 10
print(x)

# This is a *markdown* **cell**!

# ## Notebooks of the Future
# Not just for code:
# - Markdown, HTML, LaTeX integration
# - Slide shows (like this one!)
# - Keep your notes alongside your analysis routines
# - Embed images, videos, anything (it's all just HTML + JavaScript)

# ### HTML Integration:
#
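# <table>
# <tr><th>Var1</th><th>Var2</th></tr>
# <tr><td>Cell 1</td><td>Cell 2</td></tr>
# <tr><td>Cell 3</td><td>Cell 4</td></tr>
# </table>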

# ### LaTeX Integration:
# Compute the following:
# $$\sum_{n=1}^{5}n$$

# ### Inserting images from web:

# In[2]:

from IPython.display import display, Image
display(Image('https://raw.githubusercontent.com/Summer-MIND/mind_2018/master/tutorials/jupyter/demo.gif'))

# ### Multilingual!
# - Python (multiple versions together)
# - R
# - Matlab
# - Javascript
# - etc (https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)

# #### Coding with R:
# `!conda install rpy2`

# In[3]:

import rpy2.rinterface
get_ipython().run_line_magic('load_ext', 'rpy2.ipython')

# In[4]:

get_ipython().run_cell_magic('R', '', "\ny <- c(1,2,3,4,5)\nyy <- c(1,2,4,8,16)\n\nplot(y,\n yy,\n main='R plot in Python')\n")

# We will import `x=10` from above, define a variable `z<-10` in R, store their product in `XZ`, and then export `XZ` back to Python.

# In[5]:

get_ipython().run_cell_magic('R', '-i x -o XZ', '\nz <- 10\nXZ <- x * z\n')

# The cell below will return `XZ` in Python! Cool, huh?!

# In[6]:

print(XZ)

# You can do this with other languages as well! e.g., MATLAB (because we like paying for licenses)

# ### Save notebooks in other formats and put them online
# - GitHub automatically renders Jupyter notebooks
# - [Online notebook viewer](http://nbviewer.jupyter.org/)
# - Output to PDF, HTML (including JavaScript)
# - [JupyterHub](https://github.com/jupyterhub/jupyterhub): code with friends!
#
# ### Customize your notebook experience with extensions
# - Table of Contents
# - Execution time/Profiling
# - Scratch Space
# - Code/Section folding
# - Look and feel (CSS+JavaScript)
# - Other notebook extensions
#
# ### For more...
# - [Eshin Jolly - Intro to Jupyter Notebooks](https://github.com/Summer-MIND/mind_2018/tree/master/tutorials/jupyter)
# - [Jake Vanderplas - Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)
# - [Tal Yarkoni - Intro to Data Science w/ Python](https://github.com/tyarkoni/SSI2016)
# - [Yaroslav Halchenko - Intro to Programming for Psych/Neuro](https://github.com/dartmouth-pbs/psyc161)
# - [Awesome notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
# - [Neuroimaging in Python](https://github.com/datacarpentry/python-neuroimaging-lesson)
# - [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb#recent=true)

# ---

# ## Types of Variables
# - Numeric types:
#     - int, float, complex (Python 3 has no separate `long` type; `int` has arbitrary precision)
# - string
# - boolean
#     - True / False
#
# Use the `type()` function to find the type of a value or variable

# In[7]:

# Integer
a = 1
print(type(a))

# Float
b = 1.0
print(type(b))

# String
c = 'hello'
print(type(c))

# Boolean
d = True
print(type(d))

# None
e = None
print(type(e))

# Cast integer to string
print(type(str(a)))

# ## Math Operators
# - +, -, *, and /
# - Exponentiation **
# - Modulo %

# In[8]:

# Addition
a = 2 + 7
print(a)

# Subtraction
b = a - 5
print(b)

# Multiplication
print(b*2)

# Exponentiation
print(b**2)

# Modulo
print(4%9)

# Division
print(4/9)

# ## String Operators
# Some of the arithmetic operators also have meaning for strings. E.g.,
# for string concatenation, use the + sign.
#
# String repetition: use the * sign with a number of repetitions.

# In[9]:

# Combine string
a = 'Hello'
b = 'World'
print(a + b)

# Repeat String
print(a*5)

# ## Logical Operators
#
# Perform a logical comparison and return a Boolean value
#
# ```
# x == y # x is equal to y
# x != y # x is not equal to y
# x > y # x is greater than y
# x < y # x is less than y
# x >= y # x is greater than or equal to y
# x <= y # x is less than or equal to y
# ```

# In[10]:

# Works for string
a = 'hello'
b = 'world'
c = 'Hello'
print("a==b: " + str(a==b))
print("a==c: " + str(a==c))
print("a!=b: " + str(a!=b))

# Works for numeric
d = 5
e = 8
print("d < e: " + str(d < e))

# ## Conditional Logic (if…)
#
# Unlike most other languages, Python uses indentation rather than closing keywords (e.g., end) to delimit conditional blocks.
#
# Syntax:
# ```
# if condition:
#     do something
# ```
#
# Implicit conversion of the value to bool() happens if condition is of a different type than bool, thus all of the following should work:
# ```
# if condition:
#     do_something
# elif condition:
#     do_alternative1
# else:
#     do_otherwise # often reserved to report an error
#                  # after a long list of options
# ```

# In[11]:

n = 1

if n:
    print("n is non-0")

if n is None:
    print("n is None")

if n is not None:
    print("n is not None")

# ## Loops
# The **for** loop is probably the most popular loop construct in Python:
# ```
# for target in sequence:
#     do_statements
# ```

# In[12]:

string = "Python is going to make conducting research easier"
for c in string:
    print(c)

# It’s also possible to use a **while** loop to repeat statements while a condition remains True:
# ```
# while condition:
#     do_statements
# ```

# In[13]:

x = 0
end = 10

csum = 0
while x < end:
    csum += x
    print(x, csum)
    x += 1
print("Exited with x==%d" % x )

# ## Functions
#
# A **function** is a named sequence of statements that performs a computation.
# You define the function by giving it a name, specifying a sequence of statements, and optionally specifying values to return. Later, you can “call” the function by name.

# In[14]:

def make_upper_case(text):
    return (text.upper())

# - The expression in the parentheses is the **argument**.
# - It is common to say that a function **"takes" an argument** and **"returns" a result**.
# - The result is called the **return value**.
#
# The first line of the function definition is called the header; the rest is called the body.
#
# The header has to end with a colon and the body has to be indented. It is common practice to use 4 spaces for indentation, and to avoid mixing spaces with tabs.
#
# A function body in Python ends whenever a statement begins at the original level of indentation. There is no end or fed or any other identifier to signal the end of the function. Indentation is part of the language syntax in Python, making it more readable and less cluttered.

# In[15]:

string = "Python is going to make conducting research easier"
print(make_upper_case(string))

# ## Python Containers
#
# There are 4 main types of builtin containers for storing data in Python:
# - list
# - tuple
# - dict
# - set

# ### Lists
# In Python, a list is a mutable sequence of values. Mutable means that we can change separate entries within a list.
# - Each value in the list is an element or item
# - Elements can be any Python data type
# - Lists can mix data types
#
# - Lists are initialized with [] or list()
# ```
# l = [1,2,3]
# ```
#
# - Elements within a list are indexed (starting with 0)
# ```
# l[0]
# ```
#
# - Elements can be nested lists
# ```
# nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# ```
#
# - Lists can be sliced.
# ```
# l[start:stop:stride]
# ```
#
# - Like all Python containers, lists have many useful methods and functions that can be applied
# ```
# a.insert(index,new element)
# a.append(element to add at end)
# len(a)
# ```
#
# - List comprehension is a very powerful technique allowing for efficient construction of new lists.
# ```
# [a for a in l]
# ```

# In[16]:

# Indexing and Slicing
a = ['lists','are','arrays']
print(a[0])
print(a[1:3])

# List methods
a.insert(2,'python')
a.append('.')
print(a)
print(len(a))

# List Comprehension
print([x.upper() for x in a])

# ### Dictionaries
# - In Python, a dictionary (or dict) is a mapping between a set of indices (keys) and a set of values
# - The items in a dictionary are key-value pairs
# - Keys can be any hashable Python data type (e.g., strings, numbers, tuples)
# - Dictionaries preserve insertion order as of Python 3.7 (earlier versions made no ordering guarantee)

# In[17]:

# Dictionaries
eng2sp = {}
eng2sp['one'] = 'uno'
print(eng2sp)

eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
print(eng2sp)

print(eng2sp.keys())
print(eng2sp.values())

# ### Tuples
#
# In Python, a tuple is an immutable sequence of values, meaning its elements can’t be changed
# - Each value in the tuple is an element or item
# - Elements can be any Python data type
# - Tuples can mix data types
# - Elements can be nested tuples
# - Essentially tuples are immutable lists

# In[18]:

numbers = (1, 2, 3, 4)
print(numbers)

t2 = 1, 2
print(t2)

# ### Sets
#
# In Python, a set is an efficient storage for “membership” checking
# - set is like a dict but only with keys and without values
# - a set can also perform set operations (e.g., union, intersection)

# In[19]:

# Union
print({1, 2, 3, 'mom', 'dad'} | {2, 3, 10})

# Intersection
print({1, 2, 3, 'mom', 'dad'} & {2, 3, 10})

# Difference
print({1, 2, 3, 'mom', 'dad'} - {2, 3, 10})

# ## Modules
#
# A module is a Python file that contains a collection of related definitions. Python has hundreds of standard modules. These are organized into what is known as the [Python Standard Library](http://docs.python.org/library/).
# You can also create and use your own modules. To use functionality from a module, you first have to import the entire module or parts of it into your namespace.
#
# To import the entire module:
# `import module_name`
#
# You can also import a module using a specific name:
# `import module_name as new_module_name`
#
# To import specific definitions (e.g. functions, variables, etc) from the module into your local namespace:
# `from module_name import name1, name2`

# ### os and glob

# In[20]:

import os
from glob import glob

# To print the current directory, you can use: `os.path.abspath(os.path.curdir)`

# Let’s use glob, a pattern matching function, to list all of the ipynb files in the current folder.

# In[21]:

data_file_list = glob(os.path.join(os.path.curdir,'*ipynb'))
print(data_file_list)

# This gives us a list of the files including the relative path from the current directory. What if we wanted just the filenames? There are several different ways to do this. First, we can use the os.path.basename function. We loop over every file, grab the base file name and then append it to a new list.

# In[22]:

file_list = []
for f in data_file_list:
    file_list.append(os.path.basename(f))

print(file_list)

# It is often cleaner to do this as a list comprehension

# In[23]:

[os.path.basename(x) for x in data_file_list]

# ### NumPy
# NumPy is the fundamental package for scientific computing with Python.

# In[24]:

import numpy as np

# NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called axes.
#
# NumPy’s array class is called `ndarray`. It is also known by the alias `array`. The more important attributes of an ndarray object are:
#
# - **ndarray.ndim**: the number of axes (dimensions) of the array.
# - **ndarray.shape**: the dimensions of the array.
#   This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be `(n,m)`. The length of the `shape` tuple is therefore the number of axes, `ndim`.
# - **ndarray.size**: the total number of elements of the array. This is equal to the product of the elements of `shape`.
# - **ndarray.dtype**: an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
# - **ndarray.itemsize**: the size in bytes of each element of the array. For example, an array of elements of type `float64` has `itemsize` 8 (=64/8), while one of type `int32` has `itemsize` 4 (=32/8). It is equivalent to `ndarray.dtype.itemsize`.
# - **ndarray.data**: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

# In[25]:

a = np.arange(15) #array of numbers 0 to 14
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.itemsize)
print(a.size)
print(type(a))

# #### Creating arrays
# You can create an array from a regular Python list or tuple using the `array` function. The type of the resulting array is deduced from the type of the elements in the sequences.
#
# A frequent error is calling `array` with multiple numeric arguments, rather than providing a single list of numbers as an argument.
# ```
# a = np.array(1,2,3,4) # WRONG
# a = np.array([1,2,3,4]) # RIGHT
# ```

# In[26]:

b = np.array([6, 7, 8])
print(b)
print(type(b))

# `array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.
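# As a minimal sketch of the points above, the cell below shows how nesting depth determines the number of axes, how the dtype is deduced from the elements, and what happens when the numbers are passed as separate arguments. The names `flat`, `grid`, `cube`, `mixed`, and `call_failed` are illustrative and not part of the original tutorial; the exact exception type for the wrong call is assumed to vary between NumPy versions, so both `TypeError` and `ValueError` are caught.

```python
import numpy as np

# One level of nesting gives a 1-D array, two levels 2-D, three levels 3-D.
flat = np.array([1, 2, 3, 4])
grid = np.array([[1, 2], [3, 4]])
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for name, arr in [("flat", flat), ("grid", grid), ("cube", cube)]:
    print(name, arr.ndim, arr.shape, arr.size)

# The element type is deduced from the inputs: a single float
# promotes the whole array to a floating-point dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Passing the numbers as separate positional arguments fails, because
# only the first argument is treated as the data.
call_failed = False
try:
    np.array(1, 2, 3, 4)
except (TypeError, ValueError) as err:
    call_failed = True
    print("np.array(1, 2, 3, 4) failed:", err)
```

# Wrapping the wrong call in `try`/`except` keeps the cell runnable from top to bottom while still showing the error message.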

# In[27]:

c = np.array([(1.5, 2 ,3), (4, 5, 6), (7.1, 7.2, 7.3)])
print(c)

# The function `zeros` creates an array full of zeros, the function `ones` creates an array full of ones, the function `random.rand` creates an array of random floats from a uniform distribution over [0, 1), and the function `empty` creates an array whose initial content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is `float64`.

# In[28]:

np.zeros((3,4))

# In[29]:

np.ones((2,3,4), dtype=np.int16)

# In[30]:

np.random.rand(3,2)

# In[31]:

np.empty((2,3)) # uninitialized, output may vary

# To create sequences of numbers, NumPy provides a function analogous to `range` that returns arrays instead of lists.

# In[32]:

np.arange( 10, 30, 5 ) # array from 10 up to (but not including) 30 in increments of 5

# #### Shape Manipulation
#
# Three main functions include:
# - `ravel()` flattens an array
# - `reshape()` changes the shape of arrays
# - `transpose()` transposes the array

# In[33]:

example = np.floor(10*np.random.random((4,4)))
example

# In[34]:

example.ravel() # returns the array, flattened

# In[35]:

example.reshape(2,8) # returns the array with a modified shape 2x8

# In[36]:

example.transpose()

# The `reshape` function returns its argument with a modified shape, whereas the `resize` method modifies the array itself:

# In[37]:

example.shape

# In[38]:

example.resize(2,8)
example.shape

# If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

# In[39]:

example.reshape(4,-1)

# #### Linear Algebra
# The NumPy package contains the `numpy.linalg` module, which provides core linear algebra functionality.
# Some of the important linear algebra functions are listed below (`dot`, `vdot`, and `inner` live in the top-level `numpy` namespace; `solve` and `inv` live in `numpy.linalg`):
# - `dot`: Dot product of the two arrays
# - `vdot`: Dot product of the two vectors
# - `inner`: Inner product of the two arrays
# - `solve`: Solves the linear matrix equation
# - `inv`: Finds the multiplicative inverse of the matrix

# In[40]:

M = np.array([[3, 0 ,2], [2, 0, -2], [0, 1, 1]])
v = np.array([1, 2, 3])
print(M.dot(v))

# Other functions:
# - `multiply()`: Element-wise product of the two arrays (use `matmul()` or the `@` operator for the matrix product)
# - `eye()`: Returns a 2-D array with ones on the diagonal and zeros elsewhere
# - `linalg.eig()`: Returns the eigenvalues and eigenvectors of the array

# In[41]:

print(np.eye(3))

# ---

# ## Throw your notebook into slides:
# `!jupyter nbconvert "Intro to Python.ipynb" --to slides --post serve`
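
# Returning to the Linear Algebra functions listed earlier, here is a short sketch of `solve`, `inv`, `multiply`, and `linalg.eig` applied to the same `M` and `v` used in that section. The names `x_sol`, `M_inv`, `elementwise`, `matrix_product`, `eigenvalues`, and `eigenvectors` are illustrative, and the checks use `np.allclose` because floating-point results are only approximately exact.

```python
import numpy as np

# Same matrix and vector as in the Linear Algebra section.
M = np.array([[3, 0, 2], [2, 0, -2], [0, 1, 1]])
v = np.array([1, 2, 3])

# Solve M @ x_sol = v, then confirm the solution reproduces v.
x_sol = np.linalg.solve(M, v)
print(np.allclose(M @ x_sol, v))

# The inverse times the original matrix should be numerically the identity.
M_inv = np.linalg.inv(M)
print(np.allclose(M_inv @ M, np.eye(3)))

# Element-wise product versus matrix product of the same two arrays:
# the two results generally differ.
elementwise = np.multiply(M, M)
matrix_product = M @ M
print(elementwise[0, 2], matrix_product[0, 2])

# Each eigenpair should satisfy M @ vec = val * vec
# (the values may be complex for a non-symmetric matrix).
eigenvalues, eigenvectors = np.linalg.eig(M)
first_vec = eigenvectors[:, 0]
print(np.allclose(M @ first_vec, eigenvalues[0] * first_vec))
```

# Comparing one entry of `elementwise` and `matrix_product` is enough to see that `multiply` and `@` compute different things.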