Introduction to Python

Getting Started

Schedule

Date  | Session   | Title             | Description
Day 1 | Lecture 1 | Intro and Setup   | Why Python, Setup, etc
Day 1 | Lecture 2 | Basic Python      | Syntax, data structures, control flow, etc
Day 2 | Lecture 3 | Numpy + Pandas I  | Numpy basics, series/dataframe basics
Day 2 | Lecture 4 | Pandas II         | Joining, advanced indexing, reshaping, etc
Day 3 | Lecture 5 | Pandas III        | Grouping, apply, transform, etc
Day 3 | Lecture 6 | Plotting          | Intro to plotting in Python
Day 3 | Lecture 7 | Intro to Modeling | Intro to stats/ML models in Python

Course materials

github.com/ihmeuw/ihme-python-course

First, you're going to want to get a copy of this repository onto your machine. Simply fire up git and clone it:

  1. Open up a shell (e.g. git.exe, cmd.exe, or terminal.app)

  2. Navigate to where you'd like to save this.

    • We recommend ~/repos/ (e.g. C:/Users/<user>/repos/ on Windows, /Users/<user>/repos/ on Mac, or /home/<user>/repos/ on Unix).
  3. Clone this repo:

    git clone https://github.com/ihmeuw/ihme-python-course.git
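
If you'd like to check where the recommended location resolves on your machine, here is a minimal sketch that builds the path in a platform-independent way. The `repos` folder and repository name come from the steps above; the variable names are only for illustration.

```python
from pathlib import Path

# Path.home() resolves to C:/Users/<user>, /Users/<user>, or /home/<user>
# depending on the operating system.
repos_dir = Path.home() / "repos"
course_dir = repos_dir / "ihme-python-course"

print("Recommended clone location:", course_dir)
print("Already cloned?", course_dir.exists())
```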

If you need help with setting up git, see this page or simply download the repo as a zip file for now...

What is Python? 🐍

Python is a widely used high-level, general-purpose, interpreted, dynamic programming language.

Broadly:

Python is an interpreted language: source code is translated to bytecode when a program is run, rather than being compiled to machine code ahead of time. The reference implementation, CPython, is itself written in C (though there are other, non-C implementations). Python offers much of the power and flexibility of lower-level compiled languages without the steep learning curve and associated programming overhead. The language is clean and readable, and it is available for almost every modern computing platform.
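
To make the "compiled when it is run" point a little more concrete, the sketch below uses the built-in `compile` function and the standard `dis` module to expose the bytecode step that CPython normally performs behind the scenes. The expression and variable names are illustrative only.

```python
import dis

source = "x * 2 + 1"

# compile() turns source text into a code object holding bytecode;
# CPython normally does this implicitly when a script or module is run.
code = compile(source, "<example>", "eval")

print(type(code).__name__)      # code
print(eval(code, {"x": 20}))    # 41

# dis prints a human-readable listing of the bytecode instructions.
dis.dis(code)
```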

Advantages:

  • Ease of programming, minimizing the time required to develop, debug and maintain code
  • Well-designed language that encourages good programming practices:
    • Modular and object-oriented programming, good system for packaging and re-use of code
    • Documentation tightly integrated with the code
  • A large standard library with many extensions
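
As a small sketch of what "documentation tightly integrated with the code" looks like in practice, a string placed at the top of a function (a docstring) becomes part of the function object. The function below is a made-up example, not part of the course materials.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# The docstring travels with the function object itself.
print(celsius_to_fahrenheit.__doc__)
print(celsius_to_fahrenheit(100))
```

In an interactive session, calling `help(celsius_to_fahrenheit)` displays the same text.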

Disadvantages:

  • Since Python is an interpreted and dynamically typed programming language, the execution of Python code can be slow compared to compiled, statically typed programming languages such as C and Fortran
  • Somewhat decentralized, with different environments, packages and documentation spread out at different places
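
For readers new to the term, here is a minimal sketch of what "dynamically typed" means: names are not tied to a type, and type errors surface only when the offending line runs. The variable names are arbitrary.

```python
value = 42
print(type(value).__name__)   # int

# The same name can later be bound to an object of a different type.
value = "forty-two"
print(type(value).__name__)   # str

# Type errors only show up when the offending line actually runs.
try:
    result = value + 1
except TypeError as err:
    print("TypeError:", err)
```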

Scientific Computing in Python

Why do we use Python at IHME?

  • Powerful and easy to use
  • Interactive
  • Extensible
  • Large third-party library ecosystem
  • Free and open

Powerful and easy to use

  • Python is simultaneously powerful, flexible and easy to learn and use (in general, these qualities are traded off for a given programming language)
  • Anything that can be coded in C, FORTRAN, or Java can be done in Python, almost always in fewer (and more readable) lines of code, and with fewer debugging headaches
  • Its standard library is extremely rich, including modules for string manipulation, regular expressions, file compression, mathematics, profiling and debugging (etc)
  • Python is object-oriented, which is an important programming paradigm particularly well-suited to scientific programming, which allows data structures to be abstracted in a natural way
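
A quick, non-exhaustive sketch of a few of the standard library modules mentioned above (`re`, `zlib`, and `math`), using nothing that needs to be installed separately. The sample text is made up for the example.

```python
import math
import re
import zlib

text = "Lecture 1, Lecture 2, Lecture 3"

# Regular expressions: pull out every lecture number.
lecture_numbers = [int(n) for n in re.findall(r"Lecture (\d+)", text)]
print(lecture_numbers)                      # [1, 2, 3]

# String manipulation: built-in str methods.
print(text.upper().split(", "))

# Compression: round-trip some bytes through zlib.
raw = (text * 50).encode("utf-8")
compressed = zlib.compress(raw)
print(len(raw), "->", len(compressed), "bytes")
restored = zlib.decompress(compressed)

# Mathematics
print(math.sqrt(16), math.factorial(5))     # 4.0 120
```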

Interactive

  • Python may be run interactively on the command line, in much the same way as R
  • Notebooks offer convenient prototyping, mixing code in with outputs such as graphs and direct viewing of data structures
  • Rather than compiling and running a particular program, commands may be entered serially, which is often useful for mathematical programming and debugging

Extensible

  • Often referred to as a “glue” language, meaning that it is useful in a mixed-language environment
    • (such as at IHME, where we often have to combine R, Stata, C++, etc)
  • Python was designed to interact with other programming languages, both through interfaces like rpy2 and by compiling directly into Python extensions using e.g. Cython
  • New interfaces are coming out all the time, such as for GPU computing
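
The interfaces named above (rpy2, Cython) need extra setup, so as a simpler stand-in the sketch below uses only the standard library's `subprocess` module to run another program and capture its output, which is one common way of gluing tools together. To keep the example runnable anywhere, the "other program" here is just a second Python process; that choice is an assumption for illustration.

```python
import subprocess
import sys

# Launch a separate process and capture what it prints.
# sys.executable is the path of the Python interpreter running this code.
completed = subprocess.run(
    [sys.executable, "-c", "print(6 * 7)"],
    capture_output=True,
    text=True,
    check=True,
)

answer = int(completed.stdout.strip())
print("The other program answered:", answer)
```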

Large third-party library ecosystem

There are modules available for just about anything you could want to do in Python, with nearly 100,000 available on PyPI alone. Some notable packages:

  • NumPy: Numerical Python (NumPy) is a set of extensions that provides the ability to specify and manipulate array data structures. It provides array manipulation and computational capabilities similar to those found in Matlab or Octave.
  • SciPy: An open source library of scientific tools for Python, SciPy supplements the NumPy module. SciPy gathers a variety of high-level science and engineering modules together as a single package. SciPy includes modules for graphics and plotting, optimization, integration, special functions, signal and image processing, genetic algorithms, ODE solvers, and others.
  • Matplotlib: Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Its syntax is very similar to Matlab.
  • Pandas: A module that provides high-performance, easy-to-use data structures and data analysis tools. In particular, the DataFrame class is useful for spreadsheet-like representation and manipulation of data. Also includes high-level plotting functionality.
  • IPython: An enhanced Python shell, designed to increase the efficiency and usability of coding, testing and debugging Python. It includes both a Qt-based console and an interactive HTML notebook interface, both of which feature multiline editing, interactive plotting and syntax highlighting.
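
To give a first taste of the array style that NumPy enables, here is a minimal sketch. It assumes NumPy is installed (it ships with Anaconda); the numbers are arbitrary.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic applies element-wise, with no explicit loop.
doubled = values * 2
print(doubled)

# Arrays can be reshaped and reduced along an axis.
grid = values.reshape(2, 2)
column_sums = grid.sum(axis=0)
print(column_sums)
print(values.mean())
```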

Free and open

  • Python is released on all platforms under an open license (Python Software Foundation License), meaning that the language and its source is freely distributable
  • Keeps costs down for scientists and universities operating under a limited budget
  • Frees programmers from licensing concerns for any software they may develop

Setup

Python 2 vs 3

A debate as old as time. Don't worry about it, just use Python 3 for now.

The Anaconda Distribution

Anaconda is a suite of tools for Python (and R...) that will install everything you need to get up and running with Python.

Python interpreter

  • Simply typing python at the command line will open up the standard Python interpreter
  • Seldom used interactively - but we'll use it a lot for running completed programs, e.g.

    python my-program.py
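
For reference, a complete program can be very small. The sketch below shows what a file such as `my-program.py` might contain; the file name and message are placeholders, not the contents of any script in this repo.

```python
"""A minimal script, saved as e.g. my-program.py (hypothetical file name)."""


def main():
    message = "hello world"
    print(message)
    return message


# By convention, this block runs when the file is executed directly,
# e.g. with: python my-program.py
if __name__ == "__main__":
    main()
```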

IPython

IPython is an interactive interpreter for Python that adds many user-friendly features on top of the standard python:

  • Tab auto-completion
  • Command history (using up and down arrows)
  • In-line highlighting and editing of code
  • Object introspection
  • Automatic extraction of docstrings from Python objects like classes and functions
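
To get a feel for what introspection and tab completion are doing, the sketch below reproduces a rough approximation of both with plain Python and the standard `inspect` module. It is not how IPython is implemented, and `greet` is a made-up function for the example.

```python
import inspect


def greet(name, punctuation="!"):
    """Return a friendly greeting for name."""
    return "Hello, " + name + punctuation


# Roughly what tab completion does: list attributes matching a prefix.
completions = [attr for attr in dir(str) if attr.startswith("up")]
print(completions)                 # ['upper']

# Roughly what `greet?` reports: the call signature and the docstring.
print(inspect.signature(greet))    # (name, punctuation='!')
print(inspect.getdoc(greet))
```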

Start IPython by simply calling ipython from the command line.

Jupyter

  • Jupyter notebooks are HTML-based environments for IPython, R, and more, which allow you to interactively code, explore your data, and integrate documentation
  • This is a Jupyter notebook
  • Starting a Jupyter notebook server will launch a local webserver that you can view in your browser to create, view, and run notebooks:

      jupyter notebook

Spyder

  • Spyder is a MATLAB-like IDE for scientific computing with Python
  • Code editing, execution, and debugging are all carried out in a single environment, and work on different calculations can be organized as projects in the IDE
  • Calling Spyder will open up a new project

      spyder

Another interesting new IDE for Python is Rodeo.

Installation

The easy way

Go to the Anaconda download page, download the installer for Python 3.5 (64-bit), and click through the installer, following its instructions.

The fancy way

If you'd like to set up a Docker container with Anaconda, check out the Docker setup instructions. But be warned that it doesn't play terribly nicely with Windows 7 or 8...

Slideshows

If you'd like to be able to view these notebooks in slideshow mode, install RISE:

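    conda install -c damianavila82 rise

The exercise below is rendered in this notebook from its README:

    from IPython.display import Markdown, display
    # Read the Exercise 1 README and render it as Markdown
    with open('./Exercise 1/README.md', 'r') as f:
        display(Markdown(f.read()))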

Exercise 1

Installing Anaconda

See the README for instructions on how to install Anaconda on your system.

Clone this repo

See the README again for instructions on cloning this repo (into its recommended location of ~/repos/).

Opening the python interpreter

  • Open up your terminal and type python at the command line
  • Try typing some things in to see how they work... some suggestions:
    • 1 + 1
    • 1.0 + 1
    • "hello world"
    • print("hello world")
  • Type exit() to exit
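
If you'd like to compare notes afterwards, here is a small sketch of what to look for in those suggestions, written as a script rather than typed at the prompt. The comments describe behavior of standard Python 3.

```python
# Integer plus integer stays an integer.
print(1 + 1, type(1 + 1).__name__)       # 2 int

# Mixing in a float promotes the result to a float.
print(1.0 + 1, type(1.0 + 1).__name__)   # 2.0 float

# At the prompt, a bare expression echoes its repr (with quotes),
# while print() writes the text itself (without quotes).
print(repr("hello world"))               # 'hello world'
print("hello world")                     # hello world
```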

Running ipython

  • Open up the interactive IPython interpreter
    • ipython
  • Try some of the same inputs as above
  • Type print? and hit return
  • Type print( and hit tab
  • Type exit() to exit

Exploring jupyter notebook

  • Open up a terminal and navigate to the root directory of this repo (e.g. ~/repos/ihme-python-course)
  • Type jupyter notebook to start up a notebook server
  • Your web browser will most likely automatically open
    • If not, you can navigate to http://localhost:8888/
    • Note: if you're already running something on port 8888 it might start the server on a different port. Look for a message in your terminal like
        The Jupyter Notebook is running at: http://localhost:8889/
      
      to find it.
  • Use the file tree to navigate to the Lecture 1/Exercise 1 directory and then click on test-notebook.ipynb to open it
  • Follow the instructions in the notebook

Run a script from the command line

  • Examine Exercise 1/test-script.py with your favorite editor or Spyder
  • Run it from the command line
    • Hint: remember python my-script.py
  • Change the message from "hello world" to something else and execute it

Optional: Install RISE

  • RISE will add a new button to your Jupyter toolbar that'll allow you to view these notebooks in slideshow mode
  • Click the new bargraph-esque icon in the toolbar to start a slideshow
  • Use spacebar and shift+spacebar to navigate through the slides
  • To edit how the slideshow is structured, take a look at View > Cell Toolbar > Slideshow

References