Introduction to Python and IPython

Python

When people talk about Python, they can mean a couple of things. Python is

  1. A programming language
  2. The program, called an "interpreter," that runs code written in that language. Matlab and R are also interpreted languages. You can think of the interpreter as the "engine" that takes your code and uses it to make things happen.

While there is only a single definition of the Python language, maintained by the Python Software Foundation, there are many implementations of the interpreter, written in different languages to run on different types of systems. (Curious fact: the "standard" Python interpreter, called CPython, is written in the C programming language, while the program that compiles code written in the C programming language is itself written in C!)
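
If you're curious which interpreter you're actually running, Python can tell you. The snippet below is a minimal sketch using only the standard library's platform and sys modules; the variable names are just for illustration.

```python
import platform
import sys

# Name of the interpreter implementation, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

# Version of the language this interpreter implements, e.g. "3.11.4".
language_version = platform.python_version()

print("Interpreter:", implementation)
print("Language version:", language_version)
print("Interpreter executable:", sys.executable)
```

On the standard interpreter, the first line printed should read "Interpreter: CPython".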

Interactive vs script

Python can be run in two basic modes. In the first, we run a script written in Python by calling the Python interpreter from the command line:

$ python myprogram.py

For the second, we would type

$ python

at the command line to begin an interactive Python session. In this session, we can execute Python commands interactively (just like the shell!), get the results, and use the output. You will know you are in the Python shell when you see the prompt

>>>

instead of your normal shell prompt.

Note that the ability to run code interactively is one of the things that separates languages like Python (and R and Matlab) from compiled languages like C and Java. This is one of the major reasons we teach Python for data analysis.
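
As a sketch of how the two modes relate, here is what a file like myprogram.py might contain. The contents are hypothetical (the original file isn't shown here), and the function name greet is made up for illustration.

```python
# Hypothetical contents of myprogram.py

def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"

# This block runs when the file is executed as a script
# ($ python myprogram.py), but not when the file is imported.
if __name__ == "__main__":
    print(greet("world"))
```

In an interactive session, you could instead type the same function at the >>> prompt and call greet("world") yourself, seeing the result immediately.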

Version

If you read enough about Python, you will eventually see mention of differences between Python 2 and Python 3. Several years ago, the folks in charge of Python realized that they needed to make some serious changes in the language that would mean old software would no longer run, thus breaking what software engineers call "backward compatibility." The transition has taken several years, but as of now, nearly all major Python packages have been ported to Python 3, and ongoing development of Python 2 packages has been (or will soon be) discontinued.

For this reason, you should begin by learning Python 3. For most Python you see in the wild, the differences are fairly minor, and code written for Python 2 can easily be made to run.
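
Two of the differences you are most likely to run into are printing and division. A short sketch, written for Python 3 (the variable names are just for illustration):

```python
# In Python 3, print is an ordinary function, so parentheses are required.
# (Python 2 also accepted the statement form: print "Hello, world!")
print("Hello, world!")

# In Python 3, / always performs true division, even on two integers.
true_quotient = 7 / 2      # 3.5 (Python 2 would give 3 here)

# Floor division is spelled // in both versions.
floor_quotient = 7 // 2    # 3

print(true_quotient, floor_quotient)
```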

The ecosystem

Python is a good language, and easier to learn than many. But learning to program isn't actually the hard part. The hard part is doing the actual analysis, which often means finding tools that make the analysis possible. Since most of us aren't professional-grade programmers, we use code written by others to do our science. And when that code isn't readily available, it makes our lives needlessly difficult.

All of which is to say we teach Python because Python has the tools for scientific computing. There are lots and lots and lots of libraries for Python, and in the last several years, these have coalesced around a bunch of key technologies, including

  • the Python standard library, which ships with the language itself
  • NumPy, which defines a fast, efficient array that can be used for heavy number crunching
  • SciPy, which includes functions for signal processing, special functions, and statistics
  • Pandas, which defines a type of object called a data frame for organizing and manipulating data sets
  • Matplotlib, bokeh, seaborn, and a host more for plotting
  • statsmodels and patsy for defining and fitting statistical models like regressions
  • Scikit-learn for machine learning
  • Scikit-image and PIL for image processing
  • Cython, PyPy, blaze, and Shed Skin for making your code run fast
  • SymPy for computer algebra (think Mathematica)

Even more important, if you work with neuroscience data, there are tremendously good libraries available:

  • for MRI: NIPy is a whole set of related tools for cleaning and analyzing MRI data
  • for EEG/MEG: there is less dominance here, but MNE (paper) seems to be quite good
  • for spikes: Unfortunately, there isn't nearly as much here. This is because single unit electrophysiologists believe they are smarter than everyone else and generally refuse to run anything but their own code. In truth, it's because these data require a little less cleaning and processing. Once the spikes are sorted (lots of options there), some analysis is available here and here, while my code is here
  • for coding behavioral experiments: Psychophysics Toolbox has been the go-to solution with Matlab. On the Python side, I like PsychoPy. (See also my blog post on this.)
  • for simulations of neural networks, Brian and PyNN make the formerly excruciating routine.
  • for Bayesian analysis, there are PyStan, PyMC, and emcee

Sounds like a mess, I know, but you can visualize it a little like the figure below. There, arrows represent dependencies. For instance, arrows point from SciPy, Cython, and NumPy to Pandas because Pandas builds on all of these. Similarly, statsmodels builds on Pandas. As you can see, NumPy is at the heart of many of these advanced tools.

IPython

History

Almost a decade ago, Fernando Perez began the IPython project to design a better version of the Python shell. As you'll recall, the shell is itself just another program. It gets input from the user, talks to the file system and operating system, and runs other commands/programs. You can view the Python shell as a similar type of program, except that, instead of talking to the operating system, it talks to the Python interpreter. IPython simply replaces the old Python shell with something much more flexible and powerful.
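
To make the "shell is just another program" point concrete, here is a minimal sketch using the standard library's code module, which provides building blocks for a Python shell. This is not how IPython itself is implemented; it only illustrates the idea of a program that takes lines of text and hands them to the interpreter.

```python
import code

# A console object keeps its own namespace and feeds source lines to the
# interpreter one at a time, much as a shell does with typed input.
console = code.InteractiveConsole()

# push() returns a false value when the line formed a complete statement
# and was executed, and a true value when more input is needed.
needs_more = console.push("x = 2 + 3")
console.push("print('x is', x)")

# Variables defined through the console live in its own namespace.
result = console.locals["x"]
```

A real shell wraps this kind of loop around input typed at a prompt, then adds conveniences like history and tab completion on top.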

More recently, IPython took a big step forward in releasing the IPython server and the IPython notebook, which is the system that generated this document. When you type

ipython notebook

at the command prompt (jupyter notebook in newer installations, where the notebook is part of Project Jupyter), the IPython Notebook Server starts on your local machine, allowing you to interact with Python through your browser:

Even more exciting, though, the IPython server doesn't have to run on your machine. You can interact with an IPython server running notebooks backed by a powerful computing cluster at a remote location!

So why all the fuss?

The IPython notebook has proven incredibly popular for a few reasons:

  • it allows us to mix figures, code, and text in the same document
  • it's easy to convert a notebook to pdf, html, and other document formats
  • with sites like nbviewer, it's really easy to share with the world
  • because the notebook interface is language-agnostic, we can even use Matlab and R in the notebook

You'll also note those spiffy menus and buttons at the top of the browser window. These allow you to do things like run the entire notebook, stop the notebook (if your code is taking too long for some reason), create or delete cells, and change what type of cell you're working on. There are also handy keyboard shortcuts for many of these things.

For those who really want to dive in, I suggest the extended tutorial here, but let's go ahead and see what sorts of fun things are really very easy to do:

Markdown

Markdown started out as a quick and dirty way to write HTML without all the clutter of tags. Write using something like normal text, and the program would convert it to a well-formatted web document. However, Markdown proved so successful that places like GitHub adopted and extended it, so that now there are at least half a dozen Markdown dialects out there, some of which add features like code highlighting.

Markdown is very easy to learn. For instance:

# Pound signs indicate varying levels of header
## this is a subheading
### this is a sub-subheading

lists work like this
- one item
- another item
- etc.

or this:
1. first point
1. second point
1. notice these all start with 1? Markdown numbers automatically

And [web links](http://www.duke.edu) are easy to do, too.

Finally, you can get code highlighting like so:

~~~python
print("Hello, world!")
~~~

becomes

Pound signs indicate varying levels of header

this is a subheading

this is a sub-subheading

lists work like this

  • one item
  • another item
  • etc.

or this:

  1. first point
  2. second point
  3. notice these all start with 1? Markdown numbers automatically

And web links are easy to do, too.

Finally, you can get code highlighting like so:

print("Hello, world!")

Sweet! How does it work?

To have your cell converted to Markdown in the notebook, you can either select "Markdown" from the dropdown box of cell types or hit Escape (so that your cursor disappears) and hit m. This will change the selector up top as well.

Another killer trick: Equations

For those of you who know LaTeX, IPython uses MathJax to render equations in the browser:

Here is one of Maxwell's Equations:
$$
\nabla \times \mathbf{E} +\frac{\partial \mathbf{B}}{\partial t} = 0
$$

becomes

Here is one of Maxwell's Equations: $$ \nabla \times \mathbf{E} +\frac{\partial \mathbf{B}}{\partial t} = 0 $$

But wait, there's more!

Remember all that bash we learned? You can execute shell commands through the notebook by starting your code cell with a ! (pronounced "bang"):

In [1]:
!ls
Basic Data Analysis.ipynb
Basic Programming in Python.ipynb
Introduction to Python and IPython.ipynb
Working with Array Data.ipynb
data
ecosystem.svg
ipython_communication.svg
ipython_local.svg
ipython_remote.svg
lemurfig.pdf
lemurs.py
lemurs.pyc
marmoset.jpg
summary_data.csv
In [2]:
!pwd
/Users/jmxp/code/DIBS_materials/python
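
The ! syntax only works inside IPython. As a rough plain-Python counterpart to the two cells above, the standard library's os module can list a directory and report the working directory. This is a sketch, not a replacement for the shell; the variable names are made up, and the output will of course depend on where you run it.

```python
import os

# Rough equivalent of !ls: names of the entries in the current directory.
entries = sorted(os.listdir("."))

# Rough equivalent of !pwd: absolute path of the current directory.
working_directory = os.getcwd()

for entry in entries:
    print(entry)
print(working_directory)
```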

IPython is magic!

Finally (or sort of finally; there's much, much more), IPython has a set of extensions to the normal Python shell called magics. These come in especially handy for running scripts saved in .py files, debugging, and interfacing with R or Matlab.
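
For example, inside IPython, typing %run lemurs.py executes that script and leaves its variables available in your session. Magics are IPython syntax rather than Python, so the sketch below shows only a loose plain-Python analogy using the standard library's runpy module. The temporary script and its contents are hypothetical stand-ins, and %run itself does more than this (command-line arguments, timing, and debugging options, for instance).

```python
import os
import runpy
import tempfile

# Write a tiny stand-in script to a temporary file (hypothetical contents).
script_source = "answer = 6 * 7\nprint('answer is', answer)\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(script_source)
    script_path = handle.name

# Run the file as if it were the main program and collect the variables it
# defined, loosely similar to what %run does inside IPython.
namespace = runpy.run_path(script_path, run_name="__main__")
os.remove(script_path)

print(namespace["answer"])
```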