Python for scientific computing

Marcos Duarte
Laboratory of Biomechanics and Motor Control (http://demotu.org/)
Federal University of ABC, Brazil

The Python programming language, with its ecosystem for scientific programming, has features, maturity, and a community of developers and users that make it the ideal environment for the scientific community.

This talk will show some of these features and usage examples.

Computing as a third kind of Science

Traditionally, science has been divided into experimental and theoretical disciplines, but nowadays computing plays an important role in science. Scientific computation is sometimes related to theory, and at other times to experimental work. Hence, it is often seen as a new third branch of science.

theory-experiment-computation

Figure from J.R. Johansson.

The lifecycle of a scientific idea

In [3]:
from IPython.display import Image
Image(filename='../images/lifecycle_FPerez.png', width=600)  # image from Fernando Perez
Out[3]:

About Python [Python documentation]

Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs [python.org].

  • Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.
  • Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse.
  • Python is free and open source.

About Python [Python documentation]

  • Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace.
  • A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective.
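
The behaviour described above (an exception instead of a crash, and a stack trace pointing at the error) can be seen in a few lines. This is only a minimal sketch: the function name `mean` and the input values are made up for illustration.

```python
import traceback


def mean(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


# Bad input (an empty list) does not crash the interpreter:
# the division by zero raises an exception that we can catch.
try:
    mean([])
except ZeroDivisionError as err:
    caught = err
    # The stack trace tells us where the error happened.
    trace = traceback.format_exc()
    print(trace)

# The program is still running and can carry on with valid input.
result = mean([1, 2, 3, 4])
print(result)
```

If the `try`/`except` block is removed, the interpreter prints the same stack trace and stops the program.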

Glossary for the Python technical characteristics I

  • Programming language: a formal language designed to communicate instructions to a computer. A sequence of instructions that specifies how to perform a computation is called a program.
  • Interpreted language: a program in an interpreted language is executed or interpreted by an interpreter program. This interpreter executes the program source code, statement by statement.
  • Compiled language: a program in a compiled language is first explicitly translated by the user into a lower-level machine language executable (with a compiler) and then this program can be executed.
  • Python interpreter: an interpreter is the computer program that executes the program. The most widely used implementation of the Python programming language, referred to as CPython or simply Python, is written in C (another programming language, which is lower-level and compiled).
  • High-level: a high-level programming language has a strong abstraction from the details of the computer and the language is independent of a particular type of computer. A high-level programming language is closer to human languages than to the machine language, the language running inside the computer that communicates instructions to its hardware. The machine language is a low-level programming language; in fact, the lowest one.
  • Object-oriented programming: a programming paradigm that represents concepts as "objects" that have data fields (attributes that describe the object) and associated procedures known as methods.
  • Semantics and syntax: the term semantics refers to the meaning of a language, as opposed to its form, the syntax.
  • Static and dynamic semantics: static and dynamic refer to the point in time at which some programming element is resolved. Static indicates that resolution takes place before the program runs (when it is written or compiled). Dynamic indicates that resolution takes place at the time a program is executed.
  • Static and dynamic typing and binding: in dynamic typing, the type of the variable (e.g., whether it is an integer, a string, or a different type of element) is not explicitly declared, it can change, and in general it is not known until execution time. In static typing, the type of the variable must be declared and it is known before execution time. Likewise, in dynamic binding a name is associated with an object only at execution time.
  • Rapid Application Development: a software development methodology that uses minimal planning in favor of rapid prototyping.
  • Scripting: the writing of scripts, small pieces of simple instructions (programs) that can be rapidly executed.
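
As an illustration of the glossary entry on dynamic typing, here is a minimal sketch (the names `x` and `add_one` are made up for this example). The same name is bound to objects of different types, and a type error shows up only when the offending statement is executed.

```python
# The same name can be bound to objects of different types at run time;
# no type declaration is needed.
x = 42
first_type = type(x).__name__
x = "forty-two"
second_type = type(x).__name__
print(first_type, second_type)


def add_one(value):
    """Add 1 to whatever is passed in; the type is checked only when run."""
    return value + 1


print(add_one(1))    # works with an integer
print(add_one(1.5))  # and with a float

try:
    add_one("1")     # fails at execution time, not before
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```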

Glossary for the Python technical characteristics II

  • Glue language: a programming language for writing programs to connect software components (including programs written in other programming languages).
  • Modules and packages: a module is a file containing Python definitions (e.g., functions) and statements. Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A. To be used, modules and packages have to be imported with the import statement. A namespace is a container for a set of identifiers (names) that allows identifiers with the same name in different namespaces to be told apart. For example, after the command import math, all the functions and constants defined in this module are available in the namespace 'math.'; for example, math.pi is the $\pi$ constant and math.cos() is the cosine function.
  • Program modularity and code reuse: the degree that programs can be compartmentalized (divided in smaller programs) to facilitate program reuse.
  • Source or binary form: source refers to the original code of the program (typically in a text format), which would need to be compiled to a binary form (no longer human-readable) to be executed.
  • Major platforms: typically refers to the main operating systems (OS) in the market: Windows (by Microsoft), macOS (formerly Mac OS X, by Apple), and Linux distributions (such as Debian, Ubuntu, Mint, etc.). macOS and Linux distros are derived from, or heavily inspired by, another operating system called Unix.
  • Edit-test-debug cycle: the typical cycle in the life of a programmer; write (edit) the code, run (test) it, and correct errors or improve it (debug). The read–eval–print loop (REPL) is another related term.
  • Segmentation fault: an error in a program that is generated by the hardware which notifies the operating system about a memory access violation.
  • Exception: an error in a program detected during execution is called an exception and the Python interpreter raises a message about this error (an exception is not necessarily fatal, i.e., does not necessarily terminate or break the program).
  • Stack trace: information related to what caused the exception describing the line of the program where it occurred with a possible history of related events.
  • Source level debugger: Python has a module (named pdb) for interactive source code debugging.
  • Local and global variables: refers to the scope of the variables. A local variable is defined inside a function and typically can be accessed (it exists) only inside that function unless declared as global.
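
The entries on namespaces and on local and global variables can be tried out with the short sketch below. The names `pi`, `counter`, `local_increment`, and `global_increment` are hypothetical and chosen only for this illustration.

```python
import math

# Names defined in a module live in that module's namespace.
print(math.pi)
print(math.cos(0.0))

# A homonym in our own namespace does not clash with the one in math.
pi = 3  # deliberately crude value, for illustration only
print(pi, math.pi)

counter = 0  # global variable


def local_increment():
    counter = 100  # a new local variable; the global one is untouched
    return counter


def global_increment():
    global counter  # explicitly refer to the global variable
    counter += 1
    return counter


local_result = local_increment()
after_local = counter
global_result = global_increment()
after_global = counter
print(local_result, after_local, global_result, after_global)
```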

About Python

Python is also the name of the software with the most widely used implementation of the language (maintained by the Python Software Foundation).
This implementation is written mostly in the C programming language and it is nicknamed CPython.
So, the following phrase is correct: download Python (the software) to program in Python (the language) because Python (both) is great!

Python

The Python language is not, in fact, named after the big snake: its author, Guido van Rossum, named it after Monty Python, a British comedy group famous in the 1970s.
By coincidence, the Monty Python group was also interested in human movement science:

In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('9ZlBUglE6Hc', width=480, height=360, rel=0)
Out[2]:

Why Python and not 'X' (put any other language here)

Python is not the best programming language for all needs and for all people. There is no such language.
Now, if you are doing scientific computing, chances are that Python is perfect for you (and it might also be perfect for many other needs) because:

  • Python is free, open source, and cross-platform.
  • Python is easy to learn, with readable code, well documented, and with a huge and friendly user community.
  • Python is a real programming language, able to handle a variety of problems, easy to scale from small to huge problems, and easy to integrate with other systems (including other programming languages).
  • Python code is not the fastest, but Python is one of the fastest languages for programming. It is not uncommon in science to care more about the time we spend programming than the time the program takes to run. But if code speed is important, there are several easy ways to integrate code written in other languages (such as C and Fortran) with Python.
  • The IPython Notebook is a versatile tool for programming, data visualization, plotting, simulation, numeric and symbolic mathematics, and writing for daily use.

Popularity of Python for teaching

In [3]:
from IPython.display import IFrame
IFrame('http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-' +
       'introductory-teaching-language-at-top-us-universities/fulltext',
       width='100%', height=450)
Out[3]:

Python ecosystem for scientific computing (main libraries)

  • Python of course (the CPython distribution): a free, open source and cross-platform programming language that lets you work more quickly and integrate your systems more effectively.
  • Numpy: fundamental package for scientific computing, built around an N-dimensional array object.
  • Scipy: numerical routines for scientific computing.
  • Matplotlib: comprehensive 2D Plotting.
  • Sympy: symbolic mathematics.
  • Pandas: data structures and data analysis tools.
  • IPython: provides a rich architecture for interactive computing with powerful interactive shell, kernel for Jupyter, support for interactive data visualization and use of GUI toolkits, flexible embeddable interpreters, and high performance tools for parallel computing.
  • Jupyter Notebook: web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
  • Statsmodels: to explore data, estimate statistical models, and perform statistical tests.
  • Scikit-learn: tools for data mining and data analysis (including machine learning).
  • Pillow: Python Imaging Library.
  • Spyder: interactive development environment with advanced editing, interactive testing, debugging and introspection features.

The Jupyter Notebook

The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser. The Jupyter Notebook App can be executed on a local desktop requiring no internet access (as described in this document) or installed on a remote server and accessed through the internet.

Notebook documents (or “notebooks”, all lower case) are documents produced by the Jupyter Notebook App which contain both computer code (e.g., Python) and rich text elements (paragraphs, equations, figures, links, etc.). Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc.) and executable documents which can be run to perform data analysis.
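
To give an idea of how code and rich text live together in one document: a notebook file (.ipynb) is stored as JSON, with one entry per cell. The sketch below builds a hypothetical two-cell notebook by hand using only the standard library. The cell contents are made up, and the layout follows the notebook format version 4 as far as I know; in practice the Jupyter Notebook App writes these files for you.

```python
import json

# Hypothetical two-cell notebook following the nbformat version 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # Rich text cell (Markdown, may contain equations)
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis\n", "Rich text with an equation: $E = mc^2$"],
        },
        {
            # Executable cell; outputs are filled in when the cell is run
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialize to the text that would be stored in the .ipynb file,
# then read it back and list the cell types.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```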

Try Jupyter Notebook in your browser.

Jupyter Notebook and IPython kernel architectures


Installing the Python ecosystem

The easy way
The easiest way to get Python and the most popular packages for scientific programming is to install them with a Python distribution such as Anaconda.
In fact, you don't even need to install Python on your computer: you can run Python for scientific programming in the cloud using python.org, pythonanywhere, or repl.it.

The hard way
You can download Python and all the individual packages you need and install them one by one. In general, it's not that difficult, but it can become challenging and painful for certain big packages that depend heavily on math, image visualization, and your operating system (particularly Microsoft Windows).

Anaconda

Go to the Anaconda website and download the appropriate version for your computer (make sure to download Anaconda3, for Python 3.x). The file is big (about 500 MB). From their website:
Linux Install
In your terminal window, type the following and follow the instructions:

bash Anaconda3-4.4.0-Linux-x86_64.sh

OS X Install
For the graphical installer, double-click the downloaded .pkg file and follow the instructions.
For the command-line installer, type the following in your terminal window and follow the instructions:

bash Anaconda3-4.4.0-MacOSX-x86_64.sh

Windows
Double-click the .exe file to install Anaconda and follow the instructions on the screen.

Miniconda

A variation of Anaconda is Miniconda (Miniconda3 for Python 3.x), which contains only the Conda package manager and Python.

Once Miniconda is installed, you can use the conda command to install any other packages and create environments, etc.

My current installation

In [4]:
# pip install version_information
%load_ext version_information
%version_information numpy, scipy, matplotlib, sympy, pandas, ipython, jupyter
Out[4]:
Software     Version
Python       3.6.2 64bit [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
IPython      6.1.0
OS           Darwin 16.7.0 x86_64 i386 64bit
numpy        1.13.1
scipy        0.19.1
matplotlib   2.0.2
sympy        1.1.1
pandas       0.20.3
ipython      6.1.0
jupyter      1.0.0
Wed Sep 20 14:06:10 2017 -03
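
If the version_information extension is not available, a similar report can be produced with the standard library alone. This is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper `installed_version` is a hypothetical name introduced here, and packages that are not installed are simply reported as such.

```python
import platform
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


packages = ["numpy", "scipy", "matplotlib", "sympy", "pandas", "ipython", "jupyter"]

# Collect the interpreter, the operating system, and the package versions.
report = {"Python": platform.python_version(), "OS": platform.platform()}
for name in packages:
    report[name] = installed_version(name) or "not installed"

for name, version in report.items():
    print(f"{name:<12}{version}")
```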

IDE for Python

You might want an Integrated Development Environment (IDE) for programming in Python.
See Top 5 Python IDEs For Data Science for possible IDEs.
Soon there will be a new IDE for scientific computing with Python: JupyterLab, developed by the Jupyter team. See this video about JupyterLab.

To learn about Python

There is a lot of good material on the internet about Python for scientific computing, including:

More examples of Jupyter Notebooks

Let's run stuff from:

In [5]:
Image(data='http://imgs.xkcd.com/comics/python.png')
Out[5]: