In [1]:
# Pay no attention to this cell
# All will be revealed in due time.
import pandas as pd
from IPython.display import Image
syllabus=pd.read_csv('Datasets/syllabus_2019.csv',header=0)
syllabus=syllabus.fillna("")
syllabus.index = range(1,len(syllabus)+1)

Python Programming for Earth Science Students

Authors: Lisa Tauxe, Hanna Asefaw, & Brendan Cych

Computers in Earth Science

Computers are essential to all modern Earth Science research. We use them for compiling and analyzing data, preparing illustrations like maps or data plots, writing manuscripts, and so on. In this class, you will learn to write computer programs with special applications useful to Earth Scientists. We will learn Python, an object-oriented programming language, and use Jupyter notebooks to write our Python programs.

Python

So, why learn Python? Here are a few reasons:

  • It is flexible, freely available, and cross-platform.
  • It is easier to learn than many other languages.
  • It has many numerical, statistical, and visualization packages.
  • It is well supported and has lots of online documentation.
  • The name 'Python' refers to 'Monty Python' - not the snake - and many examples in the Python documentation use jokes from the old Monty Python skits. If you have never heard of Monty Python, look it up on YouTube; you are in for a treat.

Which Python?

  • Python is undergoing a transition from 2.7 to 3. The notebooks in this class, apart from a few exceptions, are compatible with both but they have only been tested on Python 3, so that is what you should be using.
  • If you decide to use a personal computer, we recommend that you install the most recent version of Anaconda Python for your operating system: https://www.anaconda.com/download/. You will also need a few extra packages (cartopy, version 0.17.0, and PySimpleGUI), which can be installed with little hassle.
In [3]:
syllabus[['Topic']]
Out[3]:
Topic
1 Intro to notebooks, file systems and paths
2 Variables and Operations
3 Data structures
4 Dictionaries, program loops (if, while and for)
5 functions and modules
6 NumPy and matplotlib
7 NumPy arrays
8 Pandas, file I/O
9 data wrangling with Pandas
10 object oriented programming
11 lambda, map, filter reduce, list comprehension
12 Pandas filtering and exceptions
13 subplots, bar charts pie charts
14 histograms and cumulative distribution functions
15 statistics 101
16 line and curve fitting
17 visualization with seaborn
18 maps
19 gridding and contouring
20 rose diagrams and equal area projections
21 matrix math - dot and cross products
22 plotting great and small circles
23 Graphical User Interfaces (GUIs) and animations
24 Machine learning
25 3D plots of points and surfaces
26 Time series - periodograms
27 Animations

Lecture 1

Now we get down to business. In this lecture we will:

  • Learn to find your command line interface.
  • Learn how to launch a Jupyter notebook from the command line interface.
  • Learn basic notebook anatomy.
  • Learn some basic UNIX commands
  • Learn about the concept of PATH
  • Turn in your first practice problem notebook.

Jupyter notebooks

This class is entirely structured around a special programming environment called Jupyter notebooks. A Jupyter notebook is a development environment where you can write, debug, and execute your programs.

If you are taking this class through UCSD, make a folder on your Desktop to keep material for this class. If you haven't already, download the zip file for this lecture from TritonEd. Unzip the file and put the folder into your class folder.

If you are using the version cloned from github you already have everything. Some of the lectures might be updated in the future though, so the version you have may not be final.

Launching a Jupyter notebook

To do this, you will need to discover the hidden secret of your computer, the Terminal window (or Anaconda Prompt). This little window provides a command line interface in which you can type commands to the operating system. On a Mac, you can find it by clicking on the search icon and typing terminal.app. If you double click on the program, it will open a terminal window.

In [4]:
Image(filename='Figures/terminal_mac.png')
Out[4]:

On a PC, you should use the Anaconda Prompt which you can find in your programs after you install Anaconda Python.

Let's open a terminal window (command prompt).

In [5]:
Image(filename='Figures/terminal.png',width=500)
Out[5]:

When you fire up a terminal window, you are by default in your home directory (on a Mac, that would be /Users/YOURUSERNAME).

To launch a Jupyter notebook, simply type jupyter notebook as shown above. That will open up a browser window (use Firefox, Safari or Chrome - NOT Internet Explorer). Find your class folder and click on Lecture_01.ipynb. You should now be looking at this notebook!

Jupyter notebook anatomy

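Jupyter notebooks have two basic kinds of cells: Code cells, which hold Python statements to be run, and Markdown cells, which hold text to be typeset.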

You can insert a new cell by selecting Insert Cell Below in the drop-down menu:

In [6]:
Image(filename='Figures/insertCell.png')
Out[6]:

You change the cell "flavor" with the menu that defaults to 'Code' and can be changed to "Markdown".

And you "execute" a cell (either typeset the text or run the code) by clicking on the run key (sideways triangle with vertical line) or by selecting Run Cells under the Cell drop-down menu.

In [7]:
Image(filename='Figures/menuBar.png')
Out[7]:

In a code cell, you can only type valid Python statements EXCEPT after a pound sign (#) - everything after that on the same line will be ignored.
That is how you write "comments" in your code to remind yourself or tell others what you were thinking:

In [8]:
# I can type anything here
but not here
  File "<ipython-input-8-bee698e92c8a>", line 2
    but not here
               ^
SyntaxError: invalid syntax

That was an example of a bug which could be fixed by commenting out the second line, or by making it a valid statement:

In [9]:
# I can type anything here
# but not here
print("but not here")
but not here

Practice Problems

Now open the notebook called Lecture_01_Practice_Problems and complete the first three tasks. Then come back to this notebook.

Congratulations! You just wrote your first Python program.

Basic UNIX commands

Now we will discuss file systems, paths, and the command line. Why? Because whenever you import an image, document, or spreadsheet into a Jupyter notebook, you have to tell Jupyter where on the computer the file is located. Moreover, many command line functions come in handy. For example, you can look at the first few lines of a file before you import it into the notebook. You could also write all of your programs in a text editor and run them from the command line, from anywhere on your computer, instead of from within a Jupyter notebook. We will do that in Lecture 23, for example.

File systems

The organization of computers is based on a file system. The file system is hierarchical: at the top is the root directory (what Mac and PC users would call a folder). The root directory contains files and other folders, which may in turn contain more files and folders, and so on. The result is a tree of files and folders that makes up the file system. The following figure is an example of a computer's file system:

In [10]:
Image(filename='Figures/FileSystem.jpg')
Out[10]:

You are probably familiar with images like the one on the left. The text on the right shows the exact same thing, but from your computer's viewpoint. Both show you how to access the folder "Desktop". On the left, you access "Desktop" by clicking on icons that represent different folders and sub-folders until you arrive there. Later in this lecture, we'll show you how to access the same folder using its path (the text on the right).
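The tree idea can also be explored from Python. What follows is only a minimal sketch, not part of the original lecture: it builds a small throwaway directory tree (the folder and file names are made up for illustration) and prints it with indentation using the standard os.walk function.

```python
import os
import tempfile

def tree_lines(top):
    """Return one indented line per folder and file found below `top`."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit sub-folders in alphabetical order
        rel = os.path.relpath(dirpath, top)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        lines.append("  " * depth + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            lines.append("  " * (depth + 1) + name)
    return lines

# Build a small hypothetical tree in a temporary location so that
# nothing on your real file system is touched.
with tempfile.TemporaryDirectory() as scratch:
    root = os.path.join(scratch, "Users")
    os.makedirs(os.path.join(root, "student", "Desktop"))
    with open(os.path.join(root, "student", "notes.txt"), "w") as f:
        f.write("hello\n")
    demo_tree = tree_lines(root)

print("\n".join(demo_tree))
```

Each level of indentation corresponds to one step further down the tree, which is the same information a path encodes with its '/' separators.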

Survival UNIX commands

Macs and PCs both have functions that can be called from a command line, such as listing the contents of a folder or file, creating new folders, changing permissions on files or folders, combining the contents of files, moving files and folders around, and so on. These commands are directed to the operating system instead of the Python interpreter.

Before we begin, note that we can execute many operating system commands from within a Jupyter notebook. To signal to Jupyter that a command is meant for the operating system and not for Python, type a "!" (bang) in front of it. [It isn't always a requirement, but it does help separate what is a command line command from what is Python code.]

Let's learn our first UNIX command, which lists the contents of a directory, ls.

In [11]:
!ls
Datasets                           Lecture_01_Practice_Problems.ipynb
Figures                            Lecture_01_syllabus.ipynb

Another useful command is mkdir, which creates a new directory. Please note that directory means the same thing as folder. In a graphical operating system with icons, the term folder makes sense because they look like folders, whereas at the command line they are traditionally referred to as directories. Never mind!

In [12]:
!mkdir MYNEWDIRECTORY

To see if that worked, list the contents again:

In [13]:
!ls
Datasets                           Lecture_01_syllabus.ipynb
Figures                            MYNEWDIRECTORY
Lecture_01_Practice_Problems.ipynb

And sure enough, there it is. The command rmdir deletes a directory (it only works if the directory is empty).

In [14]:
!rmdir MYNEWDIRECTORY

Make sure it was removed:

In [15]:
!ls
Datasets                           Lecture_01_Practice_Problems.ipynb
Figures                            Lecture_01_syllabus.ipynb
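The same three steps can be done from Python itself with the standard os module. This is a sketch rather than part of the original lecture; it works inside a temporary directory (an assumption made so the example cleans up after itself) instead of your class folder.

```python
import os
import tempfile

# Work inside a temporary directory so the example leaves no trace.
with tempfile.TemporaryDirectory() as workdir:
    new_dir = os.path.join(workdir, "MYNEWDIRECTORY")

    before = sorted(os.listdir(workdir))   # like: ls
    os.mkdir(new_dir)                      # like: mkdir MYNEWDIRECTORY
    during = sorted(os.listdir(workdir))   # like: ls
    os.rmdir(new_dir)                      # like: rmdir MYNEWDIRECTORY (must be empty)
    after = sorted(os.listdir(workdir))    # like: ls

print(before, during, after)
```

One advantage of the Python versions is that they behave the same way on a Mac and a PC, whereas the command names differ between operating systems.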

We can print the contents of a file with the UNIX command cat (which comes from concatenate).

In [16]:
!cat Datasets/myfile.txt
Hi there students! Thanks for joining this class!

Usually, the output of cat is sent to the screen, but UNIX has tricky ways of redirecting output to other files. For example, if we combine cat with the symbol > we can redirect the output to another file, instead of to the screen:

In [18]:
!cat Datasets/myfile.txt >newfile.txt
!ls
Datasets                           Lecture_01_syllabus.ipynb
Figures                            newfile.txt
Lecture_01_Practice_Problems.ipynb

So what did we create? We created a copy of myfile.txt called newfile.txt. If you repeat the command, you will overwrite the existing output file.

To append to the end of a file (actually concatenate), we use the symbol >>:

In [19]:
!cat Datasets/myfile.txt >>newfile.txt
!cat newfile.txt
Hi there students! Thanks for joining this class!
Hi there students! Thanks for joining this class!
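Python's built-in open function makes the same distinction between overwriting and appending. The sketch below is an illustration only (it writes to a temporary directory rather than to the class folder): mode "w" behaves like > and mode "a" behaves like >>.

```python
import os
import tempfile

message = "Hi there students! Thanks for joining this class!\n"

with tempfile.TemporaryDirectory() as workdir:
    newfile = os.path.join(workdir, "newfile.txt")

    # Mode "w" starts the file from scratch, like  >
    with open(newfile, "w") as f:
        f.write(message)
    # Doing it again overwrites the file instead of growing it.
    with open(newfile, "w") as f:
        f.write(message)
    with open(newfile) as f:
        after_overwrite = f.read()

    # Mode "a" adds to the end of the file, like  >>
    with open(newfile, "a") as f:
        f.write(message)
    with open(newfile) as f:
        after_append = f.read()

print(after_overwrite, end="")
print("---")
print(after_append, end="")
```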

There are a few other useful redirect symbols: < and |. The first, <, takes the contents of the file named after it and redirects them into the command as input. The second, | (pipe), takes the output of the first command and 'pipes' it to a second. So we could do:

In [20]:
!cat Datasets/myfile.txt |cat
Hi there students! Thanks for joining this class!

... which is a little silly as it just does the same thing as the first command, but you don't know any other commands right now so...
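To see a pipe do something less silly without learning more UNIX commands, here is a rough Python imitation using the standard subprocess module. It is a sketch only: the two "commands" are tiny Python one-liners invented for this example, and the output of the first is captured and then handed to the second as its input, which mimics what | does.

```python
import subprocess
import sys

# First "command": a tiny Python one-liner that prints a message.
first = subprocess.run(
    [sys.executable, "-c", "print('Hi there students!')"],
    capture_output=True, text=True, check=True,
)

# Second "command": reads whatever arrives on its standard input
# and prints it in upper case.
second = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(sys.stdin.read().upper(), end='')"],
    input=first.stdout, capture_output=True, text=True, check=True,
)

piped_output = second.stdout
print(piped_output, end="")
```

The second program never needs to know where its input came from, which is what makes pipes so handy for chaining small tools together.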

Concept of path

So far, we have just looked at directories in our working directory (the one with this notebook in it) and subdirectories within the working directory. Earlier in the lecture you were shown a figure with icons on the left and text on the right. The text to the right was a series of directories separated by '/'. These are the paths to those files. A path is the unique location of a file or a directory in a file system of an OS.

Now that you know more about paths, let's take a detour and learn how to embed figures directly into a Jupyter notebook. You saw this in several cells earlier in this lecture, but were told to ignore it. The Image class in the module IPython.display allows us to embed many digital image types (png, jpg...) into a Jupyter notebook. If you take a look at the first cell of this lecture, we have already imported Image from IPython.display.

If you want to display a figure, you will use Image and the path to the figure. The path to the figure we want to display is "Figures/FileSystem.jpg". This tells the operating system to find the folder labeled "Figures" and then grab the file inside that is labeled "FileSystem.jpg". This is a relative path because the location is with respect to the directory that the notebook is in.

In [21]:
Image(filename='Figures/FileSystem.jpg') 
Out[21]:

The paths in this figure are absolute paths, which uniquely define the location of the file or directory from anywhere on the computer. Relative paths are handy shortcuts. For example, we can refer to a directory above the current directory without necessarily knowing its name by using these conventions:

./ is the current directory

../ is the one above

../../ is the one above that

and so on.

Instead of using 'relative' directories, it is often desirable to refer to directories in an absolute sense, i.e., relative to the root directory '/'.

To find the absolute path of your current directory, use pwd for 'print working directory':

In [22]:
!pwd
/Users/ltauxe/Dropbox/4SIO113_2019/Lecture_01
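Python can answer the same questions with the standard os module. The following is a small sketch, not part of the original lecture; the relative path used is the one from the Image example, and nothing here requires the file to actually exist.

```python
import os

cwd = os.getcwd()                        # like: pwd

# A relative path is interpreted with respect to the working directory...
relative = os.path.join("Figures", "FileSystem.jpg")
# ...and abspath turns it into the equivalent absolute path.
absolute = os.path.abspath(relative)

# ".." steps up one level; normpath tidies up the resulting path.
parent = os.path.normpath(os.path.join(cwd, ".."))

print("working directory:", cwd)
print("relative path:    ", relative)
print("absolute path:    ", absolute)
print("one level up:     ", parent)
```

Your output will differ from anyone else's, because the absolute paths depend on where the notebook lives on your computer.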

To refer to your home directory, just use the shortcut ~:

In [23]:
!ls ~
AnacondaProjects     Movies               VirtualBox VMs
Applications         MultiDrive           anaconda3
Creative Cloud Files Music                bin
Cubit-13.0           Pdfs                 enthought
Desktop              Pictures             log4j
Documents            PmagPy               logs
Downloads            Programs             personal_stuff
Dropbox              Public               profiles.bin
Google Drive         Python               reprints
Library              Sites                src
MagIC                SpareRoom            webpasswords
Meetings_2018        TeXShop

I guess I should clean up my home directory!

To move from one directory to another, use the command cd (for change directory). You can cd to a directory using any of the following:

  • cd ~ (takes you to your home directory)
  • cd Figures (takes you to the Figures folder in the working directory)
  • cd FULL_PATH_NAME (to change into any directory with its full path name)
  • cd .. (move to the directory above you)
  • cd ../.. (move two directories up)
  • and so on.
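Inside a Python program, the counterparts of ~ and cd are os.path.expanduser and os.chdir. The sketch below is illustrative only: it creates a temporary folder with a made-up Figures sub-folder, moves into it and back out, and then returns to where it started.

```python
import os
import tempfile

home = os.path.expanduser("~")   # the directory that ~ stands for
start = os.getcwd()              # remember where we began

with tempfile.TemporaryDirectory() as scratch:
    # realpath resolves symbolic links so the paths compare cleanly
    base = os.path.realpath(scratch)
    figures = os.path.join(base, "Figures")
    os.mkdir(figures)
    try:
        os.chdir(figures)            # like: cd FULL_PATH_NAME
        in_figures = os.getcwd()
        os.chdir("..")               # like: cd ..
        one_up = os.getcwd()
    finally:
        os.chdir(start)              # go back before the folder is deleted

print("home directory:", home)
print("inside Figures:", in_figures)
print("one level up:  ", one_up)
```

Changing directory inside a notebook affects every relative path used afterwards, so it is usually safest to return to the starting directory, as done here.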

Command line python scripts

As mentioned at the beginning of the lecture, you can run all the little programs you have been (and will be) writing directly from the command line. Here's one way to do that, using one of the many ["magic" commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cell-magics) that work with Jupyter notebooks. Our first is:

%%writefile PATH_TO_FILE.py

which writes the contents of a cell to the specified text file.

Running this cell will place the contents of it (without the magic command) into a file in this directory called hello.py.

In [2]:
%%writefile ./hello.py
print("Hello World!")
Writing ./hello.py

Now you can run the program from your command line (after navigating to this directory) by typing:

$ python hello.py

or from within this notebook:

In [25]:
!python hello.py
Hello World!

Alternatively, you can use a different magic command: %run to execute an external file:

In [4]:
%run hello.py
Hello World!
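For completeness, a script can also be launched from ordinary Python code with the standard subprocess module. This is a sketch under a few assumptions: the script is written to a temporary directory rather than to the class folder, and sys.executable is used so that the same Python that runs the notebook also runs the script.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "hello.py")

    # Write the one-line program to a file (what %%writefile did above).
    with open(script, "w") as f:
        f.write('print("Hello World!")\n')

    # Run it with the current Python interpreter and capture what it prints.
    result = subprocess.run(
        [sys.executable, script],
        capture_output=True, text=True, check=True,
    )

hello_output = result.stdout
print(hello_output, end="")
```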

To run the program on a Mac from the command line without typing python first, you need to do a few additional things.

1) You have to put this line at the top of the script:

#!/usr/bin/env python

This won't hurt you on a PC; it just isn't necessary.

In [5]:
%%writefile hello.py
#!/usr/bin/env python
print("Hello World!")
Overwriting hello.py

2) The script must be executable. To find out whether a particular script is executable, type:

ls -al YOURSCRIPTNAME

Here it is in the notebook:

In [6]:
!ls -al hello.py
-rw-r--r--  1 ltauxe  staff  45 Feb  8 14:21 hello.py

The '-rw-r--r--' string at the front indicates who can do what with the script. The first character, '-', means this is an ordinary file (a directory would show 'd'). The next three, 'rw-', say that the 'user' (me) can read and write (but not execute) the script. The next three are for the 'group' and the last three are for anyone (all).

To make it executable, I need to use the Unix command: chmod to set the permissions. To make it executable for everyone, I type:

chmod a+x

where the 'a' means all and the 'x' means 'executable'.

In [7]:
!chmod a+x hello.py
!ls -al hello.py
-rwxr-xr-x  1 ltauxe  staff  45 Feb  8 14:21 hello.py

See how everybody has an 'x' now? Now you can run it either from the command line by typing

$ ./hello.py

or from within the notebook:

In [8]:
!./hello.py
Hello World!
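The permission string and the chmod step can also be inspected from Python with the standard os and stat modules. The following is only a sketch and assumes a Mac or Linux machine (permissions work differently on a PC); it uses a throwaway copy of the script in a temporary directory.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "hello.py")
    with open(script, "w") as f:
        f.write('#!/usr/bin/env python\nprint("Hello World!")\n')

    # Start from read/write for the user and read-only for everyone else.
    os.chmod(script, 0o644)
    before = stat.filemode(os.stat(script).st_mode)

    # Add the execute bit for user, group and others, like: chmod a+x
    mode = os.stat(script).st_mode
    os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    after = stat.filemode(os.stat(script).st_mode)

print(before)
print(after)
```

On a Mac or Linux machine the two printed strings should match the ones that ls -al showed before and after chmod a+x.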

3) The last thing you have to worry about is your PATH. We have been talking about paths (all lower case), but PATH is an environment variable on Unix-like operating systems, and also on Windows (what is used by PCs), specifying a set of directories where the operating system looks for executable programs. So to run a program by name alone, the directory containing it must be in your PATH. Similarly, to import a Python program from anywhere, its directory must be in your PYTHONPATH, the list of directories where Python looks for modules.

You can find out what your PATH is, by typing echo $PATH on the command line (or in the notebook as shown here):

In [9]:
!echo $PATH # your results will definitely vary!
/Users/ltauxe/anaconda3/bin:/Users/ltauxe/anaconda3/bin:/usr/local/Cellar/cmake/3.9.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/texbin:/Library/TeX/texbin::/Users/ltauxe/PmagPy/programs/__pycache__/:/Users/ltauxe/PmagPy/programs/conversion_scripts/:/Users/ltauxe/PmagPy/programs/conversion_scripts2/:/Users/ltauxe/PmagPy/programs/deprecated/:/Users/ltauxe/PmagPy/programs/images/:/Users/ltauxe/PmagPy/programs/:/Applications/GMT-5.3.1.app/Contents/Resources/bin

By default, your working directory will not be in your PATH (for security reasons), so to run a script that is in your working directory, you must either put that directory in your PATH (not recommended) or use the full path name or the relative path name, e.g.,

./hello.py
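The long PATH string is easier to read one directory at a time. Here is a minimal sketch in Python (the helper name split_path is made up for this example) that splits PATH on the separator used by your operating system and asks where a command would be found, using the standard shutil.which function.

```python
import os
import shutil

def split_path(path_value):
    """Split a PATH-style string into a list of directories.

    Empty entries (such as the gap in 'a::b') are skipped for readability.
    """
    return [entry for entry in path_value.split(os.pathsep) if entry]

path_dirs = split_path(os.environ.get("PATH", ""))

print(len(path_dirs), "directories on PATH; the first few are:")
for directory in path_dirs[:5]:
    print("  ", directory)

# shutil.which returns the full path of the program that would run,
# or None if the command is not found in any PATH directory.
print("python is found at:", shutil.which("python"))
```

The operating system searches these directories in order, so when two directories contain a program with the same name, the one listed first wins.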

Changing your PATH depends a lot on your particular operating system. Many Macs set the path in a hidden file in your home directory called .bash_profile (newer versions of macOS use the zsh shell by default, which reads .zshrc or .zprofile instead). I recommend that you put all your Python scripts in some directory (say, Python) in your home directory. Then you can put this line in your .bash_profile file (for example using cat >> ~/.bash_profile).

export PYTHONPATH=$PYTHONPATH:~/Python

(followed by control-D when using cat).

When you open a new terminal window, your PYTHONPATH environment variable should be set properly. If it is, Python can find your scripts from any directory, so you can also import them into a Jupyter notebook.
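You can check what PYTHONPATH does from inside Python: when Python starts, the directories in PYTHONPATH are added to the list sys.path, which is where import looks for modules. The sketch below imitates that by hand. The module name my_hello_module and its greet function are hypothetical, invented for this example, and the module lives in a temporary directory standing in for your ~/Python folder.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as script_dir:
    # A hypothetical module standing in for one of your own scripts.
    with open(os.path.join(script_dir, "my_hello_module.py"), "w") as f:
        f.write('def greet():\n    return "Hello World!"\n')

    # Adding a directory to sys.path by hand has the same effect that
    # listing it in PYTHONPATH has when Python starts up.
    sys.path.append(script_dir)
    importlib.invalidate_caches()
    try:
        my_hello_module = importlib.import_module("my_hello_module")
        greeting = my_hello_module.greet()
    finally:
        sys.path.remove(script_dir)   # undo the change

print(greeting)
```

Setting PYTHONPATH in your shell start-up file simply saves you from having to modify sys.path in every notebook.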

In [10]:
#clean up a bit
!rm hello.py