Material for a University of Illinois course offered by the Physics Department. This content is maintained on GitHub and is distributed under a BSD3 license.
This course assumes a basic familiarity with the core python language. If you are rusty or still learning, I recommend the free ebook A Whirlwind Tour of Python, which is "a fast-paced introduction to essential components of the Python language for researchers and developers who are already familiar with programming in another language".
If you are currently using python 2.x and reluctant to move to python 3, read this and this.
No previous experience with git or github is necessary for this course (but they are useful research tools so worth learning - here is a good starting point). If you are finding the git learning curve to be steep, you are not alone.
Clone the course material from github with the following command, which will create a subdirectory called syllabus:
git clone --recurse-submodules https://github.com/illinois-mla/syllabus.git
This should ask you for your github username and password (but you can streamline future github access using ssh).
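One way to streamline that access is to switch your clone from https to ssh. This is a sketch, assuming you have already added an ssh public key to your GitHub account:

```shell
# Switch the existing clone from https to ssh so github stops asking
# for a password (assumes an ssh key is registered with your account):
cd syllabus
git remote set-url origin git@github.com:illinois-mla/syllabus.git
git remote -v   # both fetch and push should now show the ssh URL
```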
We will use the conda command to create a standard python environment for this course. These instructions assume that you have already satisfied the prerequisites.
Create a new environment by entering (or pasting) the following command at a shell prompt in the top level directory of the course syllabus repo.
conda env create -f DAMLA-env/environment.yml
Activate the new environment using (this should add "(DAMLA)" to your command prompt, as a reminder of your current environment):
source activate DAMLA
Add some additional packages from other sources (details here and here):
conda install -c conda-forge keras libiconv jupyter_contrib_nbextensions
conda install -c astropy emcee astroml
conda install pytorch-cpu -c pytorch
You might see something like
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
at which point you should follow its advice and run
pip install --upgrade pip
Enable a jupyter notebook extension we will use for in-class exercises:
jupyter nbextension enable exercise2/main
Activate the course environment, if necessary (check your command prompt, but it doesn't do any harm to reactivate the current environment):
source activate DAMLA
Install the course code and data using:
cd syllabus
pip install .
To launch the notebook server at any time, you can now use:
[[syllabus]]
source activate DAMLA
cd notebooks
jupyter notebook
Note that [[syllabus]] is a reminder that you must be in your syllabus directory before typing the following commands. If you are unsure about this, refer to the pwd and cd commands.
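For example (the path in the second command is a placeholder; substitute wherever you actually cloned the repo):

```shell
pwd                    # prints your current working directory;
                       # it should end in /syllabus
cd /path/to/syllabus   # placeholder path: replace with your clone location
```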
Windows users: Wherever you see source activate DAMLA, use activate DAMLA instead. Details here.
This should have opened a jupyter notebook tab or window in your browser. If this is your first time, validate that you can open and view a notebook: do File->Open and click on Contents.ipynb. Jupyter notebooks (formerly called IPython notebooks) have the file extension .ipynb.
(For git experts: you will normally be working on the master branch to simplify the workflow. This means that your local work must be discarded or saved to another branch each time you update, using the instructions below).
In case something goes wrong with your installation and you want to start again, use:
conda remove --name DAMLA --all
You will need to shut down any jupyter sessions using the old environment first.
You can skip this section if you are installing for the first time, but remember these instructions for later.
The first step is to "factory reset" your installation before getting the updates. The simplest method is to throw away any changes you have made using:
[[syllabus]]
git checkout master
git reset --hard
Alternatively, you can keep a permanent record of your changes in a git branch with a name of your choice, for example "08-Jan-2018":
[[syllabus]]
git checkout -b "08-Jan-2018"
git commit -a -m "Save work in progress"
git checkout master
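If you later want to revisit work saved on such a branch, a minimal sketch (using the example branch name above):

```shell
git branch                   # list local branches: master plus any saved branches
git checkout "08-Jan-2018"   # switch to the saved branch to inspect your old work
git checkout master          # return to the course material
```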
The second step is to download the changes from github:
[[syllabus]]
git pull
If this command reports Already up-to-date.
then there are no updates to download.
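To confirm what state you are in after pulling, two quick checks (these are standard git commands, not specific to this course):

```shell
git log -1 --oneline   # shows the commit you are now on
git status --short     # no output means no local modifications remain
```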
The final step is to update your local python environment:
[[syllabus]]
source activate DAMLA
pip install . --upgrade
If a problem keeps you from easily installing the Conda environment on your local machine, you can use the environment on the provided Docker image instead. The Docker image provides the compute environment, but is not meant to be used as an area to store your work, so you should still clone the repo to your local machine.
To install Docker Community Edition on your Linux, Mac, or Windows machine follow the instructions in the Docker docs.
To use the Docker image first pull it down from Docker Hub
docker pull illinoismla/damla-env
If you want anything you do in the container to safely persist, you should bind-mount your local machine's file system into the container as a volume. So run the image in a container while exposing the container's internal port 8888 with the -p flag (this is necessary for Jupyter to be able to talk to localhost) and bind-mounting the directory of the course Git repo on your local machine:
docker run --rm -it -v <path to the repo goes here>:/home/physicist/data -p 8888:8888 illinoismla/damla-env:latest
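As a concrete sketch of filling in the placeholder: if you launch the container from inside your local syllabus directory, the shell variable $PWD expands to that directory's absolute path:

```shell
# Run from inside your local syllabus clone; "$PWD" expands to its
# absolute path, which fills the <path to the repo goes here> placeholder:
cd syllabus
docker run --rm -it -v "$PWD":/home/physicist/data -p 8888:8888 illinoismla/damla-env:latest
```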
Once inside the container, note that the DAMLA Conda environment is already activated and should be shown in the terminal prompt
(DAMLA) root@<hostname>:~/data#
though you can also verify this by listing the conda environments
conda env list
# conda environments:
#
base                     /root/miniconda
DAMLA                 *  /root/miniconda/envs/DAMLA
To verify that things are working as expected, launch a Jupyter notebook server and test basic imports (the "Hello World" of data analysis).
Inside the running Docker container with the DAMLA
environment activated navigate to /home/physicist/data
(this is an arbitrary path that we chose when running the Docker image — you can make this whatever you want). If you ls
you will see that you are actually inside your Git repo on your local machine.
Then launch the Jupyter server
jupyter notebook
which will cause a login URL with a token to be printed to your terminal
http://localhost:8888/?token=<token>
Click on the URL, or copy and paste it into the web browser on your local machine. This should then display the Jupyter server in your browser.
Create a new Jupyter notebook (select from the "New" drop down menu on the upper right) and then when the notebook opens import NumPy and run a simple test
import numpy as np
np.arange(0, 10, 0.5)
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])
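Beyond eyeballing the printed array, a slightly stronger check is to assert its properties in the same notebook cell:

```python
import numpy as np

# np.arange(0, 10, 0.5) yields 20 values: 0, 0.5, ..., 9.5
x = np.arange(0, 10, 0.5)
assert x.shape == (20,)
assert x[0] == 0.0 and x[-1] == 9.5
print("basic numpy check passed")
```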
If you now save the notebook and, in a different terminal window on your local machine, navigate to the Git repo directory, you will see the notebook there on your local machine. If you shut down the Jupyter server and exit the Docker container, you will see that the notebook has persisted even though the environment has exited and been cleaned up.
There should be no need to reuse the same container, as the things you care about, the data and files you are writing, should exist on your local machine (nicely versioned with Git). However, if you do want the container to persist between uses, you can remove the --rm flag from the docker run command to keep the container from being removed. In that case it is a good idea to also name the container with the --name flag
docker run --name DAMLA-env-container -it -v <path to the repo goes here>:/home/physicist/data -p 8888:8888 illinoismla/damla-env:latest
After you exit the container if you list the Docker containers on your local machine
docker ps -a
you will see your exited container. To resume using that specific container, start it again
docker start DAMLA-env-container
and then attach your shell to it
docker attach DAMLA-env-container
Sometimes seemingly small changes to a Jupyter notebook can be hard to distinguish from large changes using git diff. You can add a tool called nbdime to your conda environment to help with this. First activate your DAMLA conda environment and then do:
pip install nbdime
nbdime config-git --enable --global
You can then diff notebooks with a web-based tool:
nbdiff-web <notebook1.ipynb> <notebook2.ipynb>
For a complete set of nbdime commands, see here.