Code is good, and good reproducible code is better!

In this first lesson, we will see how we can use notebooks to write code and give it context with some text, examine the results, and then export it. It can help to think of a computational notebook as an actual, paper-based notebook. You write a bit of text, then an equation, then you solve this equation, and then write some more text. A notebook does this too, but you can re-execute any part at any moment.

See the discussions on this lesson on GitHub.

What to expect after reading this lesson

  • Understanding of the key concepts of a notebook
  • Understanding the order of execution
  • Using checkpoints
  • Exporting the notebook
  • Closing the notebook and halting/restarting the kernel

Core notebook concepts

Let's dig in. Just below this text is a cell: you can click in it, then click on Cell > Run cell in the top menu (or use the keyboard shortcut Ctrl+Enter) to run it. Let's try!

In [ ]:
2+1

A few things happened. Before running it, the cell contained:

In [ ]: 2+1

But after running it, it changed to:

In [1]: 2+1
Out[1]: 3

Note that the number in brackets may vary; we will discuss why very soon. The new line (starting with Out[1]:) means that the cell was successfully executed by the notebook: something behind the scenes (called a kernel) took the code, ran it, and then gave you the output. Notebooks let you see the output of a command immediately after writing it!

The most important unit in a notebook is the cell. Some cells (like this one!) contain text, and the others contain code that will be executed to generate a result (it can be text, a table, or a figure).

Cells can have different types, which are noted in the toolbar next to the keyboard icon. In the screenshot just below, this is a Markdown cell:

[Screenshot: notebook toolbar showing a cell whose type is set to Markdown]

Markdown is a very powerful yet simple way to format text. There is an excellent interactive tutorial you can follow to get up to speed. Markdown is great because it also lets you write mathematics using the LaTeX language. There is a very good introduction on WikiBooks.
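As a small illustration of mathematics in a Markdown cell, typing the line below and running the cell should render a formula instead of the raw text. The formula (a sample mean) is only an example chosen for this sketch, not something used elsewhere in the lesson:

```latex
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
```

A single pair of dollar signs, as in `$\bar{x}$`, keeps the formula inline with the surrounding text.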

But enough about text, it's time to get back to writing code. We can write instructions that are as long as we want in a cell. Let's try to run this one:

In [ ]:
i = 2
i

Notice the number after In and Out? It conveys information about the state of the notebook: it tells you when the cell was last run. Compare the number of this cell to the one from the very first cell. It should be greater, because we executed this cell more recently than the previous one.

Go back to the very first cell in the lesson. Before you execute it, what do you think will happen? Now run it: what happened?

Understanding the order of execution

One peculiar behavior of notebooks is that the variables are shared between all cells. The cell immediately below has only

In [ ]: i

in it. If you run it, what do you think will happen?

In [ ]:
i

It turned into

In [3]: i
Out[3]: 2

This is because in the previous cell, we assigned the value 2 to the variable i, and this variable can be accessed by any cell. It means that the order in which cells are executed is very important. Notebooks are designed to be executed in a linear way, from the top to the bottom. This is why the number in brackets, right next to In or Out, is so important. Good notebook hygiene means that the numbers start at 1 at the top, and then increase from there.

To demonstrate the potential issue, let's look at the cell just below: it contains the instruction i = 2*i, which means multiply i by two, and this becomes the new value of i. What happens if you run this cell several times? Try it!

In [ ]:
i = 2*i
i

If the cells are executed multiple times, or in a random order, the results can change. Throughout these lessons, we will always assume that cells are executed in the order in which they appear. You are free to experiment with running cells in different orders. In fact, this is not a problem, since we will see in the next section how to reset a notebook to where it was originally, or just moments ago!
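The effect of re-running the doubling cell can also be sketched outside of a notebook. This is a minimal toy model, assuming that a kernel behaves like a single dictionary of variables shared by every cell; the names `namespace` and `run_cell` are hypothetical and are not part of any notebook API:

```python
# Toy model of a kernel: one dictionary holds the variables shared by all cells.
# `namespace` and `run_cell` are made-up names used only for this sketch.
namespace = {}


def run_cell(source):
    """Execute the source of one cell in the shared namespace."""
    exec(source, namespace)


run_cell("i = 2")        # the cell that assigns i
run_cell("i = 2 * i")    # first run of the doubling cell
once = namespace["i"]

run_cell("i = 2 * i")    # the same cell, run a second time
twice = namespace["i"]

print(once, twice)  # 4 8
```

The same cell gives a different value each time it runs, because it reads whatever value the previous run left behind.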

Back to square one

Thankfully, it is possible to reset a notebook, to start again with a blank slate.

The first way to do so is through the menu in Cell > All Output > Clear. This removes every displayed output, but variables that are already defined stay in memory, and the counter is not reset to 1. This is because the counter tracks the number of operations since the kernel was started.

The other way is to restart the kernel. This is done through the menu, in Kernel > Restart & Clear output (will give you the notebook as it was when you started), or Kernel > Restart & Run all (will restart, and then execute the content of all the cells in order).
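The difference between the two kinds of reset can be sketched with a toy model. The `ToyNotebook` class below is hypothetical and is not how Jupyter is implemented; it only assumes that a kernel keeps a counter and a set of variables, and that the displayed outputs are stored separately:

```python
class ToyNotebook:
    """Hypothetical stand-in for a notebook and its kernel (not Jupyter's API)."""

    def __init__(self):
        self.namespace = {}   # variables held by the kernel
        self.counter = 0      # the number shown in In [n]
        self.displayed = []   # counter values of the cells showing an output

    def run(self, source):
        """Run one cell: bump the counter, execute the code, record the output."""
        self.counter += 1
        exec(source, self.namespace)
        self.displayed.append(self.counter)
        return self.counter

    def clear_all_output(self):
        """Clear what is displayed; the kernel itself is left untouched."""
        self.displayed.clear()

    def restart(self):
        """Restart the kernel and clear the outputs: everything starts over."""
        self.namespace = {}
        self.counter = 0
        self.displayed.clear()


nb = ToyNotebook()
nb.run("i = 2")
nb.run("i = 2 * i")

nb.clear_all_output()
after_clear = (nb.counter, nb.namespace["i"], nb.displayed)

next_number = nb.run("i = i + 1")   # the counter keeps going after a clear

nb.restart()
after_restart = (nb.counter, "i" in nb.namespace)

print(after_clear, next_number, after_restart)
```

In this sketch, clearing only empties the displayed outputs, while restarting also discards the variables and sets the counter back to zero, so the next cell to run is numbered 1.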

Checkpoints

One final important point about notebooks is the notion of checkpoints. A checkpoint is a temporary copy of your notebook and its state. You can create a checkpoint through the File > Save and Checkpoint menu, or by clicking the floppy disk icon.

Checkpoints let you experiment a little bit, as you can restore your work to the previous checkpoint. This is done with the File > Revert to Checkpoint menu, which gives you the option to return to the last checkpoint. A good practice is to checkpoint before you try something, and go back to the checkpoint if it does not work. When it finally works (don't give up, it will eventually!), you can create a new checkpoint. Note that checkpoints are temporary: they are not preserved between sessions.

Saving your work and quitting

By default, the notebook will save the output of all cells that have been executed. You may want to export this into another format. The File > Download as menu has different options. The Markdown and HTML formats are guaranteed to work. If you have pandoc installed (which may be the case if you use RStudio), you can also create a PDF.
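To see why the outputs travel with the saved notebook, it helps to know that a notebook file is a JSON document. The sketch below assumes the nbformat 4 layout and uses a hand-written dictionary in place of a real file; the cell content is made up for the example:

```python
import json

# Hand-written example of a saved notebook with one executed code cell.
# The layout follows the nbformat 4 schema; the content is illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "2+1",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "3"},
                }
            ],
        }
    ],
}

# Round-trip through JSON text, as saving and reopening the file would do.
reloaded = json.loads(json.dumps(notebook))

# Collect (code, output) pairs from every executed code cell.
saved = [
    (cell["source"], output["data"]["text/plain"])
    for cell in reloaded["cells"]
    if cell["cell_type"] == "code"
    for output in cell["outputs"]
    if "data" in output
]

print(saved)  # [('2+1', '3')]
```

Both the code and its result are stored side by side, which is what the export formats draw on.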

Once you are done working on a notebook, you can use the File > Close and Halt menu. This will stop the kernel from running, and close the notebook.

A few parting words

Remember the keyboard icon we mentioned earlier? It has a list of all the commands you can use! Notebooks also have a fair number of keyboard shortcuts. Just click anywhere outside of a cell, and press h: you will see a list of them. The Jupyter project maintains very good documentation of the notebook and its UI components, which is worth keeping as a bookmark.