01 - Introduction to Conda & Jupyter Notebooks

The online Notebook Viewer link (NB Viewer) is recommended.

Introduction

Welcome to the School of AI's Introduction to Data Science Tools series. This series introduces the Data Science tools and ecosystem at our disposal.

We will be covering:

- Conda (package and environment management system)
- Jupyter
- NumPy (Numeric computing)

This initial workshop will focus on:

- installing conda
- setting up a conda virtual environment
- installing and using Jupyter Notebooks
- using NumPy for Statistics within a Jupyter Notebook

First, a brief look at the tools we will be learning about today:

Conda

Conda is an open-source, cross-platform package and environment management system. Conda allows us to set up separate environments, called virtual environments, and helps us manage the packages we need within each environment.

Jupyter Notebook

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. It is extremely useful for Data Scientists as it allows them to write down their thoughts and documentation right next to the code and its output. In fact, you are reading this on a Jupyter Notebook :)

NumPy

NumPy (Numerical Python) is the fundamental package for scientific computing with Python. It contains, among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
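As a rough illustration of the points above, the sketch below builds a small 2-D array, adds a 1-D array to it via broadcasting, and defines a custom record data type. All values, field names and variable names are made up for this example.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns); the values are arbitrary sample data.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# A 1-D array with one entry per column.
offsets = np.array([10, 20, 30])

# Broadcasting: offsets is added to every row of grid without a loop.
shifted = grid + offsets

# A structured (record) dtype: each element holds a name and a score.
record_type = np.dtype([("name", "U10"), ("score", "f8")])
records = np.array([("Ada", 9.5), ("Bob", 7.0)], dtype=record_type)

print(grid.shape)
print(shifted)
print(records["score"].mean())
```

Each field of a structured array can be pulled out by name, as `records["score"]` does here, which is one way NumPy can act as a container for generic tabular data.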


Sample Jupyter Notebook

jupyter notebook sample

Install Conda

The first step is to install conda:

  1. Install Conda Links:
    1.1. Windows
    1.2. macOS
    1.3. Linux

Installing on Windows

  1. Download one installer:

  2. Double-click the .exe file.

  3. Follow the instructions on the screen. If you are unsure about any setting, accept the defaults. You can change them later.
    When installation is finished, from the Start menu, open the Anaconda Prompt.

  4. Test your installation:
    Run the command conda list from the Terminal Window or Anaconda Prompt.

Installing on macOS

  1. Download one installer:

  2. Install:

    • Miniconda — In your Terminal window, run: bash Miniconda3-latest-MacOSX-x86_64.sh
    • Anaconda — Double-click the .pkg file.
  3. Follow the prompts on the installer screens.

    If you are unsure about any setting, accept the defaults. You can change them later.

  4. To make the changes take effect, close and then re-open your Terminal window.

  5. Test your installation:
    Run the command conda list from the Terminal Window or Anaconda Prompt.

Installing on Linux

  1. Download one installer:

  2. In your Terminal window, run:

    • Miniconda: bash Miniconda3-latest-Linux-x86_64.sh

    • Anaconda: bash Anaconda-latest-Linux-x86_64.sh

  3. Follow the prompts on the installer screens.

    If you are unsure about any setting, accept the defaults. You can change them later.

  4. To make the changes take effect, close and then re-open your Terminal window.

  5. Test your installation:
    Run the command conda list from the Terminal Window or Anaconda Prompt.

You should see a list of packages that are currently installed. Don't worry if you don't see anything - we haven't installed anything yet. But you can always use the conda list command to check the list of installed packages in your current environment. Below is a sample output of this command:

conda list output
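If you prefer to check from Python rather than the terminal, here is a minimal sketch that looks for a conda executable on your PATH and, if one is found, asks it for its version. The helper name find_conda is hypothetical, and the sketch assumes conda is exposed as an executable on PATH (some shell setups expose it only as a shell function, in which case this check reports it as missing).

```python
import shutil
import subprocess


def find_conda():
    """Return the path to a conda executable on PATH, or None if absent."""
    return shutil.which("conda")


conda_path = find_conda()
if conda_path is None:
    print("conda was not found on PATH; try re-opening your terminal.")
else:
    # Ask conda for its version to confirm the executable actually runs.
    result = subprocess.run([conda_path, "--version"],
                            capture_output=True, text=True)
    print(result.stdout.strip() or result.stderr.strip())
```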


Great! Now we are all set up with conda, let's move to the next step.

Create a Virtual Environment

With conda installed and confirmed to be working on our systems, the next step is to create a virtual environment.

What is a Virtual Environment?

Let's first understand what a virtual environment is and what we might use it for.

A virtual environment is an isolated container of dependencies. You create different virtual environments to install different sets of dependencies, or even different versions of the same dependency. This way, adding or removing dependencies in one virtual environment has no effect on any other virtual environment. Conda lets you create, export, list, remove and update environments.

For example, suppose you wish to work on multiple projects: one that uses Python 3.6 and another that uses Python 3.7. Your Python 3.6 project uses TensorFlow v0.9 and your Python 3.7 project uses TensorFlow v1.11. The APIs of the two TensorFlow versions are very different. Because each conda virtual environment is completely isolated, you can update packages in one virtual environment with no fear of overwriting or breaking packages in any other virtual environment.

Create School of AI Environment

To create a conda environment, use one of the following two commands.

conda create --name <environment_name>
conda create --name <environment_name> [<packages we want to install>]

Let's create our first conda environment, called school_of_ai. The command below creates a new environment called school_of_ai. A prompt will ask you whether to proceed with the creation.

conda create --name school_of_ai

conda create school_of_ai environment


Let's verify we can see our newly created environment via conda env list:

conda list school_of_ai environment


Activate Environment

We have created our environment and confirmed we can see it under the list of conda environments. We need to access it now.

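We access or activate an environment via conda activate <environment_name> (on conda versions older than 4.4, use source activate <environment_name> on macOS and Linux). Once activated, your terminal shows the name of the active environment in parentheses at the beginning of the line: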

conda activate school_of_ai environment
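From inside Python, a rough way to confirm which environment is active is to inspect the CONDA_DEFAULT_ENV environment variable, which conda sets on activation, along with the interpreter's own paths. The fallback string below is just a placeholder chosen for this sketch.

```python
import os
import sys

# conda sets CONDA_DEFAULT_ENV to the name of the active environment.
active_env = os.environ.get("CONDA_DEFAULT_ENV", "<no conda environment active>")

print(f"Active conda environment: {active_env}")
print(f"Python executable:        {sys.executable}")
print(f"Environment prefix:       {sys.prefix}")
```

If the environment is active and has its own Python installed, the executable and prefix paths should point inside that environment's directory.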

Install Packages

Once we have activated the school_of_ai environment, we can list the packages it contains with conda list:

conda school_of_ai environment list packages (empty)


As you can see, we have no packages installed within this environment.

Let's install the following packages:

1. NumPy (Numeric computing)
2. Jupyter (Interactive notebooks)
3. Pandas (Data analysis)
4. Matplotlib (Plotting)

We can install them via conda install numpy jupyter pandas matplotlib:


conda school_of_ai install packages


You should see a message like the following, asking you to confirm the installation of the requested packages and their dependencies. As you can see, that entails a lot of additional packages:

The following NEW packages will be INSTALLED:

    backcall:           0.1.0-py37_0
    blas:               1.0-mkl
    bleach:             3.0.2-py37_0
    ca-certificates:    2018.03.07-0
    certifi:            2018.10.15-py37_0
    ...
    traitlets:          4.3.2-py37_0
    wcwidth:            0.1.7-py37_0
    webencodings:       0.5.1-py37_1
    wheel:              0.32.2-py37_0
    widgetsnbextension: 3.4.2-py37_0
    xz:                 5.2.4-h14c3975_4
    zeromq:             4.2.5-hf484d3e_1
    zlib:               1.2.11-ha838bed_2

Proceed ([y]/n)?

Type 'y' and press Enter to allow it to proceed with installing the packages.


Now that our packages have been installed, let's verify with the conda list command. You should now see a long list of installed packages, including the ones we specified:

conda school_of_ai list installed packages
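The same check can be sketched from Python by asking the import system whether each package can be found. The helper name check_packages is hypothetical, and the module names listed are assumptions about how these packages are imported (for example, the Jupyter Notebook server is assumed to be importable as notebook).

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed import names for the packages installed in this workshop.
status = check_packages(["numpy", "pandas", "matplotlib", "notebook"])
for name, found in status.items():
    print(f"{name:<12} {'found' if found else 'MISSING'}")
```

Run this with the school_of_ai environment active; a MISSING entry suggests the package was installed into a different environment.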

Launch Jupyter Notebook

Now we have created our environment and installed our desired packages including jupyter. It's time to launch Jupyter Notebook!

Start Notebook Server

Let's launch the notebook server via jupyter notebook:

launch jupyter notebook


Open in Web Browser

Your web browser should automatically open to the URL where the notebook server is running. If not, follow the instructions from your terminal and copy the automatically generated URL and paste it into your Web Browser.

If everything worked, you should see a screen like the one below in your Web Browser. This is the notebook dashboard, the home page for the notebook:


jupyter notebook on browser

Create Notebook

Let's open a new Notebook by:

1. Selecting the 'New' button in the top right side of the screen.
2. Selecting the 'Python 3' option which creates a Python 3 notebook and launches it in a new tab.

jupyter notebook create notebook


You should now see a new Python 3 notebook launched in a new tab:

jupyter notebook new Python 3 notebook

Source: Notebook Basics

When you create a new notebook or open an existing one, you will be taken to the notebook user interface (UI). This UI allows you to run code and author notebook documents interactively. The notebook UI has the following main areas:

- Menu
- Toolbar
- Notebook area and cells

The notebook has an interactive tour of these elements that can be started in the “Help:User Interface Tour” menu item.

Jupyter Notebook has a modal user interface, which means that the keyboard does different things depending on which mode the Notebook is in. The two modes are Edit Mode and Command Mode.

Edit Mode:

Edit mode is indicated by a green cell border and a prompt showing in the editor area:

jupyter notebook edit mode

When a cell is in edit mode, you can type into the cell, like a normal text editor.

Enter edit mode by pressing Enter or using the mouse to click on a cell’s editor area.


Command Mode:

Command mode is indicated by a grey cell border with a blue left margin:

jupyter notebook command mode

When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode and you press c, you will copy the current cell - no modifier is needed.

Don’t try to type into a cell in command mode; unexpected things will happen!

Enter command mode by pressing Esc or using the mouse to click outside a cell’s editor area.


Mouse Navigation

All navigation and actions in the Notebook are available using the mouse through the menubar and toolbar, which are both above the main Notebook area:

Jupyter Notebook menus and toolbar

There are a couple of types of mouse-based navigation available in Jupyter Notebook:

- cells can be selected by clicking on them
- cell actions usually apply to the currently selected cell


Keyboard Navigation

The modal user interface of the Jupyter Notebook has been optimized for efficient keyboard usage. This is made possible by having two different sets of keyboard shortcuts: one set that is active in edit mode and another in command mode.

Below are the command mode shortcuts we recommend learning first:

  • Basic navigation
  • Saving the notebook
  • Creating a cell
  • Editing a cell

| Button | Command |
|---|---|
| Enter | Enter 'edit mode' from 'command mode' |
| Esc | Enter 'command mode' from 'edit mode' |
| Shift-Enter | Execute cell |
| Up or k | Go up a cell in 'command mode' |
| Down or j | Go down a cell in 'command mode' |
| s | Save the notebook when in 'command mode' |
| y | Change cell to 'code' cell |
| m | Change cell to 'Markdown' cell |
| 1-6 | Change cell to 'Markdown' cell and add appropriate h# header |
| a | Create a cell above the current cell |
| b | Create a cell below the current cell |
| x | Cut current cell |
| c | Copy current cell |
| v | Paste current cell |
| d (twice) | Delete current cell |
| z | Undo last cell operation |


For more details on Keyboard Shortcuts and other Help, the Help Menu has lots of options:

jupyter notebook help menu

Statistics Exercise

In a new browser tab, please open Twilio's blog post Basic Statistics in Python with NumPy and Jupyter Notebook to work through some statistics using Jupyter Notebook.

In [1]:
import numpy as np

Mean

The equation for the mean, $\mu$ is:

$$ \huge \mu = \frac{1}{n} \sum_{i} x_{i} \\ $$
In [13]:
def calculate_mean(a_list):
    return (float(sum(a_list)) / len(a_list))

my_list = [1,4,3,2,6,4,4,3,2,6]

print(f"The mean using calculate_mean() is {calculate_mean(my_list)}")
print(f"The mean using np.mean() is {np.mean(my_list)}")
The mean using calculate_mean() is 3.5
The mean using np.mean() is 3.5
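
As a quick check by hand, substituting the ten values of my_list into the formula for the mean gives the same result:

$$ \mu = \frac{1 + 4 + 3 + 2 + 6 + 4 + 4 + 3 + 2 + 6}{10} = \frac{35}{10} = 3.5 $$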

Variance

The equation for the variance, $\sigma^{2}$ is:

$$ \huge \sigma^{2} = \frac{1}{n} \sum_{i}(x_{i} - \mu)^{2} \\ $$
In [14]:
def calculate_variance(a_list, mu):
    # Squared deviation of each value from the mean
    dev2 = [(x - mu)**2 for x in a_list]
    # The variance is the mean of the squared deviations
    variance = calculate_mean(dev2)
    return variance

my_list = [1,3,3,6,3,2,7,5,9,1]
variance = calculate_variance(my_list, calculate_mean(my_list))

print(f"The variance using calculate_variance() is {variance}")
print(f"The variance using np.var() is {np.var(my_list)}")
The variance using calculate_variance() is 6.4
The variance using np.var() is 6.4
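
The formula above divides by n, which is the population variance, and that is also what np.var() computes by default. As a small sketch of the alternative, passing ddof=1 makes NumPy divide by n - 1 instead, giving the sample variance. The variable names here are chosen only for this example.

```python
import numpy as np

data = [1, 3, 3, 6, 3, 2, 7, 5, 9, 1]

# Population variance: divide by n (NumPy's default, ddof=0).
population_variance = np.var(data)

# Sample variance: divide by n - 1 by passing ddof=1.
sample_variance = np.var(data, ddof=1)

# The standard deviation is the square root of the variance.
population_std = np.std(data)

print(f"Population variance (ddof=0): {population_variance}")
print(f"Sample variance (ddof=1):     {sample_variance:.4f}")
print(f"Population std deviation:     {population_std:.4f}")
```

Which divisor is appropriate depends on whether the data is the whole population or a sample drawn from it.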

The End

To recap, in this presentation we covered:

- the importance of conda and virtual environments
- creating a virtual environment and installing packages
- launching and creating a jupyter notebook
- using NumPy for some basic statistics within a Jupyter Notebook

Thanks to Andy, Dean of School of AI San Jose Chapter for reviewing and editing this notebook!

Tips & Tricks

List Environments:

To list your virtual environments, you can use either of the following commands:

  • conda env list

  • conda info --envs


Activate and Deactivate Environment:

To enter/activate an environment:

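  • conda activate <environment_name> (on conda versions older than 4.4: source activate <environment_name>)

To leave/deactivate an environment you are in:

  • conda deactivate (on conda versions older than 4.4: source deactivate)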


List Packages:

When you are in a virtual environment, you can list the installed conda packages via:

  • conda list

If you used pip to install a package within a conda environment, you can use:

  • pip list