#!/usr/bin/env python # coding: utf-8 # # 01 - Introduction to Conda & Jupyter Notebooks #


#
# The online Notebook Viewer link ([NB Viewer](http://nbviewer.jupyter.org/github/johannesgiorgis/school_of_ai_vancouver/blob/master/intro_to_data_science_tools/introduction_to_data_science_tools_01.ipynb0)) is recommended for viewing this notebook. # ## Introduction # # Welcome to the School of AI's Introduction to Data Science Tools series. This series introduces the Data Science tools and ecosystem at our disposal. # # We will be covering: # - Conda (package and environment management system) # - Jupyter # - NumPy (Numeric computing) # # This initial workshop will focus on: # - installing conda # - setting up a conda virtual environment # - installing and using Jupyter Notebooks # - using NumPy for Statistics within a Jupyter Notebook # # First, a brief look at the tools we will be learning about today: # # # ### Conda # [Conda](https://conda.io/docs/) is an open source, cross-platform package and environment management system. Conda allows us to set up separate environments, called virtual environments, and helps us manage the packages we need within each environment. # # # ### Jupyter Notebook # [Jupyter Notebook](http://jupyter.org/) is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. It is extremely useful for Data Scientists as it allows them to write down their thoughts and documentation right next to the code and its output. In fact, you are reading this in a Jupyter Notebook :) # # # ### NumPy # [NumPy](http://www.numpy.org/) (Numerical Python) is the fundamental package for scientific computing with Python. 
It contains among other things: # # - a powerful N-dimensional array object # - sophisticated (broadcasting) functions # - tools for integrating C/C++ and Fortran code # - useful linear algebra, Fourier transform, and random number capabilities # # Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. # # #
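# As a rough illustration of the features listed above, here is a minimal sketch, assuming NumPy is already installed. The array values and variable names are made up for this example and do not come from the workshop material.

```python
import numpy as np

# An N-dimensional array object: here a 2-D array with 2 rows and 3 columns
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Broadcasting: the 1-D row is stretched across both rows of the matrix
row = np.array([10.0, 20.0, 30.0])
shifted = matrix + row

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

print(matrix.shape)
print(shifted)
print(x)
```

# The addition works without an explicit loop because NumPy aligns the shapes `(2, 3)` and `(3,)` automatically.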
# # ### Sample Jupyter Notebook # # ![jupyter notebook sample](../imgs/jupyter_notebook_sample.jpg) # ## Install Conda # The first step is to install conda: # # 1. Install [Conda](https://conda.io/docs/user-guide/install/index.html) Links: # 1.1. [Windows](https://conda.io/docs/user-guide/install/windows.html) # 1.2. [macOS](https://conda.io/docs/user-guide/install/macos.html) # 1.3. [Linux](https://conda.io/docs/user-guide/install/linux.html) # # ### Installing on Windows # # 1. Download one installer: # - [Miniconda installer for Windows.](https://conda.io/miniconda.html) # - [Anaconda installer for Windows.](https://www.anaconda.com/download/) # # 2. Double-click the ```.exe``` file. # # 3. Follow the instructions on the screen. # If you are unsure about any setting, accept the defaults. You can change them later. # When installation is finished, from the Start menu, open the Anaconda Prompt. # # 4. Test your installation: # Run the command ```conda list``` from the Terminal Window or Anaconda Prompt. # # ### Installing on macOS # # 1. Download one installer: # # - [Miniconda installer for macOS.](https://conda.io/miniconda.html) # - [Anaconda installer for macOS.](https://www.anaconda.com/download/) # # 2. Install: # # - Miniconda — In your Terminal window, run: ```bash Miniconda3-latest-MacOSX-x86_64.sh``` # - Anaconda — Double-click the .pkg file. # # 3. Follow the prompts on the installer screens. # # If you are unsure about any setting, accept the defaults. You can change them later. # # 4. To make the changes take effect, close and then re-open your Terminal window. # # 5. Test your installation: # Run the command ```conda list``` from the Terminal Window or Anaconda Prompt. # # # ### Installing on Linux # # 1. Download one installer: # # - [Miniconda installer for Linux.](https://conda.io/miniconda.html) # - [Anaconda installer for Linux.](https://www.anaconda.com/download/) # # 2. 
In your Terminal window, run: # # - Miniconda: ```bash Miniconda3-latest-Linux-x86_64.sh``` # # - Anaconda: ```bash Anaconda-latest-Linux-x86_64.sh``` # 3. Follow the prompts on the installer screens. # # If you are unsure about any setting, accept the defaults. You can change them later. # # 4. To make the changes take effect, close and then re-open your Terminal window. # # 5. Test your installation: # Run the command ```conda list``` from the Terminal Window or Anaconda Prompt. # # You should see a list of packages that are currently installed. Don't worry if you don't see anything - we haven't installed anything yet. But you can always use the ```conda list``` command to check the list of installed packages in your current environment. Below is a sample output of this command: # # ![conda list output](../imgs/conda_list_output.jpg) # #
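# If you prefer to check the installation from Python rather than the terminal, the sketch below is one possible approach. It only assumes that the installer added `conda` to your PATH; the helper names `find_conda` and `conda_version` are hypothetical and not part of conda itself.

```python
import shutil
import subprocess


def find_conda():
    """Return the path to the conda executable, or None if it is not on PATH."""
    return shutil.which("conda")


def conda_version(conda_path):
    """Run `conda --version` and return whatever it printed, stripped."""
    result = subprocess.run(
        [conda_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Fall back to stderr in case the version was printed there
    return (result.stdout or result.stderr).strip()


conda_path = find_conda()
if conda_path is None:
    print("conda was not found on PATH; re-open your terminal or revisit the install steps")
else:
    print(f"Found conda at {conda_path}: {conda_version(conda_path)}")
```

# A result of "not found" right after installing usually means the terminal has not been re-opened yet, as described in step 4 above.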
# # Great! Now that we are all set up with conda, let's move to the next step. # ## Create a Virtual Environment # # - [What is a Virtual Environment?](#virtual-env) # - [Create School of AI Environment](#create-soai-env) # - [Activate Environment](#activate-env) # # With conda installed and confirmed to be working on our systems, the next step is to create a virtual environment. # # # ### What is a Virtual Environment? # Let's first understand what a virtual environment is and what we might use it for. # # A virtual environment is an isolated container of dependencies. You create different virtual environments to install different types of dependencies or even different versions of the same dependency. This way, adding or removing dependencies in one virtual environment has no effect on another virtual environment. Conda allows you to manage (create, export, list, remove and update) environments. # # For example, suppose you wish to work on multiple projects: one project uses Python 3.6 while another uses Python 3.7. Your Python 3.6 project uses TensorFlow v0.9 and your Python 3.7 project uses TensorFlow v1.11. The APIs of the two TensorFlow versions are very different. Because each conda virtual environment is isolated, you can update packages in one virtual environment with no fear of overwriting or breaking packages in any other virtual environment. # # # ### Create School of AI Environment # # To create a conda environment, use one of the following two commands. The second form also installs the listed packages into the new environment. # # ``` # conda create --name <env_name> # conda create --name <env_name> [<package_names>] # ``` # # Let's create our first conda environment, called **school_of_ai**. The command below will create a new environment called **school_of_ai**. A prompt will ask you whether to proceed with the creation or not. # # ``` # conda create --name school_of_ai # ``` # # ![conda create school_of_ai environment](../imgs/conda_create_school_of_ai_environment.jpg) # #
# # Let's verify we can see our newly created environment via ```conda env list```: # # ![conda list school_of_ai environment](../imgs/conda_list_school_of_ai_environment.jpg) # #
# # # ### Activate Environment # # We have created our environment and confirmed we can see it under the list of conda environments. We need to access it now. # # We access or activate an environment via ```source activate <env_name>```, which for our environment is ```source activate school_of_ai```. On conda 4.4 and later, ```conda activate <env_name>``` is the recommended equivalent. Your terminal will now show the name of the active environment at the beginning of the line in parentheses: # # ![conda activate school_of_ai environment](../imgs/conda_activate_school_of_ai_environment.jpg) # ## Install Packages # # Once we have activated the school_of_ai environment, we can list the packages it contains with ```conda list```: # # ![conda school_of_ai environment list packages (empty)](../imgs/conda_school_of_ai_list_packages_no_packages.jpg) # #
# # As you can see, we have no packages installed within this environment. # # Let's install the following packages: # 1. NumPy (Numeric computing) # 2. Jupyter (REPL) # 3. Pandas (Data analysis) # 4. Matplotlib (Plotting) # # We can install them via ```conda install numpy jupyter pandas matplotlib```: # #
# # ![conda school_of_ai install packages](../imgs/conda_school_of_ai_install_packages.jpg) # #
# # You should see the following message prompting you to confirm whether it should proceed to install the desired packages and their respective dependencies. As you can see, that entails a lot of additional packages: # # ``` # The following NEW packages will be INSTALLED: # # backcall: 0.1.0-py37_0 # blas: 1.0-mkl # bleach: 3.0.2-py37_0 # ca-certificates: 2018.03.07-0 # certifi: 2018.10.15-py37_0 # ... # traitlets: 4.3.2-py37_0 # wcwidth: 0.1.7-py37_0 # webencodings: 0.5.1-py37_1 # wheel: 0.32.2-py37_0 # widgetsnbextension: 3.4.2-py37_0 # xz: 5.2.4-h14c3975_4 # zeromq: 4.2.5-hf484d3e_1 # zlib: 1.2.11-ha838bed_2 # # Proceed ([y]/n)? # ``` # # Type 'y' and press Enter to allow it to proceed with installing the packages. # #
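# Once the installation has finished, one way to confirm from Python that the packages are visible in the current environment is sketched below. It is only an illustration: the helper name `installed_packages` is hypothetical, and checking Jupyter through its `notebook` module is an assumption about how the `jupyter` package is laid out.

```python
import importlib.util

# Import names to look for; "notebook" stands in for Jupyter (an assumption)
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook"]


def installed_packages(names):
    """Map each import name to True if it can be found in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_packages(PACKAGES)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

# Using `find_spec` only locates each package without importing it, so the check stays fast even for large libraries.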
# # Now that our packages have been installed, let's verify with the ```conda list``` command. You should now see a long list of installed packages, including the ones we specified: # # ![conda school_of_ai list installed packages](../imgs/conda_school_of_ai_list_installed_packages.jpg) # ## Launch Jupyter Notebook # # - Start Notebook Server # - [Open in Web Browser](#open-in-web) # - [Create Notebook](#create-notebook) # # We have now created our environment and installed our desired packages, including Jupyter. It's time to launch Jupyter Notebook! # # # ### Start Notebook Server # # Let's launch the notebook server via ```jupyter notebook```: # # ![launch jupyter notebook](../imgs/conda_school_of_ai_launch_jupyter_notebook.jpg) # #
# # # ### Open in Web Browser # # Your web browser should automatically open to the URL where the notebook server is running. If not, follow the instructions from your terminal and copy the automatically generated URL and paste it into your Web Browser. # # If everything worked, you should see a screen like the one below in your Web Browser. This is the notebook dashboard, the home page for the notebook: # #
# # ![jupyter notebook on browser](../imgs/jupyter_notebook_on_browser.jpg) # # # ### Create Notebook # # Let's open a new Notebook by: # # 1. Selecting the 'New' button in the top right side of the screen. # 2. Selecting the 'Python 3' option which creates a Python 3 notebook and launches it in a new tab. # # ![jupyter notebook create notebook](../imgs/jupyter_notebook_create_new_notebook.jpg) # #
# # You should now see a new Python 3 notebook launched in a new tab: # # ![jupyter notebook new Python 3 notebook](../imgs/jupyter_notebook_new_python3_notebook.jpg) # ## Navigating Jupyter Notebook # # - [Modal Editor](#modal-editor) # - [Mouse Navigation](#mouse-nav) # - [Keyboard Navigation](#keyboard-nav) # # **Source**: [Notebook Basics](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html) # # When you create a new notebook or open an existing one, you will be taken to the notebook user interface (UI). This UI allows you to run code and author notebook documents interactively. The notebook UI has the following main areas: # # - Menu # - Toolbar # - Notebook area and cells # # The notebook has an interactive tour of these elements that can be started in the “Help:User Interface Tour” menu item. # # # ### Modal Editor # # Jupyter Notebook has a modal user interface, which means that the keyboard does different things depending on which mode the Notebook is in. The two modes are Edit Mode and Command Mode. # # **Edit Mode**: # # Edit mode is indicated by a green cell border and a prompt showing in the editor area: # # ![jupyter notebook edit mode](../imgs/jupyter_notebook_edit_mode.jpg) # # When a cell is in edit mode, you can type into the cell, like a normal text editor. # # Enter edit mode by pressing ```Enter``` or using the mouse to click on a cell’s editor area. # #
# # **Command Mode**: # # Command mode is indicated by a grey cell border with a blue left margin: # # ![jupyter notebook command mode](../imgs/jupyter_notebook_command_mode.jpg) # # When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode and you press c, you will copy the current cell - no modifier is needed. # # Don’t try to type into a cell in command mode; unexpected things will happen! # # Enter command mode by pressing ```Esc``` or using the mouse to click outside a cell’s editor area. # #
# # ### Mouse Navigation # # All navigation and actions in the Notebook are available using the mouse through the menubar and toolbar, which are both above the main Notebook area: # # ![Jupyter Notebook menus and toolbar](../imgs/jupyter_notebook_menus_and_toolbars.jpg) # # There are a couple of types of mouse-based navigation available in Jupyter Notebook: # # - cells can be selected by clicking on them # - cell actions usually apply to the currently selected cell # #
#
# ### Keyboard Navigation
#
# The modal user interface of the Jupyter Notebook has been optimized for efficient keyboard usage. This is made possible by having two different sets of keyboard shortcuts: one set that is active in edit mode and another in command mode.
#
# Below are the command mode shortcuts we recommend learning first. They cover:
#
# - Basic navigation
# - Saving the notebook
# - Creating a cell
# - Editing a cell
#
#
# | Key | Action |
# | :------: | :-------: |
# | Enter | Enter 'edit mode' from 'command mode' |
# | Esc | Enter 'command mode' from 'edit mode' |
# | Shift-Enter | Execute cell |
# | Up or k | Go up a cell in 'command mode' |
# | Down or j | Go down a cell in 'command mode' |
# | s | Save the notebook when in 'command mode' |
# | y | Change cell to 'code' cell |
# | m | Change cell to 'Markdown' cell |
# | 1-6 | Change cell to 'Markdown' cell and add appropriate h# header |
# | a | Create a cell above the current cell |
# | b | Create a cell below the current cell |
# | x | Cut current cell |
# | c | Copy current cell |
# | v | Paste cell below the current cell |
# | d (twice) | Delete current cell |
# | z | Undo last cell operation |
#
#
#
# For more details on Keyboard Shortcuts and other Help, the Help Menu has lots of options:
#
# ![jupyter notebook help menu](../imgs/jupyter_notebook_help_menu.jpg)

# ## Statistics Exercise
#
# In a new Browser Tab, please open Twilio's [Basic Statistics in Python with NumPy and Jupyter Notebook](https://www.twilio.com/blog/2017/10/basic-statistics-python-numpy-jupyter-notebook.html) blog to work through some Statistics using Jupyter Notebook.
#
#

# In[1]:


import numpy as np


# ### Mean
#
# The equation for the mean, $\mu$, of $n$ values is:
#
# $$ \huge \mu = \frac{1}{n} \sum_{i=1}^{n} x_{i} $$

# In[13]:


def calculate_mean(a_list):
    # Sum the values and divide by how many there are
    return float(sum(a_list)) / len(a_list)


my_list = [1, 4, 3, 2, 6, 4, 4, 3, 2, 6]
print(f"The mean using calculate_mean() is {calculate_mean(my_list)}")
print(f"The mean using np.mean() is {np.mean(my_list)}")


# ### Variance
#
# The equation for the variance, $\sigma^{2}$, is:
#
# $$ \huge \sigma^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu)^{2} $$

# In[14]:


def calculate_variance(a_list, mu):
    # Squared deviation of each value from the mean
    dev2 = [(x - mu) ** 2 for x in a_list]
    # The variance is the mean of the squared deviations
    variance = calculate_mean(dev2)
    return variance


my_list = [1, 3, 3, 6, 3, 2, 7, 5, 9, 1]
variance = calculate_variance(my_list, calculate_mean(my_list))
print(f"The variance using calculate_variance() is {variance}")
print(f"The variance using np.var() is {np.var(my_list)}")


# ## The End
#
# To recap, in this presentation we covered:
#
# - the importance of conda and virtual environments
# - creating a virtual environment and installing packages
# - launching Jupyter Notebook and creating a notebook
# - using NumPy for some basic statistics within a Jupyter Notebook
#
# If you liked this presentation and you'd like to see more, please subscribe and donate to this [bitcoin wallet](https://www.youtube.com/watch?v=kHBcVlqpvZ8&feature=youtu.be) :)
#
# Thanks to [Andy](https://github.com/redpanda-ai), Dean of the School of AI San Jose Chapter, for reviewing and editing this notebook!
# ## Tips & Tricks # # **List Environments**: # # To list your virtual environments, you can use either of the following commands: # # - ```conda env list``` # # - ```conda info --envs``` # #
# # **Activate and Deactivate Environment**: # # To enter/activate an environment: # # - ```source activate <env_name>``` # # To leave/deactivate an environment you are in: # # - ```source deactivate``` # # On conda 4.4 and later, ```conda activate <env_name>``` and ```conda deactivate``` are the recommended equivalents. # #
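# To confirm from inside Python or a notebook cell which environment you are actually running in, something like the following sketch can help. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the helper name `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment():
    """Return the active conda environment name (or None) and the interpreter prefix."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    env_name = os.environ.get("CONDA_DEFAULT_ENV")
    # sys.prefix is the directory of the environment the running Python belongs to
    return env_name, sys.prefix


env_name, prefix = describe_environment()
print(f"Active conda environment: {env_name}")
print(f"Python prefix: {prefix}")
```

# If the printed prefix does not end with the environment you expected (for example `school_of_ai`), the notebook server was probably started before the environment was activated.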
# # **List Packages**: # # When you are in a virtual environment, you can list the installed conda packages via: # # - ```conda list``` # # If you used pip to install a package within a conda environment, you can use: # # - ```pip list``` # ## Resources # - [Blog: Creating Conda Environments](https://dziganto.github.io/data%20science/python/anaconda/Creating-Conda-Environments/) # - [Conda Concepts](https://conda.io/docs/user-guide/concepts.html) # - [Conda: Managing environments](https://conda.io/docs/user-guide/tasks/manage-environments.html) # - [Github repo: Intro 2 Stats](https://github.com/rouseguy/intro2stats) # - [Twilio Blog: Basic Statistics in Python with NumPy and Jupyter Notebook](https://www.twilio.com/blog/2017/10/basic-statistics-python-numpy-jupyter-notebook.html) # - [Setting up your machine for data science in Python](https://drivendata.github.io/pydata-setup/) # - [Jupyter Notebook Tutorial: The Definitive Guide](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook) # - [Why you need Python environments and how to manage them with Conda](https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c) # - [Jupyter: Notebook Basics](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html)