#!/usr/bin/env python
# coding: utf-8

# # Data analysis in Python

# This course is aimed at the intermediate Python developer who wants to learn how to do useful data analysis tasks in Python.
# It will initially focus on the Python package [pandas](http://pandas.pydata.org/) but will also cover [matplotlib](http://matplotlib.org/), [NumPy](http://www.numpy.org/) and [SciPy](https://www.scipy.org/) to some degree.
#
# Data analysis is a huge topic and we couldn't possibly cover it all in one short course, so the purpose of this workshop is to give you an introduction to some of the most useful tools and to demonstrate some of the most common problems that surface.

# In previous courses, you've used the `python` command line program to execute scripts and `ipython` to run interactively. This course will use another tool called Jupyter (previously known as IPython Notebook) to run your Python code. It works like a standard IPython interactive session, but also lets you intersperse your code with blocks of text that explain what you're doing and embed output such as graphs directly in the page.

# To get started, launch Jupyter Notebook as shown by your instructor. Towards the top-right of that page there should be three buttons, the middle of which is labelled *New*. Clicking that gives a drop-down, from which you should select the *Python 3* option.
#
# This will open a new notebook file. Give it a name by clicking on the 'Untitled' at the top of the screen.
#
# Throughout this course you will likely want to start a new notebook for each section, so name them appropriately to make them easier to find later.

# ## Getting started
#
# Once the notebook is launched, you will see a wide white box with a grey box inside it with a blue `In [ ]:` to the left. The grey box is an input cell, similar to that which you find in the IPython command line program.
# You type any Python code you want to run inside that box:

# In[1]:


# Python code can be written in 'Code' cells
print('Output appears below when the cell is run')
print('To run a cell, press Ctrl-Enter with the cursor inside or use the run button in the toolbar at the top')


# In your notebook, type the following in the first cell and then run it; you should see the same output:

# In[2]:


a = 5
b = 7
a + b


# The cells in a notebook are linked together, so a variable defined in one cell is available in any cell you run afterwards. In the second cell you can therefore use the variables `a` and `b`:

# In[3]:


c = a - b
print(c)


# Some Python libraries have special integration with Jupyter notebooks and so can display their output directly in the page. For example, `pandas` will format tables of data nicely and `matplotlib` will embed graphs directly:

# In[4]:


from pandas import DataFrame
DataFrame([[1, 2, 3], [5, 6, 6]])


# In[5]:


get_ipython().run_line_magic('matplotlib', 'inline')

import matplotlib.pyplot as plt
import numpy as np

# Plot two full periods of a sine wave, sampled every 0.01 along the x-axis
t = np.arange(0.0, 2.0, 0.01)
s = np.sin(2*np.pi*t)
plt.plot(t, s)
plt.show()


# ## Markdown
#
# If you want to write some text as documentation (like these words here) then you should label the cell as being a Markdown cell. Do that by selecting the cell and going to the dropdown at the top of the page labelled *Code* and changing it to *Markdown*.
#
# It is becoming common for people to use Jupyter notebooks as a sort of lab notebook where they document their processes, interspersed with code. This style of working, where you give prose and code equal weight, is sometimes called *literate programming*.
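
# As a rough sketch of this style, a Markdown cell that explains a step can sit directly above the code cell that carries it out. The temperature readings below are made-up values used purely for illustration; they are not part of the course data.
#
# The mean of a set of readings is their sum divided by the number of readings:

# In[ ]:


# Hypothetical example data: three temperature readings in degrees Celsius
readings = [18.5, 21.0, 19.5]

# Sum the readings and divide by how many there are
mean_reading = sum(readings) / len(readings)
print('Mean reading: {:.1f}'.format(mean_reading))
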

# ## Exercise
#
# Take the following code and break it down, chunk by chunk, interspersing it with documentation explaining what each part does using Markdown blocks:
# ```python
# prices = {'apple': 0.40, 'banana': 0.50}
# my_purchase = {
#     'apple': 1,
#     'banana': 6
# }
# grocery_bill = 0
# for fruit in my_purchase:
#     grocery_bill += prices[fruit] * my_purchase[fruit]
# print('I owe the grocer ${:.2f}'.format(grocery_bill))
# ```
# You don't need to put only one line of code per cell; it makes sense to group some lines together.

# Throughout this course, use the Jupyter notebook to solve the problems. Follow along with the examples, typing them into your own notebooks to see how they work.
#
# Now to move on to the [next page](pandas.ipynb).