This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
We will download and process a dataset about attendance on Montreal's bicycle tracks. This example is largely inspired by a presentation from Julia Evans.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
We create a variable, url, that contains the address of a CSV (comma-separated values) data file. This standard text-based file format is used to store tabular data.
url = "https://github.com/ipython-books/cookbook-data/raw/master/bikes.csv"
Pandas provides a read_csv function that can read any CSV file. Here, we give it the URL of the file. Pandas will automatically download and parse the file, and return a DataFrame object. We need to specify a few options to make sure the dates are parsed correctly.
df = pd.read_csv(url, index_col='Date', parse_dates=True, dayfirst=True)
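To see what the date options do without downloading anything, here is a minimal sketch that parses a tiny in-memory CSV. The two rows and the names csv_text and sample are made up for illustration and are not taken from the real file. With dayfirst=True, a string such as 01/02/2012 is read as 1 February rather than 2 January.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for the real data file.
csv_text = "Date,Berri1\n01/02/2012,35\n02/02/2012,83\n"

# Same options as in the recipe: use the Date column as the index,
# parse it as dates, and treat the first number as the day.
sample = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                     parse_dates=True, dayfirst=True)

print(sample.index)  # a DatetimeIndex with two days in February 2012
```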
The df variable contains a DataFrame object, a specific Pandas data structure that contains 2D tabular data. The head(n) method displays the first n rows of this table.
Every row contains the number of bicycles on every track of the city, for every day of the year.
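We now plot the daily attendance of two tracks. First, we select the two columns 'Berri1' and 'PierDup'. Then, we call the plot method.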
# The styling '-' and '--' is just to make the figure
# readable in the black & white printed version of this book.
df[['Berri1', 'PierDup']].plot(figsize=(8,4), style=['-', '--']);
The index attribute of the DataFrame contains the dates of all rows in the table. This index has a few date-related attributes, including weekday.
However, we would like to have names (Monday, Tuesday, etc.) instead of numbers between 0 and 6. This can be done easily. First, we create an array days with all weekday names. Then, we index it by df.index.weekday. This operation replaces every integer in the index by the corresponding name in days. The first element, Monday, has index 0, so every 0 in df.index.weekday is replaced by Monday, and so on. We assign this new index to a new column Weekday in the DataFrame.
days = np.array(['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday'])
df['Weekday'] = days[df.index.weekday]
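The replacement of integers by names relies on NumPy integer-array indexing. The following sketch shows the mechanism in isolation. The codes array is a made-up stand-in for df.index.weekday, not real data.

```python
import numpy as np

days = np.array(['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday'])

# Hypothetical weekday numbers (0 = Monday, ..., 6 = Sunday).
codes = np.array([0, 0, 5, 6, 2])

# Indexing an array with an integer array picks one element per integer.
names = days[codes]
print(names)
```

The result has the same length as codes, which is why it can be assigned directly as a new column.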
To get the attendance as a function of the weekday, we need to group the table by weekday. The groupby method lets us do just that. Once grouped, we can sum all rows in every group.
df_week = df.groupby('Weekday').sum()
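As a small illustration of the group-then-sum step, here is a sketch on a toy table. The table toy and its values are invented for the example. Only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical four-row table with a Weekday column and two tracks.
toy = pd.DataFrame({
    'Weekday': ['Monday', 'Tuesday', 'Monday', 'Tuesday'],
    'Berri1': [10, 20, 30, 40],
    'PierDup': [1, 2, 3, 4],
})

# Rows sharing the same Weekday are collected into one group,
# then every numeric column is summed within each group.
toy_week = toy.groupby('Weekday').sum()
print(toy_week)
```

The grouping column becomes the index of the result, and by default the groups are sorted by key, so weekday names come back in alphabetical order rather than calendar order.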
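We can now display this information in a figure. We first reorder the table by weekday using loc (label-based indexing; the older ix indexer was removed in pandas 1.0). Then, we plot the table, specifying the line width and the figure size.
df_week.loc[days].plot(lw=3, figsize=(6,4))
plt.ylim(0);  # Set the bottom of the y-axis to 0.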
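Reordering rows by a list of labels can be sketched on a toy table as follows. This example uses the loc indexer, which is the label-based indexer in current pandas releases. The table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical table whose index is in alphabetical-like, not calendar, order.
counts = pd.DataFrame({'count': [3, 1, 2]},
                      index=['Wednesday', 'Monday', 'Tuesday'])

# Passing a list of labels to loc returns the rows in that order.
ordered = counts.loc[['Monday', 'Tuesday', 'Wednesday']]
print(ordered)
```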
from ipywidgets import interact
# from IPython.html.widgets import interact  # IPython < 4.x

@interact
def plot(n=(1, 30)):
    plt.figure(figsize=(8,4))
    # pd.rolling_mean was removed from pandas; use the rolling method instead.
    df['Berri1'].rolling(n).mean().dropna().plot()
    plt.ylim(0, 8000)
    plt.show()
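The smoothing in the interactive plot is a rolling mean over n consecutive days. Here is a minimal sketch of that computation on a made-up series, using the rolling method available in current pandas. The values are illustrative only.

```python
import pandas as pd

# Hypothetical daily counts.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Mean over a sliding window of 3 values. The first two positions
# have no full window, so they are NaN and dropna removes them.
smoothed = s.rolling(3).mean().dropna()
print(smoothed)
```

A larger window gives a smoother curve but discards more points at the start of the series.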