Introduction

IPython, pandas and matplotlib have a number of useful options you can use to make it easier to view and format your data. This notebook collects a bunch of them in one place. I hope this will be a useful reference.

The original blog posting is on http://pbpython.com/ipython-pandas-display-tips.html

Import modules and some sample data

First, do our standard pandas, numpy and matplotlib imports as well as configure inline displays of plots.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

One of the simple things we can do is override the default CSS to customize our DataFrame output.

This specific example is from - Brandon Rhodes' talk at pycon

For the purposes of the notebook, I'm defining CSS as a variable but you could easily read in from a file as well.

In [2]:
CSS = """
body {
    margin: 0;
    font-family: Helvetica;
}
table.dataframe {
    border-collapse: collapse;
    border: none;
}
table.dataframe tr {
    border: none;
}
table.dataframe td, table.dataframe th {
    margin: 0;
    border: 1px solid white;
    padding-left: 0.25em;
    padding-right: 0.25em;
}
table.dataframe th:not(:empty) {
    background-color: #fec;
    text-align: left;
    font-weight: normal;
}
table.dataframe tr:nth-child(2) th:empty {
    border-left: none;
    border-right: 1px dashed #888;
}
table.dataframe td {
    border: 2px solid #ccf;
    background-color: #f4f4ff;
}
"""

Now add this CSS into the current notebook's HTML.

In [3]:
from IPython.core.display import HTML
HTML('<style>{}</style>'.format(CSS))
Out[3]:
In [4]:
SALES=pd.read_csv("sample-sales-tax.csv", parse_dates='True')
SALES.head()
Out[4]:
account number name sku category quantity unit price ext price Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 Belt 13 44.48 578.24 0.07 40.4768 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 Shoes 19 53.62 1018.78 0.07 71.3146 2014-07-29 02:10:44
2 563905 Kerluke, Reilly and Bechtelar AS-93055 Shirt 12 24.16 289.92 0.07 20.2944 2014-03-01 10:51:24
3 93356 Waters-Walker AS-93055 Shirt 5 82.68 413.40 0.07 28.9380 2013-11-17 20:41:11
4 659366 Waelchi-Fahey AS-93055 Shirt 18 99.64 1793.52 0.07 125.5464 2014-01-03 08:14:27

You can see how the CSS is now applied to the DataFrame and how you could easily modify it to customize it to your liking.

Jupyter notebooks do a good job of automatically displaying information but sometimes you want to force data to display. Fortunately, ipython provides and option. This is especially useful if you want to display multiple dataframes.

In [5]:
from IPython.display import display
In [6]:
display(SALES.head(2))
display(SALES.tail(2))
display(SALES.describe())
account number name sku category quantity unit price ext price Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 Belt 13 44.48 578.24 0.07 40.4768 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 Shoes 19 53.62 1018.78 0.07 71.3146 2014-07-29 02:10:44
account number name sku category quantity unit price ext price Tax rate Tax amount date
998 304860 Huel-Haag LL-46261 Shoes 9 98.22 883.98 0.07 61.8786 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 Shoes 14 74.83 1047.62 0.07 73.3334 2014-06-27 05:58:33
account number quantity unit price ext price Tax rate Tax amount
count 1000.000000 1000.000000 1000.000000 1000.00000 1000.000000 1000.000000
mean 535208.897000 10.328000 56.179630 579.84390 0.035863 20.058881
std 277589.746014 5.687597 25.331939 435.30381 0.025598 22.463583
min 93356.000000 1.000000 10.060000 10.38000 0.003000 0.202050
25% 299771.000000 5.750000 35.995000 232.60500 0.015000 5.327175
50% 563905.000000 10.000000 56.765000 471.72000 0.030000 13.169800
75% 750461.000000 15.000000 76.802500 878.13750 0.040000 26.302725
max 995267.000000 20.000000 99.970000 1994.80000 0.150000 191.250000

Using pandas settings to control output

Pandas has many different options to control how data is displayed.

You can use max_rows to control how many rows are displayed

In [7]:
pd.set_option("display.max_rows",4)
In [8]:
SALES
Out[8]:
account number name sku category quantity unit price ext price Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 Belt 13 44.48 578.24 0.07 40.4768 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 Shoes 19 53.62 1018.78 0.07 71.3146 2014-07-29 02:10:44
... ... ... ... ... ... ... ... ... ... ...
998 304860 Huel-Haag LL-46261 Shoes 9 98.22 883.98 0.07 61.8786 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 Shoes 14 74.83 1047.62 0.07 73.3334 2014-06-27 05:58:33

1000 rows × 10 columns

Depending on the data set, you may only want to display a smaller number of columns.

In [9]:
pd.set_option("display.max_columns",6)
In [10]:
SALES
Out[10]:
account number name sku ... Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 ... 0.07 40.4768 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 ... 0.07 71.3146 2014-07-29 02:10:44
... ... ... ... ... ... ... ...
998 304860 Huel-Haag LL-46261 ... 0.07 61.8786 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 ... 0.07 73.3334 2014-06-27 05:58:33

1000 rows × 10 columns

You can control how many decimal points of precision to display

In [11]:
pd.set_option('precision',2)
In [12]:
SALES
Out[12]:
account number name sku ... Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 ... 0.1 40.5 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 ... 0.1 71.3 2014-07-29 02:10:44
... ... ... ... ... ... ... ...
998 304860 Huel-Haag LL-46261 ... 0.1 61.9 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 ... 0.1 73.3 2014-06-27 05:58:33

1000 rows × 10 columns

In [13]:
pd.set_option('precision',7)
In [14]:
SALES
Out[14]:
account number name sku ... Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 ... 0.07 40.4768 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 ... 0.07 71.3146 2014-07-29 02:10:44
... ... ... ... ... ... ... ...
998 304860 Huel-Haag LL-46261 ... 0.07 61.8786 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 ... 0.07 73.3334 2014-06-27 05:58:33

1000 rows × 10 columns

You can also format floating point numbers using float_format

In [15]:
pd.set_option('float_format', '{:.2f}'.format)
In [16]:
SALES
Out[16]:
account number name sku ... Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 ... 0.07 40.48 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 ... 0.07 71.31 2014-07-29 02:10:44
... ... ... ... ... ... ... ...
998 304860 Huel-Haag LL-46261 ... 0.07 61.88 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 ... 0.07 73.33 2014-06-27 05:58:33

1000 rows × 10 columns

This does apply to all the data. In our example, applying dollar signs to everything would not be correct for this example.

In [17]:
pd.set_option('float_format', '${:.2f}'.format)
In [18]:
SALES
Out[18]:
account number name sku ... Tax rate Tax amount date
0 296809 Carroll PLC QN-82852 ... $0.07 $40.48 2014-09-27 07:13:03
1 98022 Heidenreich-Bosco MJ-21460 ... $0.07 $71.31 2014-07-29 02:10:44
... ... ... ... ... ... ... ...
998 304860 Huel-Haag LL-46261 ... $0.07 $61.88 2014-07-26 01:10:57
999 98022 Heidenreich-Bosco LW-86841 ... $0.07 $73.33 2014-06-27 05:58:33

1000 rows × 10 columns

Third Party Plugins

Qtopian has a useful plugin called qgrid - https://github.com/quantopian/qgrid

Import it and install it.

In [19]:
import qgrid
qgrid.nbinstall()

Showing the data is straighforward.

In [22]:
qgrid.show_grid(SALES, remote_js=True)

The plugin is very similar to the capability of an Excel autofilter. It can be handy to quickly filter and sort your data.

Improving your plots

I have mentioned before how the default pandas plots don't look so great. Fortunately, there are style sheets in matplotlib which go a long way towards improving the visualization of your data.

Here is a simple plot with the default values.

In [23]:
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fccbf583750>

We can use some of the matplolib styles available to us to make this look better. http://matplotlib.org/users/style_sheets.html

In [24]:
plt.style.use('ggplot')
In [25]:
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fccbf509fd0>

You can see all the styles available

In [26]:
plt.style.available
Out[26]:
[u'dark_background', u'bmh', u'grayscale', u'ggplot', u'fivethirtyeight']
In [27]:
plt.style.use('bmh')
In [28]:
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fccbf4a6810>
In [29]:
plt.style.use('fivethirtyeight')
In [30]:
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fccbd840f90>

Each of the different styles have subtle (and not so subtle) changes. Fortunately it is easy to experiment with them and your own plots.

You can find other articles at Practical Business Python

This notebook is referenced in the following post - http://pbpython.com/ipython-pandas-display-tips.html