Introduction to Pandas Style API

Content to accompany blog post on Practical Business Python

In [1]:
import numpy as np
import pandas as pd
from sparklines import sparklines
In [2]:
df = pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/2018_Sales_Total.xlsx?raw=true')
In [3]:
df.head()
Out[3]:
account number name sku quantity unit price ext price date
0 740150 Barton LLC B1-20000 39 86.69 3380.91 2018-01-01 07:21:51
1 714466 Trantow-Barrows S2-77896 -1 63.16 -63.16 2018-01-01 10:00:47
2 218895 Kulas Inc B1-69924 23 90.70 2086.10 2018-01-01 13:24:58
3 307599 Kassulke, Ondricka and Metz S1-65481 41 21.05 863.05 2018-01-01 15:05:22
4 412290 Jerde-Hilpert S2-34077 6 83.21 499.26 2018-01-01 23:26:55

Do a simple groupby to look at the performance by customr

In [4]:
df.groupby('name')['ext price'].agg(['mean', 'sum'])
Out[4]:
mean sum
name
Barton LLC 1334.615854 109438.50
Cronin, Oberbrunner and Spencer 1339.321642 89734.55
Frami, Hills and Schmidt 1438.466528 103569.59
Fritsch, Russel and Anderson 1385.366790 112214.71
Halvorson, Crona and Champlin 1206.971724 70004.36
Herman LLC 1336.532258 82865.00
Jerde-Hilpert 1265.072247 112591.43
Kassulke, Ondricka and Metz 1350.797969 86451.07
Keeling LLC 1363.977027 100934.30
Kiehn-Spinka 1260.870506 99608.77
Koepp Ltd 1264.152927 103660.54
Kuhn-Gusikowski 1247.866849 91094.28
Kulas Inc 1461.191064 137351.96
Pollich LLC 1196.536712 87347.18
Purdy-Kunde 1469.777547 77898.21
Sanford and Sons 1391.872958 98822.98
Stokes LLC 1271.332222 91535.92
Trantow-Barrows 1312.567872 123381.38
White-Trantow 1579.558023 135841.99
Will LLC 1411.318919 104437.60

Style the currency using python's string formatting

In [5]:
(df.groupby('name')['ext price']
 .agg(['mean', 'sum'])
 .style.format('${0:,.2f}'))
Out[5]:
mean sum
name
Barton LLC $1,334.62 $109,438.50
Cronin, Oberbrunner and Spencer $1,339.32 $89,734.55
Frami, Hills and Schmidt $1,438.47 $103,569.59
Fritsch, Russel and Anderson $1,385.37 $112,214.71
Halvorson, Crona and Champlin $1,206.97 $70,004.36
Herman LLC $1,336.53 $82,865.00
Jerde-Hilpert $1,265.07 $112,591.43
Kassulke, Ondricka and Metz $1,350.80 $86,451.07
Keeling LLC $1,363.98 $100,934.30
Kiehn-Spinka $1,260.87 $99,608.77
Koepp Ltd $1,264.15 $103,660.54
Kuhn-Gusikowski $1,247.87 $91,094.28
Kulas Inc $1,461.19 $137,351.96
Pollich LLC $1,196.54 $87,347.18
Purdy-Kunde $1,469.78 $77,898.21
Sanford and Sons $1,391.87 $98,822.98
Stokes LLC $1,271.33 $91,535.92
Trantow-Barrows $1,312.57 $123,381.38
White-Trantow $1,579.56 $135,841.99
Will LLC $1,411.32 $104,437.60

Round the results to 0 decimals

In [6]:
(df.groupby('name')['ext price']
 .agg(['mean', 'sum'])
 .style.format('${0:,.0f}'))
Out[6]:
mean sum
name
Barton LLC $1,335 $109,438
Cronin, Oberbrunner and Spencer $1,339 $89,735
Frami, Hills and Schmidt $1,438 $103,570
Fritsch, Russel and Anderson $1,385 $112,215
Halvorson, Crona and Champlin $1,207 $70,004
Herman LLC $1,337 $82,865
Jerde-Hilpert $1,265 $112,591
Kassulke, Ondricka and Metz $1,351 $86,451
Keeling LLC $1,364 $100,934
Kiehn-Spinka $1,261 $99,609
Koepp Ltd $1,264 $103,661
Kuhn-Gusikowski $1,248 $91,094
Kulas Inc $1,461 $137,352
Pollich LLC $1,197 $87,347
Purdy-Kunde $1,470 $77,898
Sanford and Sons $1,392 $98,823
Stokes LLC $1,271 $91,536
Trantow-Barrows $1,313 $123,381
White-Trantow $1,580 $135,842
Will LLC $1,411 $104,438

More complex analysis of performance by month

In [7]:
monthly_sales = df.groupby([pd.Grouper(key='date', freq='M')])['ext price'].agg(['sum']).reset_index()
monthly_sales['pct_of_total'] = monthly_sales['sum'] / df['ext price'].sum()
monthly_sales
Out[7]:
date sum pct_of_total
0 2018-01-31 185361.66 0.091818
1 2018-02-28 146211.62 0.072426
2 2018-03-31 203921.38 0.101012
3 2018-04-30 174574.11 0.086475
4 2018-05-31 165418.55 0.081940
5 2018-06-30 174089.33 0.086235
6 2018-07-31 191662.11 0.094939
7 2018-08-31 153778.59 0.076174
8 2018-09-30 168443.17 0.083438
9 2018-10-31 171495.32 0.084950
10 2018-11-30 119961.22 0.059423
11 2018-12-31 163867.26 0.081171

Use a format dictionary to control formatting per column

In [8]:
format_dict = {'sum':'${0:,.0f}', 'date': '{:%m-%Y}', 'pct_of_total': '{:.2%}'}
monthly_sales.style.format(format_dict).hide_index()
Out[8]:
date sum pct_of_total
01-2018 $185,362 9.18%
02-2018 $146,212 7.24%
03-2018 $203,921 10.10%
04-2018 $174,574 8.65%
05-2018 $165,419 8.19%
06-2018 $174,089 8.62%
07-2018 $191,662 9.49%
08-2018 $153,779 7.62%
09-2018 $168,443 8.34%
10-2018 $171,495 8.49%
11-2018 $119,961 5.94%
12-2018 $163,867 8.12%

Introduce the highlight functions

In [9]:
(monthly_sales
 .style
 .format(format_dict)
 .hide_index()
 .highlight_max(color='lightgreen')
 .highlight_min(color='#cd4f39'))
Out[9]:
date sum pct_of_total
01-2018 $185,362 9.18%
02-2018 $146,212 7.24%
03-2018 $203,921 10.10%
04-2018 $174,574 8.65%
05-2018 $165,419 8.19%
06-2018 $174,089 8.62%
07-2018 $191,662 9.49%
08-2018 $153,779 7.62%
09-2018 $168,443 8.34%
10-2018 $171,495 8.49%
11-2018 $119,961 5.94%
12-2018 $163,867 8.12%

Introduce bar formatting for table cells

In [10]:
(monthly_sales
 .style
 .format(format_dict)
 .hide_index()
.bar(color='#FFA07A', vmin=100_000, subset=['sum'], align='zero')
.bar(color='lightgreen', vmin=0, subset=['pct_of_total'], align='zero')
.set_caption('2018 Sales Performance'))
Out[10]:
2018 Sales Performance
date sum pct_of_total
01-2018 $185,362 9.18%
02-2018 $146,212 7.24%
03-2018 $203,921 10.10%
04-2018 $174,574 8.65%
05-2018 $165,419 8.19%
06-2018 $174,089 8.62%
07-2018 $191,662 9.49%
08-2018 $153,779 7.62%
09-2018 $168,443 8.34%
10-2018 $171,495 8.49%
11-2018 $119,961 5.94%
12-2018 $163,867 8.12%
In [11]:
(monthly_sales.style
 .format(format_dict)
 .background_gradient(subset=['sum'],cmap='BuGn'))
Out[11]:
date sum pct_of_total
0 01-2018 $185,362 9.18%
1 02-2018 $146,212 7.24%
2 03-2018 $203,921 10.10%
3 04-2018 $174,574 8.65%
4 05-2018 $165,419 8.19%
5 06-2018 $174,089 8.62%
6 07-2018 $191,662 9.49%
7 08-2018 $153,779 7.62%
8 09-2018 $168,443 8.34%
9 10-2018 $171,495 8.49%
10 11-2018 $119,961 5.94%
11 12-2018 $163,867 8.12%

Cool example of using sparklines from Peter Baumgartner https://twitter.com/pmbaumgartner/status/1084645440224559104

In [12]:
def sparkline_str(x):
    bins=np.histogram(x)[0]
    sl = ''.join(sparklines(bins))
    return sl
sparkline_str.__name__ = "sparkline"
In [13]:
df.groupby('name')['quantity', 'ext price'].agg(['mean', sparkline_str])
Out[13]:
quantity ext price
mean sparkline mean sparkline
name
Barton LLC 24.890244 ▄▄▃▂▃▆▄█▁▄ 1334.615854 █▄▃▆▄▄▁▁▁▁
Cronin, Oberbrunner and Spencer 24.970149 █▄▁▄▄▇▅▁▄▄ 1339.321642 █▅▅▃▃▃▂▂▁▁
Frami, Hills and Schmidt 26.430556 ▄▄▁▂▇█▂▂▅▅ 1438.466528 █▅▄▇▅▃▄▁▁▁
Fritsch, Russel and Anderson 26.074074 ▁▄▇▃▂▂█▃▄▄ 1385.366790 ▇█▃▄▂▂▁▂▁▁
Halvorson, Crona and Champlin 22.137931 ▇▆▆▇█▁▄▂▄▃ 1206.971724 ██▆▅▁▃▂▂▂▂
Herman LLC 24.806452 ▄▃▅▁▆▄▂▆▃█ 1336.532258 █▅▇▄▅▄▁▃▂▂
Jerde-Hilpert 22.460674 ▄▄█▁▂▅▃▂▄▃ 1265.072247 █▄▅▂▁▂▃▂▂▁
Kassulke, Ondricka and Metz 25.734375 ▂▂▁▁▂▂▁▅▄█ 1350.797969 █▆▆▄▄▃▂▁▁▂
Keeling LLC 24.405405 ▁▄▇▃▅█▃▄▃▆ 1363.977027 ▅█▆▃▄▂▂▁▁▁
Kiehn-Spinka 22.227848 ▃▂█▂▃▅▄▁▄▁ 1260.870506 █▇▄▃▃▂▁▂▁▁
Koepp Ltd 21.829268 ▅▇█▆▄▇▅▁▅▇ 1264.152927 █▇▅▂▄▂▂▂▁▁
Kuhn-Gusikowski 22.808219 ▂▄█▄▃▁█▄▂▄ 1247.866849 ▆█▄▃▃▃▃▁▁▁
Kulas Inc 24.095745 ▇▃▇▂▇▁▅▆█▆ 1461.191064 █▅█▂▄▄▄▂▁▁
Pollich LLC 23.383562 █▂▅▂▃▆▁▂▄▇ 1196.536712 █▆▆▃▃▃▂▂▁▁
Purdy-Kunde 27.358491 ▄▅█▁▄▇▅█▇█ 1469.777547 █▅▄▃▄▃▃▂▁▂
Sanford and Sons 24.000000 ▂▂▂▆▁█▂▂▁▅ 1391.872958 ▅▇█▃▃▄▁▃▁▂
Stokes LLC 24.527778 ▄▄▄▁▃▂▄█▄▄ 1271.332222 █▇▆▆▄▂▂▃▁▂
Trantow-Barrows 24.159574 ▅▄▅▂▃▃▄▄▁█ 1312.567872 █▇▅▆▃▂▂▁▁▁
White-Trantow 26.255814 ▃▂▆▆▃██▁██ 1579.558023 ▄█▆▅▄▃▄▂▁▃
Will LLC 24.702703 ▂█▁▄▄▄▂▅▅▄ 1411.318919 ▆█▄▅▄▂▁▂▁▂
In [ ]: