Content to accompany blog post on Practical Business Python
import numpy as np
import pandas as pd
from sparklines import sparklines
df = pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/2018_Sales_Total.xlsx?raw=true')
df.head()
account number | name | sku | quantity | unit price | ext price | date | |
---|---|---|---|---|---|---|---|
0 | 740150 | Barton LLC | B1-20000 | 39 | 86.69 | 3380.91 | 2018-01-01 07:21:51 |
1 | 714466 | Trantow-Barrows | S2-77896 | -1 | 63.16 | -63.16 | 2018-01-01 10:00:47 |
2 | 218895 | Kulas Inc | B1-69924 | 23 | 90.70 | 2086.10 | 2018-01-01 13:24:58 |
3 | 307599 | Kassulke, Ondricka and Metz | S1-65481 | 41 | 21.05 | 863.05 | 2018-01-01 15:05:22 |
4 | 412290 | Jerde-Hilpert | S2-34077 | 6 | 83.21 | 499.26 | 2018-01-01 23:26:55 |
Do a simple groupby to look at the performance by customr
df.groupby('name')['ext price'].agg(['mean', 'sum'])
mean | sum | |
---|---|---|
name | ||
Barton LLC | 1334.615854 | 109438.50 |
Cronin, Oberbrunner and Spencer | 1339.321642 | 89734.55 |
Frami, Hills and Schmidt | 1438.466528 | 103569.59 |
Fritsch, Russel and Anderson | 1385.366790 | 112214.71 |
Halvorson, Crona and Champlin | 1206.971724 | 70004.36 |
Herman LLC | 1336.532258 | 82865.00 |
Jerde-Hilpert | 1265.072247 | 112591.43 |
Kassulke, Ondricka and Metz | 1350.797969 | 86451.07 |
Keeling LLC | 1363.977027 | 100934.30 |
Kiehn-Spinka | 1260.870506 | 99608.77 |
Koepp Ltd | 1264.152927 | 103660.54 |
Kuhn-Gusikowski | 1247.866849 | 91094.28 |
Kulas Inc | 1461.191064 | 137351.96 |
Pollich LLC | 1196.536712 | 87347.18 |
Purdy-Kunde | 1469.777547 | 77898.21 |
Sanford and Sons | 1391.872958 | 98822.98 |
Stokes LLC | 1271.332222 | 91535.92 |
Trantow-Barrows | 1312.567872 | 123381.38 |
White-Trantow | 1579.558023 | 135841.99 |
Will LLC | 1411.318919 | 104437.60 |
Style the currency using python's string formatting
(df.groupby('name')['ext price']
.agg(['mean', 'sum'])
.style.format('${0:,.2f}'))
mean | sum | |
---|---|---|
name | ||
Barton LLC | $1,334.62 | $109,438.50 |
Cronin, Oberbrunner and Spencer | $1,339.32 | $89,734.55 |
Frami, Hills and Schmidt | $1,438.47 | $103,569.59 |
Fritsch, Russel and Anderson | $1,385.37 | $112,214.71 |
Halvorson, Crona and Champlin | $1,206.97 | $70,004.36 |
Herman LLC | $1,336.53 | $82,865.00 |
Jerde-Hilpert | $1,265.07 | $112,591.43 |
Kassulke, Ondricka and Metz | $1,350.80 | $86,451.07 |
Keeling LLC | $1,363.98 | $100,934.30 |
Kiehn-Spinka | $1,260.87 | $99,608.77 |
Koepp Ltd | $1,264.15 | $103,660.54 |
Kuhn-Gusikowski | $1,247.87 | $91,094.28 |
Kulas Inc | $1,461.19 | $137,351.96 |
Pollich LLC | $1,196.54 | $87,347.18 |
Purdy-Kunde | $1,469.78 | $77,898.21 |
Sanford and Sons | $1,391.87 | $98,822.98 |
Stokes LLC | $1,271.33 | $91,535.92 |
Trantow-Barrows | $1,312.57 | $123,381.38 |
White-Trantow | $1,579.56 | $135,841.99 |
Will LLC | $1,411.32 | $104,437.60 |
Round the results to 0 decimals
(df.groupby('name')['ext price']
.agg(['mean', 'sum'])
.style.format('${0:,.0f}'))
mean | sum | |
---|---|---|
name | ||
Barton LLC | $1,335 | $109,438 |
Cronin, Oberbrunner and Spencer | $1,339 | $89,735 |
Frami, Hills and Schmidt | $1,438 | $103,570 |
Fritsch, Russel and Anderson | $1,385 | $112,215 |
Halvorson, Crona and Champlin | $1,207 | $70,004 |
Herman LLC | $1,337 | $82,865 |
Jerde-Hilpert | $1,265 | $112,591 |
Kassulke, Ondricka and Metz | $1,351 | $86,451 |
Keeling LLC | $1,364 | $100,934 |
Kiehn-Spinka | $1,261 | $99,609 |
Koepp Ltd | $1,264 | $103,661 |
Kuhn-Gusikowski | $1,248 | $91,094 |
Kulas Inc | $1,461 | $137,352 |
Pollich LLC | $1,197 | $87,347 |
Purdy-Kunde | $1,470 | $77,898 |
Sanford and Sons | $1,392 | $98,823 |
Stokes LLC | $1,271 | $91,536 |
Trantow-Barrows | $1,313 | $123,381 |
White-Trantow | $1,580 | $135,842 |
Will LLC | $1,411 | $104,438 |
More complex analysis of performance by month
monthly_sales = df.groupby([pd.Grouper(key='date', freq='M')])['ext price'].agg(['sum']).reset_index()
monthly_sales['pct_of_total'] = monthly_sales['sum'] / df['ext price'].sum()
monthly_sales
date | sum | pct_of_total | |
---|---|---|---|
0 | 2018-01-31 | 185361.66 | 0.091818 |
1 | 2018-02-28 | 146211.62 | 0.072426 |
2 | 2018-03-31 | 203921.38 | 0.101012 |
3 | 2018-04-30 | 174574.11 | 0.086475 |
4 | 2018-05-31 | 165418.55 | 0.081940 |
5 | 2018-06-30 | 174089.33 | 0.086235 |
6 | 2018-07-31 | 191662.11 | 0.094939 |
7 | 2018-08-31 | 153778.59 | 0.076174 |
8 | 2018-09-30 | 168443.17 | 0.083438 |
9 | 2018-10-31 | 171495.32 | 0.084950 |
10 | 2018-11-30 | 119961.22 | 0.059423 |
11 | 2018-12-31 | 163867.26 | 0.081171 |
Use a format dictionary to control formatting per column
format_dict = {'sum':'${0:,.0f}', 'date': '{:%m-%Y}', 'pct_of_total': '{:.2%}'}
monthly_sales.style.format(format_dict).hide_index()
date | sum | pct_of_total |
---|---|---|
01-2018 | $185,362 | 9.18% |
02-2018 | $146,212 | 7.24% |
03-2018 | $203,921 | 10.10% |
04-2018 | $174,574 | 8.65% |
05-2018 | $165,419 | 8.19% |
06-2018 | $174,089 | 8.62% |
07-2018 | $191,662 | 9.49% |
08-2018 | $153,779 | 7.62% |
09-2018 | $168,443 | 8.34% |
10-2018 | $171,495 | 8.49% |
11-2018 | $119,961 | 5.94% |
12-2018 | $163,867 | 8.12% |
Introduce the highlight functions
(monthly_sales
.style
.format(format_dict)
.hide_index()
.highlight_max(color='lightgreen')
.highlight_min(color='#cd4f39'))
date | sum | pct_of_total |
---|---|---|
01-2018 | $185,362 | 9.18% |
02-2018 | $146,212 | 7.24% |
03-2018 | $203,921 | 10.10% |
04-2018 | $174,574 | 8.65% |
05-2018 | $165,419 | 8.19% |
06-2018 | $174,089 | 8.62% |
07-2018 | $191,662 | 9.49% |
08-2018 | $153,779 | 7.62% |
09-2018 | $168,443 | 8.34% |
10-2018 | $171,495 | 8.49% |
11-2018 | $119,961 | 5.94% |
12-2018 | $163,867 | 8.12% |
Introduce bar formatting for table cells
(monthly_sales
.style
.format(format_dict)
.hide_index()
.bar(color='#FFA07A', vmin=100_000, subset=['sum'], align='zero')
.bar(color='lightgreen', vmin=0, subset=['pct_of_total'], align='zero')
.set_caption('2018 Sales Performance'))
date | sum | pct_of_total |
---|---|---|
01-2018 | $185,362 | 9.18% |
02-2018 | $146,212 | 7.24% |
03-2018 | $203,921 | 10.10% |
04-2018 | $174,574 | 8.65% |
05-2018 | $165,419 | 8.19% |
06-2018 | $174,089 | 8.62% |
07-2018 | $191,662 | 9.49% |
08-2018 | $153,779 | 7.62% |
09-2018 | $168,443 | 8.34% |
10-2018 | $171,495 | 8.49% |
11-2018 | $119,961 | 5.94% |
12-2018 | $163,867 | 8.12% |
(monthly_sales.style
.format(format_dict)
.background_gradient(subset=['sum'],cmap='BuGn'))
date | sum | pct_of_total | |
---|---|---|---|
0 | 01-2018 | $185,362 | 9.18% |
1 | 02-2018 | $146,212 | 7.24% |
2 | 03-2018 | $203,921 | 10.10% |
3 | 04-2018 | $174,574 | 8.65% |
4 | 05-2018 | $165,419 | 8.19% |
5 | 06-2018 | $174,089 | 8.62% |
6 | 07-2018 | $191,662 | 9.49% |
7 | 08-2018 | $153,779 | 7.62% |
8 | 09-2018 | $168,443 | 8.34% |
9 | 10-2018 | $171,495 | 8.49% |
10 | 11-2018 | $119,961 | 5.94% |
11 | 12-2018 | $163,867 | 8.12% |
Cool example of using sparklines from Peter Baumgartner https://twitter.com/pmbaumgartner/status/1084645440224559104
def sparkline_str(x):
bins=np.histogram(x)[0]
sl = ''.join(sparklines(bins))
return sl
sparkline_str.__name__ = "sparkline"
df.groupby('name')['quantity', 'ext price'].agg(['mean', sparkline_str])
quantity | ext price | |||
---|---|---|---|---|
mean | sparkline | mean | sparkline | |
name | ||||
Barton LLC | 24.890244 | ▄▄▃▂▃▆▄█▁▄ | 1334.615854 | █▄▃▆▄▄▁▁▁▁ |
Cronin, Oberbrunner and Spencer | 24.970149 | █▄▁▄▄▇▅▁▄▄ | 1339.321642 | █▅▅▃▃▃▂▂▁▁ |
Frami, Hills and Schmidt | 26.430556 | ▄▄▁▂▇█▂▂▅▅ | 1438.466528 | █▅▄▇▅▃▄▁▁▁ |
Fritsch, Russel and Anderson | 26.074074 | ▁▄▇▃▂▂█▃▄▄ | 1385.366790 | ▇█▃▄▂▂▁▂▁▁ |
Halvorson, Crona and Champlin | 22.137931 | ▇▆▆▇█▁▄▂▄▃ | 1206.971724 | ██▆▅▁▃▂▂▂▂ |
Herman LLC | 24.806452 | ▄▃▅▁▆▄▂▆▃█ | 1336.532258 | █▅▇▄▅▄▁▃▂▂ |
Jerde-Hilpert | 22.460674 | ▄▄█▁▂▅▃▂▄▃ | 1265.072247 | █▄▅▂▁▂▃▂▂▁ |
Kassulke, Ondricka and Metz | 25.734375 | ▂▂▁▁▂▂▁▅▄█ | 1350.797969 | █▆▆▄▄▃▂▁▁▂ |
Keeling LLC | 24.405405 | ▁▄▇▃▅█▃▄▃▆ | 1363.977027 | ▅█▆▃▄▂▂▁▁▁ |
Kiehn-Spinka | 22.227848 | ▃▂█▂▃▅▄▁▄▁ | 1260.870506 | █▇▄▃▃▂▁▂▁▁ |
Koepp Ltd | 21.829268 | ▅▇█▆▄▇▅▁▅▇ | 1264.152927 | █▇▅▂▄▂▂▂▁▁ |
Kuhn-Gusikowski | 22.808219 | ▂▄█▄▃▁█▄▂▄ | 1247.866849 | ▆█▄▃▃▃▃▁▁▁ |
Kulas Inc | 24.095745 | ▇▃▇▂▇▁▅▆█▆ | 1461.191064 | █▅█▂▄▄▄▂▁▁ |
Pollich LLC | 23.383562 | █▂▅▂▃▆▁▂▄▇ | 1196.536712 | █▆▆▃▃▃▂▂▁▁ |
Purdy-Kunde | 27.358491 | ▄▅█▁▄▇▅█▇█ | 1469.777547 | █▅▄▃▄▃▃▂▁▂ |
Sanford and Sons | 24.000000 | ▂▂▂▆▁█▂▂▁▅ | 1391.872958 | ▅▇█▃▃▄▁▃▁▂ |
Stokes LLC | 24.527778 | ▄▄▄▁▃▂▄█▄▄ | 1271.332222 | █▇▆▆▄▂▂▃▁▂ |
Trantow-Barrows | 24.159574 | ▅▄▅▂▃▃▄▄▁█ | 1312.567872 | █▇▅▆▃▂▂▁▁▁ |
White-Trantow | 26.255814 | ▃▂▆▆▃██▁██ | 1579.558023 | ▄█▆▅▄▃▄▂▁▃ |
Will LLC | 24.702703 | ▂█▁▄▄▄▂▅▅▄ | 1411.318919 | ▆█▄▅▄▂▁▂▁▂ |