Notebook to accompany article on PB Python. http://pbpython.com/altair-intro.html
import pandas as pd
from altair import Chart, X, Y, Axis, SortField
%matplotlib inline
Read in the sample CSV file containing the MN 2014 capital budget
budget = pd.read_csv("https://github.com/chris1610/pbpython/raw/master/data/mn-budget-detail-2014.csv")
budget.head()
category | detail | amount | |
---|---|---|---|
0 | UNIVERSITY OF MINNESOTA | Higher Education Asset Preservation (HEAPR) 1 | 42500000 |
1 | UNIVERSITY OF MINNESOTA | Minneapolis; Tate Laboratory Renovation | 56700000 |
2 | UNIVERSITY OF MINNESOTA | Crookston; Wellness Center | 10000000 |
3 | UNIVERSITY OF MINNESOTA | Research Laboratories | 8667000 |
4 | UNIVERSITY OF MINNESOTA | Duluth; Chemical Sciences and Advanced Materia... | 1500000 |
Make a basic plot of the top 10 expenditures using the default pandas plotting functions
budget_top_10 = budget.sort_values(by='amount',ascending=False)[:10]
budget_top_10.plot(kind="bar",x=budget_top_10["detail"],
title="MN Capital Budget - 2014",
legend=False)
<matplotlib.axes._subplots.AxesSubplot at 0x7fc3e0cffda0>
Build a similar plot in Altair
c = Chart(budget_top_10).mark_bar().encode(
x='detail',
y='amount')
c
Convert to a horizontal bar graph
c = Chart(budget_top_10).mark_bar().encode(
y='detail',
x='amount')
c
Look at the dictionary containing the JSON spec
c.to_dict(data=False)
{'encoding': {'x': {'field': 'amount', 'type': 'quantitative'}, 'y': {'field': 'detail', 'type': 'nominal'}}, 'mark': 'bar'}
Use X() and Y() - does nothing in this example but is useful for future steps
Chart(budget_top_10).mark_bar().encode(
x=X('detail'),
y=Y('amount')
)
Add in color codes based on the category. This automatically includes a legend.
Chart(budget_top_10).mark_bar().encode(
x=X('detail'),
y=Y('amount'),
color="category"
)
Show what happens if we run on all data, not just the top 10
Chart(budget).mark_bar().encode(
x='detail',
y='amount',
color='category')
Filter data to only amounts >= $10M
Chart(budget).mark_bar().encode(
x='detail:N',
y='amount:Q',
color='category').transform_data(
filter='datum.amount >= 10000000',
)
Swap X and Y in order to see it as a horizontal bar chart
Chart(budget).mark_bar().encode(
y='detail:N',
x='amount:Q',
color='category').transform_data(
filter='datum.amount >= 10000000',
)
Equivalent approach using X and Y classes
Chart(budget).mark_bar().encode(
x=X('detail:O',
axis=Axis(title='Project')),
y=Y('amount:Q',
axis=Axis(title='2014 Budget')),
color='category').transform_data(
filter='datum.amount >= 10000000',
)
Order the projects based on their total spend
Chart(budget).mark_bar().encode(
x=X('detail:O', sort=SortField(field='amount', order='descending', op='sum'),
axis=Axis(title='Project')),
y=Y('amount:Q',
axis=Axis(title='2014 Budget')),
color='category').transform_data(
filter='datum.amount >= 10000000',
)
Summarize the data at the category level
Chart(budget).mark_bar().encode(
x=X('category', sort=SortField(field='amount', order='descending', op='sum'),
axis=Axis(title='Category')),
y=Y('sum(amount)',
axis=Axis(title='2014 Budget')))
c = Chart(budget).mark_bar().encode(
y=Y('category', sort=SortField(field='amount', order='descending', op='sum'),
axis=Axis(title='Category')),
x=X('sum(amount)',
axis=Axis(title='2014 Budget')))
c
c.to_dict(data=False)
{'encoding': {'x': {'aggregate': 'sum', 'axis': {'title': '2014 Budget'}, 'field': 'amount', 'type': 'quantitative'}, 'y': {'axis': {'title': 'Category'}, 'field': 'category', 'sort': {'field': 'amount', 'op': 'sum', 'order': 'descending'}, 'type': 'nominal'}}, 'mark': 'bar'}