Let's load in a standard csv file and print it out using the defaults. Then we will see how we can use Panda's styling to make it easier to extract information. The dataset we are using are the campaign contributions for the 2016 presidential elections. The candidates have been sorted by the total amount of money the raised.
import pandas as pd
desired_cols = ['Surname', 'Given Name', 'Party', 'Individual contributions', 'Committee contributions',
'Candidate contributions', 'Transfers', 'Offsets', 'Other receipts', 'Total', 'Percent individual']
SRC_URL = 'https://raw.githubusercontent.com/kiwidamien/StackedTurtles/master/content/style_jupyter/2016_contributions.csv'
contributions = pd.read_csv(SRC_URL)[desired_cols]
contributions
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | 2.311585e+08 | 1339118.32 | 997159.15 | 33940000.00 | 7073044.01 | 32863.96 | 2.745417e+08 | 0.841980 |
1 | Sanders | Bernard | Democrat | 2.306704e+08 | 5621.92 | 0.00 | 1500000.00 | 3202486.10 | 44028.81 | 2.354225e+08 | 0.979814 |
2 | Cruz | Rafael | Republican | 9.211106e+07 | 101095.14 | 0.00 | 250012.93 | 154093.55 | 8086.38 | 9.262435e+07 | 0.994458 |
3 | Trump | Donald | Republican | 3.695986e+07 | 0.00 | 49950643.36 | 2201313.93 | 2186237.76 | 0.00 | 9.129811e+07 | 0.404826 |
4 | Carson | Benjamin | Republican | 6.346140e+07 | 5588.29 | 25000.00 | 0.00 | 102499.71 | 649470.63 | 6.424396e+07 | 0.987819 |
5 | Rubio | Marco | Republican | 4.536183e+07 | 455970.75 | 0.00 | 662431.58 | 456260.05 | 595102.60 | 4.753170e+07 | 0.954349 |
6 | Bush | Jeb | Republican | 3.358905e+07 | 230317.05 | 795703.65 | 0.00 | 750658.19 | 50000.00 | 3.541573e+07 | 0.948422 |
7 | Kasich | John | Republican | 1.905342e+07 | 275870.62 | 0.00 | 0.00 | 107196.83 | 9966.00 | 1.944645e+07 | 0.979789 |
8 | Paul | Rand | Republican | 1.022932e+07 | 47171.17 | 0.00 | 1735263.26 | 31997.57 | 212501.08 | 1.225626e+07 | 0.834621 |
9 | Fiorina | Carly | Republican | 1.204584e+07 | 20925.00 | 0.00 | 0.00 | 13298.08 | 39506.11 | 1.211957e+07 | 0.993917 |
10 | Christie | Christopher | Republican | 8.432629e+06 | 115738.02 | 0.00 | 0.00 | 193218.94 | 0.00 | 8.741586e+06 | 0.964657 |
11 | Walker | Scott | Republican | 8.210190e+06 | 53100.00 | 0.00 | 0.00 | 64726.19 | 350529.12 | 8.678545e+06 | 0.946033 |
12 | O'Malley | Martin | Democrat | 4.573815e+06 | 115002.01 | 0.00 | 0.00 | 60471.44 | 0.00 | 6.338218e+06 | 0.721625 |
13 | Graham | Lindsey | Republican | 3.741437e+06 | 71750.00 | 0.00 | 1975000.00 | 35792.38 | 8730.32 | 5.832710e+06 | 0.641458 |
14 | Huckabee | Mike | Republican | 4.273304e+06 | 33000.00 | 12000.00 | 0.00 | 8499.61 | 0.00 | 4.326804e+06 | 0.987635 |
15 | Santorum | Richard | Republican | 1.364537e+06 | 7350.00 | 24000.00 | 0.00 | 12978.55 | 75237.29 | 1.484103e+06 | 0.919436 |
16 | Jindal | Bobby | Republican | 1.432464e+06 | 10000.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.442464e+06 | 0.993067 |
17 | Johnson | Gary | Libertarian | 1.358671e+06 | 5000.00 | 20.00 | 0.00 | 26.40 | 0.00 | 1.363957e+06 | 0.996125 |
18 | Stein | Jill | Green | 6.246960e+05 | 0.00 | 40000.00 | 0.00 | 491.18 | 0.00 | 8.621115e+05 | 0.724612 |
19 | Webb | James | Democrat | 7.715784e+05 | 5000.00 | 0.00 | 0.00 | 250.00 | 0.00 | 7.768284e+05 | 0.993242 |
20 | Pataki | George | Republican | 5.103273e+05 | 1000.00 | 20000.00 | 0.00 | 15935.20 | 0.00 | 5.472626e+05 | 0.932509 |
21 | Gilmore | James | Republican | 1.043959e+05 | 2000.00 | 279075.00 | 0.00 | 1790.49 | 0.00 | 3.872614e+05 | 0.269575 |
22 | Lessig | Lawrence | Democrat | 1.818120e+05 | 0.00 | 32000.00 | 0.00 | 6443.00 | 0.00 | 2.202550e+05 | 0.825461 |
Throughout this, we will look at the changes that styling makes just to the first few rows of our dataframe.
contributions.head()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | 2.311585e+08 | 1339118.32 | 997159.15 | 33940000.00 | 7073044.01 | 32863.96 | 2.745417e+08 | 0.841980 |
1 | Sanders | Bernard | Democrat | 2.306704e+08 | 5621.92 | 0.00 | 1500000.00 | 3202486.10 | 44028.81 | 2.354225e+08 | 0.979814 |
2 | Cruz | Rafael | Republican | 9.211106e+07 | 101095.14 | 0.00 | 250012.93 | 154093.55 | 8086.38 | 9.262435e+07 | 0.994458 |
3 | Trump | Donald | Republican | 3.695986e+07 | 0.00 | 49950643.36 | 2201313.93 | 2186237.76 | 0.00 | 9.129811e+07 | 0.404826 |
4 | Carson | Benjamin | Republican | 6.346140e+07 | 5588.29 | 25000.00 | 0.00 | 102499.71 | 649470.63 | 6.424396e+07 | 0.987819 |
We see that we have a mix of scientific notation (2.311e+08
), as well as the large numbers in the Transfers column that are very difficult to read. We can ask for all floats from a dataframe to be formatted as
,
separators between the hundreds, thousands, millions, etcLet's see this in action:
pd.options.display.float_format = '{:,.2f}'.format
contributions.head()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | 231,158,512.33 | 1,339,118.32 | 997,159.15 | 33,940,000.00 | 7,073,044.01 | 32,863.96 | 274,541,697.77 | 0.84 |
1 | Sanders | Bernard | Democrat | 230,670,405.61 | 5,621.92 | 0.00 | 1,500,000.00 | 3,202,486.10 | 44,028.81 | 235,422,542.44 | 0.98 |
2 | Cruz | Rafael | Republican | 92,111,063.05 | 101,095.14 | 0.00 | 250,012.93 | 154,093.55 | 8,086.38 | 92,624,351.05 | 0.99 |
3 | Trump | Donald | Republican | 36,959,857.71 | 0.00 | 49,950,643.36 | 2,201,313.93 | 2,186,237.76 | 0.00 | 91,298,110.38 | 0.40 |
4 | Carson | Benjamin | Republican | 63,461,402.63 | 5,588.29 | 25,000.00 | 0.00 | 102,499.71 | 649,470.63 | 64,243,961.26 | 0.99 |
This is now the default for all dataframes in this workbook, not just this one. This is because we have told Pandas to adjust its settings globally.
We can override the global again, to put a $
sign in front of all the floats, as follows:
# By default, everything here is in dollars
pd.options.display.float_format = '${:,.2f}'.format
contributions.head()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | $231,158,512.33 | $1,339,118.32 | $997,159.15 | $33,940,000.00 | $7,073,044.01 | $32,863.96 | $274,541,697.77 | $0.84 |
1 | Sanders | Bernard | Democrat | $230,670,405.61 | $5,621.92 | $0.00 | $1,500,000.00 | $3,202,486.10 | $44,028.81 | $235,422,542.44 | $0.98 |
2 | Cruz | Rafael | Republican | $92,111,063.05 | $101,095.14 | $0.00 | $250,012.93 | $154,093.55 | $8,086.38 | $92,624,351.05 | $0.99 |
3 | Trump | Donald | Republican | $36,959,857.71 | $0.00 | $49,950,643.36 | $2,201,313.93 | $2,186,237.76 | $0.00 | $91,298,110.38 | $0.40 |
4 | Carson | Benjamin | Republican | $63,461,402.63 | $5,588.29 | $25,000.00 | $0.00 | $102,499.71 | $649,470.63 | $64,243,961.26 | $0.99 |
Note that the Percent individual also got a dollar sign. This isn't what we wanted! We can set the format column by column if we want, using the .style
method:
format_dict = {
'Percent individual': '{:,.2%}'
}
# Note that format dict overwrites the default format
# Also note that after `.style` we no longer have a dataframe, so
# contributions.style.format(...).head()
# will result in an error!
contributions.head().style.format(format_dict)
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | 2.31159e+08 | 1.33912e+06 | 997159 | 3.394e+07 | 7.07304e+06 | 32864 | 2.74542e+08 | 84.20% |
1 | Sanders | Bernard | Democrat | 2.3067e+08 | 5621.92 | 0 | 1.5e+06 | 3.20249e+06 | 44028.8 | 2.35423e+08 | 97.98% |
2 | Cruz | Rafael | Republican | 9.21111e+07 | 101095 | 0 | 250013 | 154094 | 8086.38 | 9.26244e+07 | 99.45% |
3 | Trump | Donald | Republican | 3.69599e+07 | 0 | 4.99506e+07 | 2.20131e+06 | 2.18624e+06 | 0 | 9.12981e+07 | 40.48% |
4 | Carson | Benjamin | Republican | 6.34614e+07 | 5588.29 | 25000 | 0 | 102500 | 649471 | 6.4244e+07 | 98.78% |
Note that we have lost the dollar signs, and we have scientific notation back. That is because doing .style
no longer returns a dataframe (instead it returns a Styler
object), and the Pandas defaults only apply to dataframes. In particular, if we try running a dataframe method, such as .head()
on the styler, it won't work:
try:
contributions.style.format(format_dict).head()
except AttributeError as e:
print(f"Error: {e}")
Error: 'Styler' object has no attribute 'head'
We can check that the Pandas defaults persist by looking at a dataframe again (instead of a Styler
object):
# If we don't pass in a format, everything goes to the default
contributions.head()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | $231,158,512.33 | $1,339,118.32 | $997,159.15 | $33,940,000.00 | $7,073,044.01 | $32,863.96 | $274,541,697.77 | $0.84 |
1 | Sanders | Bernard | Democrat | $230,670,405.61 | $5,621.92 | $0.00 | $1,500,000.00 | $3,202,486.10 | $44,028.81 | $235,422,542.44 | $0.98 |
2 | Cruz | Rafael | Republican | $92,111,063.05 | $101,095.14 | $0.00 | $250,012.93 | $154,093.55 | $8,086.38 | $92,624,351.05 | $0.99 |
3 | Trump | Donald | Republican | $36,959,857.71 | $0.00 | $49,950,643.36 | $2,201,313.93 | $2,186,237.76 | $0.00 | $91,298,110.38 | $0.40 |
4 | Carson | Benjamin | Republican | $63,461,402.63 | $5,588.29 | $25,000.00 | $0.00 | $102,499.71 | $649,470.63 | $64,243,961.26 | $0.99 |
We see that the dollar signs are back (as well as, sadly, on the percentage column).
If we use a format dictionary, we have to set the format on each column we want, or accept the defaults. Luckily, we have select_dtypes
, which selects all columns of the type we are interested in. We can generate the dictionary with the default we want, and then override specific values.
Here is this trick in action:
# Here is a trick to get the default
format_dict = {col_name: '${:,.2f}' for col_name in contributions.select_dtypes(float).columns}
# .. and now overwrite the percentage column
format_dict['Percent individual'] = '{:,.1%}'
# Note the ".hide_index()" method of Styler allows us to hide
# the index (which is meaningless in this case)
contributions.head().style.format(format_dict).hide_index()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual |
---|---|---|---|---|---|---|---|---|---|---|
Clinton | Hillary | Democrat | $231,158,512.33 | $1,339,118.32 | $997,159.15 | $33,940,000.00 | $7,073,044.01 | $32,863.96 | $274,541,697.77 | 84.2% |
Sanders | Bernard | Democrat | $230,670,405.61 | $5,621.92 | $0.00 | $1,500,000.00 | $3,202,486.10 | $44,028.81 | $235,422,542.44 | 98.0% |
Cruz | Rafael | Republican | $92,111,063.05 | $101,095.14 | $0.00 | $250,012.93 | $154,093.55 | $8,086.38 | $92,624,351.05 | 99.4% |
Trump | Donald | Republican | $36,959,857.71 | $0.00 | $49,950,643.36 | $2,201,313.93 | $2,186,237.76 | $0.00 | $91,298,110.38 | 40.5% |
Carson | Benjamin | Republican | $63,461,402.63 | $5,588.29 | $25,000.00 | $0.00 | $102,499.71 | $649,470.63 | $64,243,961.26 | 98.8% |
We can also change the style of string columns. For example, this is how we would change the display of the surnames:
format_dict['Surname'] = lambda x: x.upper()
contributions.head().style.format(format_dict).hide_index()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual |
---|---|---|---|---|---|---|---|---|---|---|
CLINTON | Hillary | Democrat | $231,158,512.33 | $1,339,118.32 | $997,159.15 | $33,940,000.00 | $7,073,044.01 | $32,863.96 | $274,541,697.77 | 84.2% |
SANDERS | Bernard | Democrat | $230,670,405.61 | $5,621.92 | $0.00 | $1,500,000.00 | $3,202,486.10 | $44,028.81 | $235,422,542.44 | 98.0% |
CRUZ | Rafael | Republican | $92,111,063.05 | $101,095.14 | $0.00 | $250,012.93 | $154,093.55 | $8,086.38 | $92,624,351.05 | 99.4% |
TRUMP | Donald | Republican | $36,959,857.71 | $0.00 | $49,950,643.36 | $2,201,313.93 | $2,186,237.76 | $0.00 | $91,298,110.38 | 40.5% |
CARSON | Benjamin | Republican | $63,461,402.63 | $5,588.29 | $25,000.00 | $0.00 | $102,499.71 | $649,470.63 | $64,243,961.26 | 98.8% |
Note the surnames themselves haven't changed:
contributions.head()
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clinton | Hillary | Democrat | $231,158,512.33 | $1,339,118.32 | $997,159.15 | $33,940,000.00 | $7,073,044.01 | $32,863.96 | $274,541,697.77 | $0.84 |
1 | Sanders | Bernard | Democrat | $230,670,405.61 | $5,621.92 | $0.00 | $1,500,000.00 | $3,202,486.10 | $44,028.81 | $235,422,542.44 | $0.98 |
2 | Cruz | Rafael | Republican | $92,111,063.05 | $101,095.14 | $0.00 | $250,012.93 | $154,093.55 | $8,086.38 | $92,624,351.05 | $0.99 |
3 | Trump | Donald | Republican | $36,959,857.71 | $0.00 | $49,950,643.36 | $2,201,313.93 | $2,186,237.76 | $0.00 | $91,298,110.38 | $0.40 |
4 | Carson | Benjamin | Republican | $63,461,402.63 | $5,588.29 | $25,000.00 | $0.00 | $102,499.71 | $649,470.63 | $64,243,961.26 | $0.99 |
We can even do things like change the background color depending on the value of the cell. The syntax for this is
def some_func(value):
"""Function should return valid CSS as a string
e.g. the return string might be
'''background-color: <some CSS color name>;
color: <some CSS color name>;
font-size: <some number>pt;
'''
"""
...
Here we will implement a function, party_color
, the will return a CSS string that changes the color of the cell based on the value of the string (red
for Republican, blue
for Democrats, green
for the Greens, and grey
for everyone else).
def party_color_string(x):
if x == 'Democrat':
return 'blue'
if x == 'Republican':
return 'red'
if x == 'Green':
return 'green'
return 'grey'
def party_color(x):
return f'background-color: {party_color_string(x)}; color: white'
This function is then passed to applymap
as follows:
(contributions.head()
.style
.format(format_dict)
.applymap(party_color, subset=['Party'])
.hide_index()
)
Surname | Given Name | Party | Individual contributions | Committee contributions | Candidate contributions | Transfers | Offsets | Other receipts | Total | Percent individual |
---|---|---|---|---|---|---|---|---|---|---|
CLINTON | Hillary | Democrat | $231,158,512.33 | $1,339,118.32 | $997,159.15 | $33,940,000.00 | $7,073,044.01 | $32,863.96 | $274,541,697.77 | 84.2% |
SANDERS | Bernard | Democrat | $230,670,405.61 | $5,621.92 | $0.00 | $1,500,000.00 | $3,202,486.10 | $44,028.81 | $235,422,542.44 | 98.0% |
CRUZ | Rafael | Republican | $92,111,063.05 | $101,095.14 | $0.00 | $250,012.93 | $154,093.55 | $8,086.38 | $92,624,351.05 | 99.4% |
TRUMP | Donald | Republican | $36,959,857.71 | $0.00 | $49,950,643.36 | $2,201,313.93 | $2,186,237.76 | $0.00 | $91,298,110.38 | 40.5% |
CARSON | Benjamin | Republican | $63,461,402.63 | $5,588.29 | $25,000.00 | $0.00 | $102,499.71 | $649,470.63 | $64,243,961.26 | 98.8% |