Look at the default presentation of a dataframe¶

Let's load in a standard csv file and print it out using the defaults. Then we will see how we can use Panda's styling to make it easier to extract information. The dataset we are using are the campaign contributions for the 2016 presidential elections. The candidates have been sorted by the total amount of money the raised.

In [1]:

import pandas as pd

desired_cols = ['Surname', 'Given Name', 'Party', 'Individual contributions', 'Committee contributions',
               'Candidate contributions', 'Transfers', 'Offsets', 'Other receipts', 'Total', 'Percent individual']

In [2]:

SRC_URL = 'https://raw.githubusercontent.com/kiwidamien/StackedTurtles/master/content/style_jupyter/2016_contributions.csv'
contributions = pd.read_csv(SRC_URL)[desired_cols]
contributions

Out[2]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	2.311585e+08	1339118.32	997159.15	33940000.00	7073044.01	32863.96	2.745417e+08	0.841980
1	Sanders	Bernard	Democrat	2.306704e+08	5621.92	0.00	1500000.00	3202486.10	44028.81	2.354225e+08	0.979814
2	Cruz	Rafael	Republican	9.211106e+07	101095.14	0.00	250012.93	154093.55	8086.38	9.262435e+07	0.994458
3	Trump	Donald	Republican	3.695986e+07	0.00	49950643.36	2201313.93	2186237.76	0.00	9.129811e+07	0.404826
4	Carson	Benjamin	Republican	6.346140e+07	5588.29	25000.00	0.00	102499.71	649470.63	6.424396e+07	0.987819
5	Rubio	Marco	Republican	4.536183e+07	455970.75	0.00	662431.58	456260.05	595102.60	4.753170e+07	0.954349
6	Bush	Jeb	Republican	3.358905e+07	230317.05	795703.65	0.00	750658.19	50000.00	3.541573e+07	0.948422
7	Kasich	John	Republican	1.905342e+07	275870.62	0.00	0.00	107196.83	9966.00	1.944645e+07	0.979789
8	Paul	Rand	Republican	1.022932e+07	47171.17	0.00	1735263.26	31997.57	212501.08	1.225626e+07	0.834621
9	Fiorina	Carly	Republican	1.204584e+07	20925.00	0.00	0.00	13298.08	39506.11	1.211957e+07	0.993917
10	Christie	Christopher	Republican	8.432629e+06	115738.02	0.00	0.00	193218.94	0.00	8.741586e+06	0.964657
11	Walker	Scott	Republican	8.210190e+06	53100.00	0.00	0.00	64726.19	350529.12	8.678545e+06	0.946033
12	O'Malley	Martin	Democrat	4.573815e+06	115002.01	0.00	0.00	60471.44	0.00	6.338218e+06	0.721625
13	Graham	Lindsey	Republican	3.741437e+06	71750.00	0.00	1975000.00	35792.38	8730.32	5.832710e+06	0.641458
14	Huckabee	Mike	Republican	4.273304e+06	33000.00	12000.00	0.00	8499.61	0.00	4.326804e+06	0.987635
15	Santorum	Richard	Republican	1.364537e+06	7350.00	24000.00	0.00	12978.55	75237.29	1.484103e+06	0.919436
16	Jindal	Bobby	Republican	1.432464e+06	10000.00	0.00	0.00	0.00	0.00	1.442464e+06	0.993067
17	Johnson	Gary	Libertarian	1.358671e+06	5000.00	20.00	0.00	26.40	0.00	1.363957e+06	0.996125
18	Stein	Jill	Green	6.246960e+05	0.00	40000.00	0.00	491.18	0.00	8.621115e+05	0.724612
19	Webb	James	Democrat	7.715784e+05	5000.00	0.00	0.00	250.00	0.00	7.768284e+05	0.993242
20	Pataki	George	Republican	5.103273e+05	1000.00	20000.00	0.00	15935.20	0.00	5.472626e+05	0.932509
21	Gilmore	James	Republican	1.043959e+05	2000.00	279075.00	0.00	1790.49	0.00	3.872614e+05	0.269575
22	Lessig	Lawrence	Democrat	1.818120e+05	0.00	32000.00	0.00	6443.00	0.00	2.202550e+05	0.825461

Throughout this, we will look at the changes that styling makes just to the first few rows of our dataframe.

In [3]:

contributions.head()

Out[3]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	2.311585e+08	1339118.32	997159.15	33940000.00	7073044.01	32863.96	2.745417e+08	0.841980
1	Sanders	Bernard	Democrat	2.306704e+08	5621.92	0.00	1500000.00	3202486.10	44028.81	2.354225e+08	0.979814
2	Cruz	Rafael	Republican	9.211106e+07	101095.14	0.00	250012.93	154093.55	8086.38	9.262435e+07	0.994458
3	Trump	Donald	Republican	3.695986e+07	0.00	49950643.36	2201313.93	2186237.76	0.00	9.129811e+07	0.404826
4	Carson	Benjamin	Republican	6.346140e+07	5588.29	25000.00	0.00	102499.71	649470.63	6.424396e+07	0.987819

Adjust the floats (all)¶

We see that we have a mix of scientific notation (2.311e+08), as well as the large numbers in the Transfers column that are very difficult to read. We can ask for all floats from a dataframe to be formatted as

floating point (no exponents),
with two decimal places,
and , separators between the hundreds, thousands, millions, etc

Let's see this in action:

In [4]:

pd.options.display.float_format = '{:,.2f}'.format
contributions.head()

Out[4]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	231,158,512.33	1,339,118.32	997,159.15	33,940,000.00	7,073,044.01	32,863.96	274,541,697.77	0.84
1	Sanders	Bernard	Democrat	230,670,405.61	5,621.92	0.00	1,500,000.00	3,202,486.10	44,028.81	235,422,542.44	0.98
2	Cruz	Rafael	Republican	92,111,063.05	101,095.14	0.00	250,012.93	154,093.55	8,086.38	92,624,351.05	0.99
3	Trump	Donald	Republican	36,959,857.71	0.00	49,950,643.36	2,201,313.93	2,186,237.76	0.00	91,298,110.38	0.40
4	Carson	Benjamin	Republican	63,461,402.63	5,588.29	25,000.00	0.00	102,499.71	649,470.63	64,243,961.26	0.99

This is now the default for all dataframes in this workbook, not just this one. This is because we have told Pandas to adjust its settings globally.

Overriding the percentages and dollars¶

We can override the global again, to put a $ sign in front of all the floats, as follows:

In [5]:

# By default, everything here is in dollars
pd.options.display.float_format = '${:,.2f}'.format

contributions.head()

Out[5]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	$231,158,512.33	$1,339,118.32	$997,159.15	$33,940,000.00	$7,073,044.01	$32,863.96	$274,541,697.77	$0.84
1	Sanders	Bernard	Democrat	$230,670,405.61	$5,621.92	$0.00	$1,500,000.00	$3,202,486.10	$44,028.81	$235,422,542.44	$0.98
2	Cruz	Rafael	Republican	$92,111,063.05	$101,095.14	$0.00	$250,012.93	$154,093.55	$8,086.38	$92,624,351.05	$0.99
3	Trump	Donald	Republican	$36,959,857.71	$0.00	$49,950,643.36	$2,201,313.93	$2,186,237.76	$0.00	$91,298,110.38	$0.40
4	Carson	Benjamin	Republican	$63,461,402.63	$5,588.29	$25,000.00	$0.00	$102,499.71	$649,470.63	$64,243,961.26	$0.99

Note that the Percent individual also got a dollar sign. This isn't what we wanted! We can set the format column by column if we want, using the .style method:

In [6]:

format_dict = {
    'Percent individual': '{:,.2%}'
}

# Note that format dict overwrites the default format
# Also note that after `.style` we no longer have a dataframe, so
# contributions.style.format(...).head()
# will result in an error!
contributions.head().style.format(format_dict)

Out[6]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	2.31159e+08	1.33912e+06	997159	3.394e+07	7.07304e+06	32864	2.74542e+08	84.20%
1	Sanders	Bernard	Democrat	2.3067e+08	5621.92	0	1.5e+06	3.20249e+06	44028.8	2.35423e+08	97.98%
2	Cruz	Rafael	Republican	9.21111e+07	101095	0	250013	154094	8086.38	9.26244e+07	99.45%
3	Trump	Donald	Republican	3.69599e+07	0	4.99506e+07	2.20131e+06	2.18624e+06	0	9.12981e+07	40.48%
4	Carson	Benjamin	Republican	6.34614e+07	5588.29	25000	0	102500	649471	6.4244e+07	98.78%

Note that we have lost the dollar signs, and we have scientific notation back. That is because doing .style no longer returns a dataframe (instead it returns a Styler object), and the Pandas defaults only apply to dataframes. In particular, if we try running a dataframe method, such as .head() on the styler, it won't work:

In [7]:

try:
    contributions.style.format(format_dict).head()
except AttributeError as e:
    print(f"Error: {e}")

Error: 'Styler' object has no attribute 'head'

We can check that the Pandas defaults persist by looking at a dataframe again (instead of a Styler object):

In [8]:

# If we don't pass in a format, everything goes to the default
contributions.head()

Out[8]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	$231,158,512.33	$1,339,118.32	$997,159.15	$33,940,000.00	$7,073,044.01	$32,863.96	$274,541,697.77	$0.84
1	Sanders	Bernard	Democrat	$230,670,405.61	$5,621.92	$0.00	$1,500,000.00	$3,202,486.10	$44,028.81	$235,422,542.44	$0.98
2	Cruz	Rafael	Republican	$92,111,063.05	$101,095.14	$0.00	$250,012.93	$154,093.55	$8,086.38	$92,624,351.05	$0.99
3	Trump	Donald	Republican	$36,959,857.71	$0.00	$49,950,643.36	$2,201,313.93	$2,186,237.76	$0.00	$91,298,110.38	$0.40
4	Carson	Benjamin	Republican	$63,461,402.63	$5,588.29	$25,000.00	$0.00	$102,499.71	$649,470.63	$64,243,961.26	$0.99

We see that the dollar signs are back (as well as, sadly, on the percentage column).

If we use a format dictionary, we have to set the format on each column we want, or accept the defaults. Luckily, we have select_dtypes, which selects all columns of the type we are interested in. We can generate the dictionary with the default we want, and then override specific values.

Here is this trick in action:

In [9]:

# Here is a trick to get the default
format_dict = {col_name: '${:,.2f}' for col_name in contributions.select_dtypes(float).columns}
# .. and now overwrite the percentage column
format_dict['Percent individual'] = '{:,.1%}'

# Note the ".hide_index()" method of Styler allows us to hide
# the index (which is meaningless in this case)
contributions.head().style.format(format_dict).hide_index()

Out[9]:

Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
Clinton	Hillary	Democrat	$231,158,512.33	$1,339,118.32	$997,159.15	$33,940,000.00	$7,073,044.01	$32,863.96	$274,541,697.77	84.2%
Sanders	Bernard	Democrat	$230,670,405.61	$5,621.92	$0.00	$1,500,000.00	$3,202,486.10	$44,028.81	$235,422,542.44	98.0%
Cruz	Rafael	Republican	$92,111,063.05	$101,095.14	$0.00	$250,012.93	$154,093.55	$8,086.38	$92,624,351.05	99.4%
Trump	Donald	Republican	$36,959,857.71	$0.00	$49,950,643.36	$2,201,313.93	$2,186,237.76	$0.00	$91,298,110.38	40.5%
Carson	Benjamin	Republican	$63,461,402.63	$5,588.29	$25,000.00	$0.00	$102,499.71	$649,470.63	$64,243,961.26	98.8%

Display surnames in caps (without changing underlying data)¶

We can also change the style of string columns. For example, this is how we would change the display of the surnames:

In [10]:

format_dict['Surname'] = lambda x: x.upper()
contributions.head().style.format(format_dict).hide_index()

Out[10]:

Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
CLINTON	Hillary	Democrat	$231,158,512.33	$1,339,118.32	$997,159.15	$33,940,000.00	$7,073,044.01	$32,863.96	$274,541,697.77	84.2%
SANDERS	Bernard	Democrat	$230,670,405.61	$5,621.92	$0.00	$1,500,000.00	$3,202,486.10	$44,028.81	$235,422,542.44	98.0%
CRUZ	Rafael	Republican	$92,111,063.05	$101,095.14	$0.00	$250,012.93	$154,093.55	$8,086.38	$92,624,351.05	99.4%
TRUMP	Donald	Republican	$36,959,857.71	$0.00	$49,950,643.36	$2,201,313.93	$2,186,237.76	$0.00	$91,298,110.38	40.5%
CARSON	Benjamin	Republican	$63,461,402.63	$5,588.29	$25,000.00	$0.00	$102,499.71	$649,470.63	$64,243,961.26	98.8%

Note the surnames themselves haven't changed:

In [11]:

contributions.head()

Out[11]:

	Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
0	Clinton	Hillary	Democrat	$231,158,512.33	$1,339,118.32	$997,159.15	$33,940,000.00	$7,073,044.01	$32,863.96	$274,541,697.77	$0.84
1	Sanders	Bernard	Democrat	$230,670,405.61	$5,621.92	$0.00	$1,500,000.00	$3,202,486.10	$44,028.81	$235,422,542.44	$0.98
2	Cruz	Rafael	Republican	$92,111,063.05	$101,095.14	$0.00	$250,012.93	$154,093.55	$8,086.38	$92,624,351.05	$0.99
3	Trump	Donald	Republican	$36,959,857.71	$0.00	$49,950,643.36	$2,201,313.93	$2,186,237.76	$0.00	$91,298,110.38	$0.40
4	Carson	Benjamin	Republican	$63,461,402.63	$5,588.29	$25,000.00	$0.00	$102,499.71	$649,470.63	$64,243,961.26	$0.99

Set background color¶

We can even do things like change the background color depending on the value of the cell. The syntax for this is

def some_func(value):
    """Function should return valid CSS as a string
    e.g. the return string might be
    '''background-color: <some CSS color name>;
       color: <some CSS color name>;
       font-size: <some number>pt;
    '''
    """
    ...

Here we will implement a function, party_color, the will return a CSS string that changes the color of the cell based on the value of the string (red for Republican, blue for Democrats, green for the Greens, and grey for everyone else).

In [12]:

def party_color_string(x):
    if x == 'Democrat':
        return 'blue'
    if x == 'Republican':
        return 'red'
    if x == 'Green':
        return 'green'
    return 'grey'

def party_color(x):
    return f'background-color: {party_color_string(x)}; color: white'

This function is then passed to applymap as follows:

In [13]:

(contributions.head()
 .style
 .format(format_dict)
 .applymap(party_color, subset=['Party'])
 .hide_index()
)

Out[13]:

Surname	Given Name	Party	Individual contributions	Committee contributions	Candidate contributions	Transfers	Offsets	Other receipts	Total	Percent individual
CLINTON	Hillary	Democrat	$231,158,512.33	$1,339,118.32	$997,159.15	$33,940,000.00	$7,073,044.01	$32,863.96	$274,541,697.77	84.2%
SANDERS	Bernard	Democrat	$230,670,405.61	$5,621.92	$0.00	$1,500,000.00	$3,202,486.10	$44,028.81	$235,422,542.44	98.0%
CRUZ	Rafael	Republican	$92,111,063.05	$101,095.14	$0.00	$250,012.93	$154,093.55	$8,086.38	$92,624,351.05	99.4%
TRUMP	Donald	Republican	$36,959,857.71	$0.00	$49,950,643.36	$2,201,313.93	$2,186,237.76	$0.00	$91,298,110.38	40.5%
CARSON	Benjamin	Republican	$63,461,402.63	$5,588.29	$25,000.00	$0.00	$102,499.71	$649,470.63	$64,243,961.26	98.8%