Created by: SmirkyGraphs. Code: GitHub. Source: BOE.

Gina Donation Q3 2017

Index

Breakdown
Data Importing & Exploring
When Donations Occurred
What Was Donated
Where Donations Came From
Where RI Donations Came From
In State vs. Out of State
Where People Worked
Who Donated
Donations 1k and Over
When Out of State passes In State
Top 4 Most Frequent Donations

Breakdown

General Info

$525,881.75 was raised from 992 contributions from 29 different states (including DC)

When Donations Occurred

Donations picked up later in the quarter: 55% (549) of donations were made in September
Only 14% (137) of donations came in the first month of the quarter (July)
Most of the money was raised on weekdays; Sunday was the lowest day and Tuesday the highest

What Was Donated

53% (525) were paid by Credit/Debit
45% (448) were paid by Check
992 donations from 904 donors
$525,881.75 was raised
Average was $530
Highest was $1,174.25, lowest was $1
Top 5 most frequent amounts, in order: $1,000, $500, $250, $25, $10

Where Donations Came From

29 different states (including DC)
225 different cities
Top 5 States in order by value: RI, NY, MA, CT, CO
Top 5 Cities in order by value: Providence, New York, Barrington, Jamestown, Denver

Where RI Donations Came From

This looks specifically at donations from people living in Rhode Island

600 donations from RI
Donations from 36 of the 39 municipalities in RI
Top 5 cities/towns by amount donated: Providence, Barrington, Jamestown, East Greenwich, Cranston
Providence made up 29% (175) of the donations from RI
Providence made up 29% of the total donated from RI
Counties in order by number of donations: Providence, Washington, Newport, Kent, Bristol
Counties in order by sum donated: Providence, Newport, Washington, Bristol, Kent

In State vs. Out of State

Comparing donations by whether the donor lives in RI or not

60% (600) of donations were from Rhode Island
40% (392) of donations were from another state (this includes the one donation with no state recorded)
47% ($248,681.22) of the money donated was from Rhode Islanders
53% ($277,200.53) of the money donated was from out of state
Average in state: $414
Average out of state: $707

Where People Worked

489 unique employers
$65,309 raised from retirees
$38,680 raised from homemakers
$16,731 raised from the self-employed
$15,261 from "Info Requested" (employer left empty)
Top 5 companies: RI Medical Imaging, Pfizer Inc, General Dynamics, Citizens Bank, Pannone Lopes & Devereaux & West LLC
All values included
- 56% (557) worked in RI
- 44% (435) worked outside RI (or listed no employer state)
- 44% ($230,305.00) of the money came from people who work in RI
- 56% ($295,576.75) of the money came from people who work out of state (or listed no employer state)
Extras removed (meant to exclude Homemaker, Retired, Self Employed, Info Requested and Disabled; as run, only the 17 "Disabled" donations were excluded)
- 55% (541) worked in RI
- 45% (434) worked outside RI

Who Donated

97% of Donations came from Individuals
904 unique donors
8 interest, 7 PAC, and 5 party donations (in-kind included)
Top 5 first names: David, Michael, Susan, William, Robert
Top 5 last names: Ardaya, Kelly, Richardson, Watson, Pande (the last two tied at 5 each)

Donations 1k and Over

Of donations of $1,000 and over, 37% came from in state and 63% from out of state

Data Importing & Exploring

In [1]:

# For data
import pandas as pd
from pandas import Series,DataFrame
import numpy as np

# For visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline
sns.set()
import datetime

my_color = sns.color_palette()

In [2]:

# loading the data
df = pd.read_csv("gina.csv", parse_dates=['receipt_dt'])
# removing personal address
df = df.drop(['address'], axis=1)

In [3]:

# Preview
df.head()

Out[3]:

	contbr_nm	first_nm	last_nm	tran_type	contb_type	receipt_dt	contb_amt	city	state	zip	employer	employ_address	employ_city	employ_state	employ_zip	weekday
0	Ingrid Ardaya	Ingrid	Ardaya	Credit/Debit	Individual	2017-07-01	5.0	Providence	RI	2906	Disabled	11 North Avenue	Providence	RI	2906	Saturday
1	Ingrid Ardaya	Ingrid	Ardaya	Credit/Debit	Individual	2017-07-01	5.0	Providence	RI	2906	Disabled	11 North Avenue	Providence	RI	2906	Saturday
2	Edna Panaggio	Edna	Panaggio	Credit/Debit	Individual	2017-07-01	5.0	Cranston	RI	02920-4529	Homemaker	200 Hoffman Ave	Cranston	RI	02920-4529	Saturday
3	Eve Savitzky	Eve	Savitzky	Credit/Debit	Individual	2017-07-01	25.0	Providence	RI	2906	Homemaker	21 Lincoln Ave	Providence	RI	2906	Saturday
4	Anna Siegler	Anna	Siegler	Credit/Debit	Individual	2017-07-02	50.0	Chicago	IL	60637	Retired	5715 S. Kenwood Ave, Apt 4N	Chicago	IL	60637	Sunday

In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 992 entries, 0 to 991
Data columns (total 16 columns):
contbr_nm         992 non-null object
first_nm          971 non-null object
last_nm           971 non-null object
tran_type         992 non-null object
contb_type        992 non-null object
receipt_dt        992 non-null datetime64[ns]
contb_amt         992 non-null float64
city              991 non-null object
state             991 non-null object
zip               988 non-null object
employer          965 non-null object
employ_address    930 non-null object
employ_city       931 non-null object
employ_state      931 non-null object
employ_zip        916 non-null object
weekday           992 non-null object
dtypes: datetime64[ns](1), float64(1), object(14)
memory usage: 124.1+ KB

In [5]:

df.shape

Out[5]:

(992, 16)

In [6]:

df.contb_amt.sum()

Out[6]:

525881.75

There are 992 donations in Q3, totaling $525,881.75
The 21 rows missing a first/last name are non-individual entries such as PACs, parties, and interest received
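
One quick way to check which contribution types lack a name is to count missing `first_nm` values per `contb_type`. A minimal sketch on a few made-up rows (not the BOE file); only the column names follow the data above:

```python
import pandas as pd

# Made-up rows for illustration; None marks a missing first name
df = pd.DataFrame({
    "contb_type": ["Individual", "PAC", "Party", "Individual", "Interest Received"],
    "first_nm": ["Ingrid", None, None, "Eve", None],
})

# Number of rows with no first name, per contribution type
missing = df["first_nm"].isna().groupby(df["contb_type"]).sum()
print(missing)
```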

When Donations Occurred

Let's start with how much was raised each day.
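
Grouping by `receipt_dt` only produces rows for days that had at least one donation, so quiet days drop out of the line chart instead of showing as zero. A minimal sketch, on made-up rows rather than the BOE data, of resampling to a daily frequency so empty days appear as 0:

```python
import pandas as pd

# Made-up rows standing in for the real data
df = pd.DataFrame({
    "receipt_dt": pd.to_datetime(["2017-07-01", "2017-07-01", "2017-07-03"]),
    "contb_amt": [5.0, 25.0, 100.0],
})

# Sum per calendar day; a day with no donations becomes 0 instead of disappearing
daily = df.set_index("receipt_dt")["contb_amt"].resample("D").sum()
print(daily)
```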

In [7]:

# total raised per day (sum only the amount column, not the text columns)
date_df = df.groupby('receipt_dt', as_index=False)['contb_amt'].sum()
date_df.plot('receipt_dt', 'contb_amt', figsize=(12,6), marker='', legend=False,
             linestyle='-', color='purple', xlim=('2017-07-01','2017-10-01'))

In [8]:

# Top 5 Days
date_df.sort_values(by='contb_amt',ascending=False).head()

Out[8]:

	receipt_dt	contb_amt
79	2017-09-27	26710.0
77	2017-09-25	23230.0
49	2017-08-28	22805.0
50	2017-08-29	20925.0
52	2017-08-31	20032.6

In [9]:

# number of donations per day
count_df = df.groupby('receipt_dt', as_index=False)['contb_amt'].count()
count_df.plot('receipt_dt','contb_amt',figsize=(12,6),marker='',linestyle='-',color='purple', xlim=('2017-07-01','2017-10-01'))

In [10]:

# average donation per day (mean of the amount column only)
mean_df = df.groupby('receipt_dt', as_index=False)['contb_amt'].mean()
mean_df.plot('receipt_dt','contb_amt',figsize=(12,6),marker='',linestyle='-',color='purple', xlim=('2017-07-01','2017-10-01'))

In [11]:

weekday_df = df
weekday_df['weekday'] = pd.Categorical(weekday_df['weekday'], 
        categories=['Monday','Tuesday','Wednesday','Thursday',
                    'Friday','Saturday', 'Sunday'], ordered=True)

In [12]:

weekday_df_sum = weekday_df.pivot_table(index=weekday_df['weekday'], values='contb_amt', 
                 aggfunc='sum').plot(kind='bar',rot=0,legend=False, title='Sum of Donations by Weekday')

In [13]:

weekday_df_count = weekday_df.pivot_table(index=weekday_df['weekday'], values='contb_amt', 
                    aggfunc='count').plot(kind='bar',rot=0, legend=False, title='Count of Donations by Weekday')

In [14]:

weekday_df_sum = weekday_df.pivot_table(index=weekday_df['weekday'], values='contb_amt', 
                    aggfunc='mean').plot(kind='bar',rot=0, legend=False, title='Average Donated by Weekday')
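
The CSV already ships a `weekday` column. If it were missing, it could be derived from `receipt_dt`; a sketch on made-up rows (2017-07-01 was a Saturday, as the preview above shows) that also keeps weekdays with no donations, in calendar order:

```python
import pandas as pd

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up rows for illustration
df = pd.DataFrame({
    "receipt_dt": pd.to_datetime(["2017-07-01", "2017-07-02", "2017-07-04"]),
    "contb_amt": [5.0, 50.0, 1000.0],
})

# Weekday name from the date, as an ordered categorical
df["weekday"] = pd.Categorical(df["receipt_dt"].dt.day_name(), categories=order, ordered=True)

# observed=False keeps empty weekdays (summed as 0) in Monday..Sunday order
by_day = df.groupby("weekday", observed=False)["contb_amt"].sum()
print(by_day)
```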

In [15]:

df['month'] = df['receipt_dt'].dt.month

In [16]:

df['receipt_dt'].dt.month.value_counts()

Out[16]:

9    549
8    306
7    137
Name: receipt_dt, dtype: int64
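
The monthly percentages in the summary follow from these counts. A small sketch reproducing them, with the counts copied from the output above:

```python
import pandas as pd

# Donation counts per month, copied from the output above (7 = July, 9 = September)
counts = pd.Series({7: 137, 8: 306, 9: 549})

# Each month's share of all donations
shares = counts / counts.sum()
print(shares.round(3))
```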

In [17]:

month_sum = df.pivot_table(index=df['month'], values='contb_amt', 
                 aggfunc='sum').plot(kind='bar',rot=0,legend=False, title='Sum of Donations by Month')

In [18]:

month_sum = df.pivot_table(index=df['month'], values='contb_amt', 
                 aggfunc='count').plot(kind='bar',rot=0,legend=False, title='Count of Donations by Month')

## What the Donations Were

In [19]:

df.tran_type.value_counts()

Out[19]:

Credit/Debit    525
Check           448
In-Kind          10
Other             9
Name: tran_type, dtype: int64

In [20]:

df.tran_type.value_counts(normalize=True)

Out[20]:

Credit/Debit    0.529234
Check           0.451613
In-Kind         0.010081
Other           0.009073
Name: tran_type, dtype: float64

In [21]:

# count of donations by transaction type (catplot is the current name for factorplot)
sns.catplot(x='tran_type', data=df, kind='count')

In [22]:

df['contb_amt'].sum()

Out[22]:

525881.75

In [23]:

df['contb_amt'].mode()

Out[23]:

0    1000.0
dtype: float64

In [24]:

df['contb_amt'].describe()

Out[24]:

count     992.000000
mean      530.122732
std       410.813074
min         1.000000
25%       100.000000
50%       500.000000
75%      1000.000000
max      1174.250000
Name: contb_amt, dtype: float64

I was surprised that the average donation was about $530, compared with only about $100 in the presidential race
The lowest donation was $1 and the highest was $1,174.25

In [25]:

df['contb_amt'].hist(bins=25)

In [26]:

df['contb_amt'].value_counts().head()

Out[26]:

1000.0    380
500.0     140
250.0     110
25.0       63
10.0       53
Name: contb_amt, dtype: int64

Surprisingly, $1,000 was the most frequent donation
The most frequent donation amounts were much higher than those in the presidential race

## Where Did Donations Come From?

In [27]:

# One state was entered as "Ri", so replace it with "RI" (applies to every column)
df = df.replace(['Ri'],'RI')
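
Replacing the single exact value `'Ri'` works for this file. A slightly more general sketch, assuming the only inconsistencies are letter case and stray whitespace, normalizes a state column in one step (the values are made up):

```python
import pandas as pd

# Made-up state values with inconsistent casing/whitespace; None is a missing state
state = pd.Series(["RI", "Ri", " ri", "NY", None])

# Strip whitespace and upper-case; missing values stay missing
clean = state.str.strip().str.upper()
print(clean.value_counts(dropna=False))
```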

In [28]:

df.state.nunique()

Out[28]:

29

In [29]:

df.state.value_counts()

Out[29]:

RI    600
NY     76
MA     62
CT     43
CO     29
TX     28
DC     19
CA     18
MD     17
FL     15
NJ     15
VA     10
IL      8
PA      7
OR      7
AZ      6
NH      5
WA      4
VT      4
TN      3
MI      3
NM      3
HI      2
SC      2
WI      1
AL      1
GA      1
MO      1
NC      1
Name: state, dtype: int64

In [30]:

where_sum = df.pivot_table(index=df['state'], values='contb_amt', aggfunc='sum').sort_values(
    by='contb_amt').plot(kind='barh', rot=0, legend=False, title='Total Donated by State')

In [31]:

where_count = df.pivot_table(index=df['state'], values='contb_amt', aggfunc='count').sort_values(
    by='contb_amt').plot(kind='barh', rot=0, legend=False, title='Count of Donations by State')

In [32]:

where_avg = df.pivot_table(index=df['state'], values='contb_amt', aggfunc='mean').sort_values(
    by='contb_amt').plot(kind='barh', rot=0, legend=False, title='Avg Donated by State')

In [33]:

df.city.nunique()

Out[33]:

225

In [34]:

# Top 5 Cities
city_df = df.pivot_table('contb_amt',index='city',aggfunc='sum')
city_df = city_df.sort_values(by="contb_amt",ascending=False)

city_df.head()

Out[34]:

	contb_amt
city
Providence	62241.43
New York	44350.00
Barrington	22470.00
Jamestown	21775.00
Denver	21000.00

In [35]:

city_sum = df.pivot_table(index=df['city'], values='contb_amt', aggfunc='sum').sort_values(
    by='contb_amt').nlargest(5, 'contb_amt').plot(kind='barh', color=my_color, legend=False, title='Total Donated')

In [36]:

city_count = df.pivot_table(index=df['city'], values='contb_amt', aggfunc='count').sort_values(
    by='contb_amt').nlargest(5, 'contb_amt').plot(kind='barh', color=my_color, legend=False, title='Num of Donations')

## RI Donations

In [37]:

# Just donations from RI
RI_df = df[df.state == 'RI']

In [38]:

RI_df.city.unique()

Out[38]:

array(['Providence', 'Cranston', 'Lincoln', 'Barrington', 'Cumberland',
       'Pascoag', 'Smithfield', 'East Greenwich', 'Johnston', 'Pawtucket',
       'Jamestown', 'Wakefield', 'Saunderstown', 'Portsmouth', 'Newport',
       'West Warwick', 'North Kingstown', 'Warwick', 'PROVIDENCE',
       'Tiverton', 'Harmony', 'Bristol', 'Warren', 'West Greenwich',
       'Westerly', 'Riverside', 'N Kingstown', 'Rumford', 'Narragansett',
       'Exeter', 'Coventry', 'East Providence', 'South Kingstown',
       'North Providence', 'Middletown', 'Charlestown', 'North Kingstownq',
       'Foster', 'Block Island', 'Scituate', 'Little Compton',
       'New Shoreham', 'Peace Dale', 'Central Falls', 'North Scituate',
       'Glocester', 'E Greenwich', 'Woonsocket', 'Albion', 'Kingston'], dtype=object)

In [39]:

# Connect small towns to the City/Town they're part of
RI_df = RI_df.replace(['Pascoag'],'Burrillville')
RI_df = RI_df.replace(['Wakefield','Kingston','Peace Dale'],'South Kingstown')
RI_df = RI_df.replace(['Saunderstown','N Kingstown','North Kingstownq'],'North Kingstown')
RI_df = RI_df.replace(['E Greenwich'],'East Greenwich')
RI_df = RI_df.replace(['PROVIDENCE'],'Providence')
RI_df = RI_df.replace(['Harmony'],'Glocester')
RI_df = RI_df.replace(['Riverside','Rumford'],'East Providence')
RI_df = RI_df.replace(['Block Island'],'New Shoreham')
RI_df = RI_df.replace(['North Scituate'],'Scituate')
RI_df = RI_df.replace(['Albion'],'Lincoln')
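
`DataFrame.replace` as used above rewrites matching values in every column, so an employer city of "Wakefield" is changed as well. A sketch, on made-up rows and a subset of the aliases above, that limits the replacement to the `city` column with one dict:

```python
import pandas as pd

# Village/variant name -> municipality (a subset of the replacements above)
city_alias = {
    "PROVIDENCE": "Providence",
    "Wakefield": "South Kingstown",
    "Peace Dale": "South Kingstown",
    "Riverside": "East Providence",
    "Block Island": "New Shoreham",
}

# Made-up rows for illustration
ri = pd.DataFrame({
    "city": ["PROVIDENCE", "Providence", "Wakefield", "Block Island"],
    "employ_city": ["Wakefield", "Boston", "Newport", "Providence"],
})

# Replace only within the city column; employer cities are left untouched
ri["city"] = ri["city"].replace(city_alias)
print(ri)
```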

In [40]:

RI_df.city.nunique()

Out[40]:

36

In [41]:

ri_city = RI_df.pivot_table(index='city', values='contb_amt', aggfunc='count').sort_values(ascending=True,
    by='contb_amt').plot(kind='barh', figsize=(12,9), legend=False, title='Num of Donations by City')

In [42]:

ri_city = RI_df.pivot_table(index='city', values='contb_amt', aggfunc='sum').sort_values(ascending=True,
    by='contb_amt').plot(kind='barh', figsize=(12,9), legend=False, title='Total Donated by City')

In [43]:

RI_df['city'].value_counts().sum()

Out[43]:

600

In [44]:

RI_df['city'].value_counts().head()

Out[44]:

Providence        175
East Greenwich     44
Barrington         44
Jamestown          38
Cranston           35
Name: city, dtype: int64

In [45]:

RI_Sum = RI_df.pivot_table('contb_amt',index='city',aggfunc='sum')
RI_Sum = RI_Sum.sort_values(by='contb_amt', ascending=False)

RI_Sum.sum()

Out[45]:

contb_amt    248681.22
dtype: float64

In [46]:

RI_Sum

Out[46]:

	contb_amt
city
Providence	70988.43
Barrington	22470.00
Jamestown	21775.00
East Greenwich	19170.00
Cranston	13061.00
North Kingstown	9585.00
Newport	8390.00
Westerly	8305.00
Lincoln	7360.00
East Providence	7170.79
Warwick	6490.00
Narragansett	6310.00
Portsmouth	5505.00
Bristol	5380.00
South Kingstown	4935.00
Charlestown	4100.00
North Providence	3525.00
Pawtucket	3130.00
Johnston	3125.00
Exeter	3035.00
Cumberland	2500.00
Middletown	2375.00
Warren	2100.00
Scituate	1500.00
West Greenwich	1200.00
Coventry	1060.00
Foster	1000.00
Smithfield	950.00
Tiverton	550.00
West Warwick	500.00
Burrillville	500.00
Woonsocket	275.00
Glocester	225.00
Little Compton	100.00
Central Falls	25.00
New Shoreham	11.00

In [47]:

# dictionary of RI Counties
county_map = {'Barrington': 'BRISTOL',
            'Bristol': 'BRISTOL',
            'Burrillville': 'PROVIDENCE',
            'Central Falls': 'PROVIDENCE',
            'Charlestown': 'WASHINGTON',
            'Coventry': 'KENT',
            'Cranston': 'PROVIDENCE',
            'Cumberland': 'PROVIDENCE',
            'East Greenwich': 'KENT',
            'East Providence': 'PROVIDENCE',
            'Exeter': 'WASHINGTON',
            'Foster': 'PROVIDENCE',
            'Glocester': 'PROVIDENCE',
            'Hopkinton': 'WASHINGTON',
            'Jamestown': 'NEWPORT',
            'Johnston': 'PROVIDENCE',
            'Lincoln': 'PROVIDENCE',
            'Little Compton': 'NEWPORT',
            'Middletown': 'NEWPORT',
            'Narragansett': 'WASHINGTON',
            'Newport': 'NEWPORT',
            'New Shoreham': 'WASHINGTON',
            'North Kingstown': 'WASHINGTON',
            'North Providence': 'PROVIDENCE',
            'North Smithfield': 'PROVIDENCE',
            'Pawtucket': 'PROVIDENCE',
            'Portsmouth': 'NEWPORT',
            'Providence': 'PROVIDENCE',
            'Richmond': 'WASHINGTON',
            'Scituate': 'PROVIDENCE',
            'Smithfield': 'PROVIDENCE',
            'South Kingstown': 'WASHINGTON',
            'Tiverton': 'NEWPORT',
            'Warren': 'BRISTOL',
            'Warwick': 'KENT',
            'Westerly': 'WASHINGTON',
            'West Greenwich': 'KENT',
            'West Warwick': 'KENT',
            'Woonsocket': 'PROVIDENCE'}

# creating a County column by mapping each city/town to its county
RI_df['County'] = RI_df.city.map(county_map)
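
`Series.map` with a dict returns NaN for any city that is not a key, so a village name that slipped through the cleanup above would silently drop out of the county counts. A sketch of listing such gaps, using made-up rows and a subset of `county_map`:

```python
import pandas as pd

# A subset of the county mapping above
county_map = {"Providence": "PROVIDENCE", "Barrington": "BRISTOL", "Jamestown": "NEWPORT"}

# Made-up city column; "Pascoag" is a village that was not normalized first
city = pd.Series(["Providence", "Barrington", "Pascoag", "Jamestown"])

county = city.map(county_map)

# Cities that found no county in the dict
unmapped = sorted(city[county.isna()].unique())
print(unmapped)
```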

In [48]:

RI_df['County'].value_counts()

Out[48]:

PROVIDENCE    303
WASHINGTON     86
NEWPORT        77
KENT           70
BRISTOL        64
Name: County, dtype: int64

In [49]:

ri_county = RI_df.pivot_table(index='County', values='contb_amt', aggfunc='count').sort_values(ascending=True,
    by='contb_amt').plot(kind='barh', legend=False, title='Num of Donations by County')

In [50]:

ri_county = RI_df.pivot_table(index='County', values='contb_amt', aggfunc='sum').sort_values(ascending=True,
    by='contb_amt').plot(kind='barh', legend=False, title='Total Donated by County')

## In State vs. Out of State

In [51]:

def in_ri(state):
    if state == 'RI':
        return 'in state'
    else:
        return 'out of state'

In [52]:

df['lives'] = df['state'].apply(in_ri)

In [53]:

df_lives = df

In [54]:

df['lives'] = pd.Categorical(df['lives'], categories=['in state','out of state'], ordered=True)
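
`in_ri` labels every value other than exactly `'RI'` as out of state, including a missing state (the `state` column has 991 non-null values for 992 rows). A sketch, on made-up values, of a three-way label that keeps missing states separate:

```python
import pandas as pd

# Made-up values; None stands for a donation with no state recorded
state = pd.Series(["RI", "NY", None, "RI"])

# Start from "out of state", then overwrite the RI rows and the missing ones
lives = pd.Series("out of state", index=state.index)
lives[state.eq("RI")] = "in state"
lives[state.isna()] = "unknown"
print(lives.value_counts())
```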

In [55]:

df.head()

Out[55]:

	contbr_nm	first_nm	last_nm	tran_type	contb_type	receipt_dt	contb_amt	city	state	zip	employer	employ_address	employ_city	employ_state	employ_zip	weekday	month	lives
0	Ingrid Ardaya	Ingrid	Ardaya	Credit/Debit	Individual	2017-07-01	5.0	Providence	RI	2906	Disabled	11 North Avenue	Providence	RI	2906	Saturday	7	in state
1	Ingrid Ardaya	Ingrid	Ardaya	Credit/Debit	Individual	2017-07-01	5.0	Providence	RI	2906	Disabled	11 North Avenue	Providence	RI	2906	Saturday	7	in state
2	Edna Panaggio	Edna	Panaggio	Credit/Debit	Individual	2017-07-01	5.0	Cranston	RI	02920-4529	Homemaker	200 Hoffman Ave	Cranston	RI	02920-4529	Saturday	7	in state
3	Eve Savitzky	Eve	Savitzky	Credit/Debit	Individual	2017-07-01	25.0	Providence	RI	2906	Homemaker	21 Lincoln Ave	Providence	RI	2906	Saturday	7	in state
4	Anna Siegler	Anna	Siegler	Credit/Debit	Individual	2017-07-02	50.0	Chicago	IL	60637	Retired	5715 S. Kenwood Ave, Apt 4N	Chicago	IL	60637	Sunday	7	out of state

In [56]:

count_df = df.pivot_table(index=df['lives'], values='contb_amt', aggfunc='count').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Count of Donation')

In [57]:

print(df['lives'].value_counts())
(df['lives'].value_counts(normalize=True))

in state        600
out of state    392
Name: lives, dtype: int64

Out[57]:

in state        0.604839
out of state    0.395161
Name: lives, dtype: float64

60% (600) were from Rhode Island
40% (392) were from another state (this includes the one donation with no state recorded)

In [58]:

mean_df = df.pivot_table(index=df['lives'], values='contb_amt', aggfunc='mean').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Average Donation')

In [59]:

sum_df = df.pivot_table(index=df['lives'], values='contb_amt', aggfunc='sum').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Total Donated')

In [60]:

percent_df = df.pivot_table(index=df['lives'], values='contb_amt', aggfunc='sum')

In [61]:

total_sum = percent_df.contb_amt.sum()
df['lives'] = df['state'].apply(in_ri)
percent_df['Percent'] = percent_df['contb_amt'] / total_sum

In [62]:

percent_df.head()

Out[62]:

	contb_amt	Percent
lives
in state	248681.22	0.472884
out of state	277200.53	0.527116

In [63]:

mean_df = df.pivot_table(index=df['lives'], values='contb_amt', aggfunc='mean')
mean_df.head()

Out[63]:

	contb_amt
lives
in state	414.468700
out of state	707.144209
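
The count share, money share, and average above come from separate pivot tables. A sketch, on made-up rows, that computes all of them from one `groupby(...).agg(...)`:

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "lives": ["in state", "in state", "out of state"],
    "contb_amt": [100.0, 300.0, 1000.0],
})

# Count, total and mean per group in one pass
summary = df.groupby("lives")["contb_amt"].agg(["count", "sum", "mean"])

# Share of the number of donations and share of the money raised
summary["count_share"] = summary["count"] / summary["count"].sum()
summary["sum_share"] = summary["sum"] / summary["sum"].sum()
print(summary)
```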

## Where People Worked

In [64]:

employer_df = df.pivot_table('contb_amt',index='employer',aggfunc='sum')

# Combining Electric Boat & General Dynamics (Electric Boat is a General Dynamics subsidiary)
employer_df.loc['General Dynamics'] = employer_df.loc['Electric Boat Corporation'] + employer_df.loc['General Dynamics']
employer_df.drop('Electric Boat Corporation',inplace=True)

employer_df = employer_df.sort_values(by = 'contb_amt',ascending=True)

In [65]:

employer_df.count()

Out[65]:

contb_amt    489
dtype: int64

Donors listed 489 different employers, so let's narrow it down to employers whose donors gave more than $1,000 in total
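
The narrowing step amounts to dropping the non-company labels and keeping totals above a threshold. A sketch on made-up per-employer totals ("Acme Corp" and "Small Shop" are hypothetical names); the exclusion list uses the labels dropped in the cells below:

```python
import pandas as pd

# Made-up per-employer totals for illustration
employer_totals = pd.Series({
    "Retired": 65000.0,
    "Homemaker": 38000.0,
    "Acme Corp": 2000.0,
    "Small Shop": 500.0,
})

# Non-company labels to set aside; errors="ignore" skips labels that are absent
not_companies = ["Homemaker", "Retired", "Self Employed", "Info Requested"]
companies = employer_totals.drop(not_companies, errors="ignore")

# Companies whose donors gave more than $1,000 in total, largest first
big = companies[companies > 1000].sort_values(ascending=False)
print(big)
```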

In [66]:

# Getting all employer records over $1000

employer_df = employer_df[employer_df['contb_amt'] > 1000]
employer_df.plot(kind='barh',figsize=(10,16))

In [67]:

# Graphing only companies: drop the non-company employer labels
employer_df = employer_df.drop(['Homemaker', 'Retired', 'Self Employed', 'Info Requested'])

employer_df = employer_df.sort_values(by = 'contb_amt',ascending=True)

In [68]:

# Getting all employer records over $1000
employer_df = employer_df[employer_df['contb_amt'] > 1000]
employer_df.plot(kind='barh',figsize=(10,16))

In [69]:

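def in_ri(employ_state):
    # same rule as for residence: anything other than 'RI',
    # including a missing employer state, is labeled 'out of state'
    if employ_state == 'RI':
        return 'in state'
    else:
        return 'out of state'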

In [70]:

df['works'] = df['employ_state'].apply(in_ri)

In [71]:

df.head()

Out[71]:

	contbr_nm	first_nm	last_nm	tran_type	contb_type	receipt_dt	contb_amt	city	state	zip	employer	employ_address	employ_city	employ_state	employ_zip	weekday	month	lives	works
0	Ingrid Ardaya	Ingrid	Ardaya	Credit/Debit	Individual	2017-07-01	5.0	Providence	RI	2906	Disabled	11 North Avenue	Providence	RI	2906	Saturday	7	in state	in state
1	Ingrid Ardaya	Ingrid	Ardaya	Credit/Debit	Individual	2017-07-01	5.0	Providence	RI	2906	Disabled	11 North Avenue	Providence	RI	2906	Saturday	7	in state	in state
2	Edna Panaggio	Edna	Panaggio	Credit/Debit	Individual	2017-07-01	5.0	Cranston	RI	02920-4529	Homemaker	200 Hoffman Ave	Cranston	RI	02920-4529	Saturday	7	in state	in state
3	Eve Savitzky	Eve	Savitzky	Credit/Debit	Individual	2017-07-01	25.0	Providence	RI	2906	Homemaker	21 Lincoln Ave	Providence	RI	2906	Saturday	7	in state	in state
4	Anna Siegler	Anna	Siegler	Credit/Debit	Individual	2017-07-02	50.0	Chicago	IL	60637	Retired	5715 S. Kenwood Ave, Apt 4N	Chicago	IL	60637	Sunday	7	out of state	out of state

In [72]:

# Including Extras
count_df = df.pivot_table(index=df['works'], values='contb_amt', aggfunc='count').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Count of Donation')

In [73]:

df['works'].value_counts()

Out[73]:

in state        557
out of state    435
Name: works, dtype: int64

In [74]:

# Including Extras
count_df = df.pivot_table(index=df['works'], values='contb_amt', aggfunc='sum').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Total Donated')

In [75]:

# Including Extras getting % of sum
percent_df = df.pivot_table(index=df['works'], values='contb_amt', aggfunc='sum')

total_sum = df.contb_amt.sum()
percent_df['Percent'] = percent_df['contb_amt'] / total_sum

percent_df.head()

Out[75]:

	contb_amt	Percent
works
in state	230305.00	0.437941
out of state	295576.75	0.562059

In [76]:

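# Removing Extras: exclude all non-company employer labels with a single mask
# (filtering df afresh on each line would keep only the last condition).
# Note: the counts shown below came from that line-by-line version,
# which excluded only the 17 'Disabled' rows.
extras = ['Homemaker', 'Retired', 'Self Employed', 'Info Requested', 'Disabled']
emp_df = df[~df.employer.isin(extras)]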

In [77]:

# Extras Removed
count_df = emp_df.pivot_table(index='works', values='contb_amt', aggfunc='count').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Count of Donation')

In [78]:

emp_df['works'].value_counts()

Out[78]:

in state        541
out of state    434
Name: works, dtype: int64
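
When each filtering line starts again from `df`, only the last condition survives. A sketch on made-up rows ("Acme Corp" is a hypothetical employer) contrasting that with a single `isin` mask:

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({"employer": ["Retired", "Homemaker", "Disabled", "Acme Corp"]})

# Pitfall: the second line filters df again, discarding the first filter
chained = df[df.employer != "Retired"]
chained = df[df.employer != "Disabled"]
print(len(chained))  # 3 rows: Retired and Homemaker are still there

# One mask that applies every exclusion at once
extras = ["Homemaker", "Retired", "Self Employed", "Info Requested", "Disabled"]
emp_df = df[~df.employer.isin(extras)]
print(list(emp_df.employer))
```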

In [79]:

# Extras Removed
count_df = emp_df.pivot_table(index='works', values='contb_amt', aggfunc='sum').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Total Donated')

## Who Donated

In [80]:

df.contbr_nm.nunique()

Out[80]:

904

In [81]:

df.first_nm.value_counts().head()

Out[81]:

David      31
Michael    25
Susan      20
William    19
Robert     19
Name: first_nm, dtype: int64

In [82]:

df.last_nm.value_counts().head()

Out[82]:

Ardaya        16
Kelly          9
Richardson     6
Watson         5
Pande          5
Name: last_nm, dtype: int64

In [83]:

df.contb_type.value_counts()

Out[83]:

Individual              966
Interest Received         8
PAC                       6
In-Kind - Individual      5
In-Kind - Party           4
Refund/Rebate             1
In-Kind - PAC             1
Party                     1
Name: contb_type, dtype: int64

In [84]:

df.contb_type.value_counts(normalize=True)

Out[84]:

Individual              0.973790
Interest Received       0.008065
PAC                     0.006048
In-Kind - Individual    0.005040
In-Kind - Party         0.004032
Refund/Rebate           0.001008
In-Kind - PAC           0.001008
Party                   0.001008
Name: contb_type, dtype: float64
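
The summary's "7 PAC" and "5 party" figures fold the in-kind variants into their parent type. A sketch reproducing that from the counts above, assuming every in-kind label has the form `"In-Kind - <type>"`:

```python
import pandas as pd

# Contribution-type counts copied from the output above
types = pd.Series({
    "Individual": 966, "Interest Received": 8, "PAC": 6,
    "In-Kind - Individual": 5, "In-Kind - Party": 4,
    "Refund/Rebate": 1, "In-Kind - PAC": 1, "Party": 1,
})

# Strip the "In-Kind - " prefix so each in-kind row joins its parent type
base = types.index.str.replace("In-Kind - ", "", regex=False)
grouped = types.groupby(base).sum()
print(grouped)
```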

## Donations 1k and Over

In [85]:

don_1k = df[df['contb_amt'] >= 1000]

In [87]:

don_1k.lives.value_counts(normalize=True)

Out[87]:

out of state    0.630208
in state        0.369792
Name: lives, dtype: float64

In [88]:

sum_df = don_1k.pivot_table(index='lives', values='contb_amt', aggfunc='sum').plot(kind='bar',
                        rot=0, color=my_color, legend=False, title='Total Donated')

## When Out of State Passes In State

In [89]:

don_df_100 = df[df.contb_amt <= 100]
don_df_250 = df[df.contb_amt <= 250]
don_df_350 = df[df.contb_amt <= 350]
don_df_500 = df[df.contb_amt <= 500]
don_df_750 = df[df.contb_amt <= 750]
don_df_1000 = df[df.contb_amt <= 1000]

In [90]:

# Concatenating the datasets together
frames = [don_df_100, don_df_250, don_df_350, don_df_500, don_df_750, don_df_1000]

don_concat = pd.concat(frames, keys=['100', '250', '350', '500', '750', '1000'])

# resetting the index so each frame's key ('100', '250', ...) becomes the 'level_0' column
don_concat = don_concat.reset_index()

In [91]:

# Pivoting by the amt ranges
don_concat = don_concat.pivot_table('contb_amt',index='level_0',columns = 'lives',aggfunc='sum')

In [92]:

new_index= ['100', '250', '350', '500', '750', '1000']
don_concat = don_concat.reindex(new_index)
don_concat.head()

Out[92]:

lives	in state	out of state
level_0
100	5683.68	2901.53
250	33314.22	10850.53
350	34914.22	11800.53
500	86789.22	30700.53
750	103444.22	35200.53
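
Each row above is a running total: the money given by donations at or below that amount. A sketch on made-up rows that bins each donation once with `pd.cut` and takes a cumulative sum, instead of concatenating six filtered copies of `df`; amounts above the top threshold ($1,000) are left out, as in the cells above:

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "contb_amt": [25.0, 100.0, 250.0, 500.0, 1000.0, 1000.0, 1174.25],
    "lives": ["in state", "out of state", "in state", "in state",
              "out of state", "out of state", "in state"],
})

thresholds = [100, 250, 350, 500, 750, 1000]

# Label each donation with the smallest threshold it does not exceed;
# amounts above 1000 fall outside the bins and become NaN
band = pd.cut(df["contb_amt"], bins=[0] + thresholds, labels=thresholds).astype(float)

# Money per band and residence; rows with a NaN band are dropped by groupby
per_band = (
    df.assign(band=band)
    .groupby(["band", "lives"])["contb_amt"].sum()
    .unstack(fill_value=0)
    .reindex([float(t) for t in thresholds], fill_value=0)
)

# Running total: money from donations at or below each threshold
cumulative = per_band.cumsum()
print(cumulative)
```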

In [93]:

ax = don_concat[['in state','out of state']].plot(kind='bar', figsize=(12,4), rot=0,
                                                  title='In State vs. Out of State')
ax.set_xlabel('Amount (donations at or below this value)')

## Top 4 Most Frequent Donations In State vs Out of State

In [94]:

# Top 4 Donated Values
don_25 = df[df.contb_amt == 25]
don_250 = df[df.contb_amt == 250]
don_500 = df[df.contb_amt == 500]
don_1000 = df[df.contb_amt == 1000]

In [95]:

# Concatenating the datasets together
frames = [don_25, don_250, don_500, don_1000]

don_concat = pd.concat(frames, keys=['25', '250', '500', '1000'])
# resetting the index so each frame's key ('25', '250', ...) becomes the 'level_0' column
don_concat = don_concat.reset_index()

In [96]:

# Pivoting by donation amount
don_concat = don_concat.pivot_table('contb_amt',index='level_0',columns = 'lives',aggfunc='sum')

In [97]:

don_concat.head()

Out[97]:

lives	in state	out of state
level_0
1000	138000.0	242000.0
25	1275.0	300.0
250	22000.0	5500.0
500	51500.0	18500.0

In [98]:

new_index= ['25', '250', '500', '1000']
don_concat = don_concat.reindex(new_index)
don_concat.head()

Out[98]:

lives	in state	out of state
level_0
25	1275.0	300.0
250	22000.0	5500.0
500	51500.0	18500.0
1000	138000.0	242000.0
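
String keys such as `'1000'` sort as text (so `'1000'` lands before `'25'`), which is why the reindex step was needed above. A sketch on made-up rows that keeps the amounts numeric and picks the most frequent values with `value_counts`:

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "contb_amt": [25.0, 25.0, 25.0, 1000.0, 1000.0, 10.0],
    "lives": ["in state", "in state", "out of state", "in state", "out of state", "in state"],
})

# The N most frequent donation amounts (N = 2 here), kept as numbers
top = df["contb_amt"].value_counts().nlargest(2).index

# Total given at each of those amounts by residence; rows sort numerically
sub = df[df["contb_amt"].isin(top)]
table = (
    sub.groupby([sub["contb_amt"].rename("amount"), "lives"])["contb_amt"].sum()
    .unstack(fill_value=0)
)
print(table)
```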

In [99]:

ax = don_concat[['in state','out of state']].plot(kind='bar', figsize=(12,4), rot=0,
                                                  title='In State vs. Out of State')
ax.set_xlabel('Amount')
