Attribution of Responsibility and Blame Regarding a Man-made Disaster: #FlintWaterCrisis

by Talha Oz and Halil Bisgin

We presented this work at the 4th International Workshop on Social Web for Disaster Management (SWDM'16), co-located with CIKM 2016; here is the paper.

**Abstract**
Attribution of responsibility and blame are important topics in political science, especially as individuals tend to think of political issues in terms of questions of responsibility, and as blame carries far more weight in voting behavior than credit does. Surprisingly, however, there is a paucity of studies on the attribution of responsibility and blame in the field of disaster research.

The Flint water crisis is a story of government failure at all levels. By studying microblog posts about it, we examine how citizens assign responsibility and blame for such a man-made disaster online. We form hypotheses based on social-scientific theories in disaster research and then operationalize them on unobtrusive, observational social media data. In particular, we investigate the following phenomena: the targets of blame; partisan predisposition; the concerned geographies; and the contagion of complaining.

This paper adds to the sociology of disasters literature by exploiting a new, rarely used data source (the social web) and by employing new computational methods (such as sentiment analysis and a retrospective cohort study design) on this new form of data. In this regard, this work should be seen as a first step toward drawing more challenging inferences on the sociology of disasters from "big social data".
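Throughout the notebook, tweet sentiment is scored with NLTK's VADER analyzer (cells In [4] and onward); the compound field it returns is what later cells store as cmpnd. A minimal, self-contained sketch of that scoring step, using a tweet from the sample below:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# nltk.download('vader_lexicon')  # one-time download of the VADER lexicon
sid = SentimentIntensityAnalyzer()
# polarity_scores returns {'neg', 'neu', 'pos', 'compound'}; 'compound' is a
# normalized score in [-1, 1], with negative values indicating negative sentiment
print(sid.polarity_scores('You can thank the Republican party for this #Michigan #FlintWaterCrisis'))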

In [ ]:
# Read the raw JSON data and save it to Flint.pkl once;
# afterwards, whenever we want the data, we read the pickle
# instead of re-parsing the raw JSON files.
# This code block is here just to show how we created the pickle (.pkl) file.
import pandas as pd
import json
from glob import glob
from datetime import datetime
tw = []
for f in glob("data/TweetCollection/*.json"):
    with open(f, 'r',encoding='utf-8') as fin:
        for line in fin:
            a = json.loads(line)
            tw.append({'id':a['id_str'],
                       'created_at':datetime.strptime(a['created_at'],'%a %b %d %H:%M:%S +0000 %Y'),
                       'hashtagged':any(['flintwatercrisis' in h['text'].lower() for h in a['entities']['hashtags']]),
                       'screen_name':a['user']['screen_name'],
                       'location':a['user']['location'],
                       'followers':a['user']['followers_count'],
                       'verified':bool(a['user']['verified']),
                       'text':a['text']})
df = pd.DataFrame(tw).set_index('id').drop_duplicates()
#df.to_pickle('data/Flint.pkl')
In [1]:
import pandas as pd
import numpy as np
pd.set_option('max_colwidth',200)
df = pd.read_pickle('../data/Flint.pkl')

from utilities.geocoder import Geocoder
gc = Geocoder('utilities/geodata/state_abbr_file', 'utilities/geodata/city_file')
df['latlon'] = df.location.str.strip().apply(gc.geocode)

from IPython.display import HTML
HTML(df.head().to_html(index=False)) # what the data looks like
Out[1]:
created_at followers hashtagged location screen_name text verified latlon
2016-01-15 21:00:24 265 True Sugar Land, Texas zachsciba RT @TheDailyShow: #FlintWaterCrisis could have been prevented by an easy $100/day solution. https://t.co/4Jf7oH20EX https://t.co/7fLogvuwrx False (29.599580, -95.614089)
2016-01-15 21:00:07 968 True None scootey You can thank the Republican party for this #Michigan #FlintWaterCrisis #GOP #Uniteblue https://t.co/wK7IFvkk8k False None
2016-01-15 21:00:30 189 True s. pasadena,ca steve1204 RT @TheDailyShow: #FlintWaterCrisis could have been prevented by an easy $100/day solution. https://t.co/4Jf7oH20EX https://t.co/7fLogvuwrx False (34.112958, -118.155778)
2016-01-15 21:00:09 8053 True Lansing, Michigan ProgressMich Snyder still won’t say when he knew about #FlintWaterCrisis. Protest with us on Tuesday to demand answers: https://t.co/aRfLc99QUy #MISOTS False (42.717585, -84.554916)
2016-01-15 21:00:35 7 True None marcgilbert77 RT @TheDailyShow: #FlintWaterCrisis could have been prevented by an easy $100/day solution. https://t.co/4Jf7oH20EX https://t.co/7fLogvuwrx False None
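The Geocoder above is a project utility; as a rough, hypothetical sketch of the lookup it performs (assuming the same city/state/lat/lon schema as data/city_file.csv used for Table 1 below; the real implementation also uses the state-abbreviation file and more forgiving matching):

import csv

class SimpleGeocoder:
    # Hypothetical stand-in for utilities.geocoder.Geocoder: map a free-text
    # user location such as 'Lansing, MI' to a (lat, lon) tuple, or None.
    def __init__(self, city_file):
        self.cities = {}
        with open(city_file) as fin:
            for row in csv.DictReader(fin):  # assumed columns: city, state, lat, lon
                self.cities[(row['city'].lower(), row['state'].lower())] = (row['lat'], row['lon'])

    def geocode(self, location):
        if not location or ',' not in location:
            return None
        city, _, state = (p.strip().lower() for p in location.partition(','))
        return self.cities.get((city, state))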
In [2]:
g = df.groupby('text').size().reset_index()
g.columns = ['text','cnt']
g = g.sort_values('cnt',ascending=False)
print('total tw:',len(df),'\nunique tw:',len(g))
g.head() #most popular tweets
total tw: 664775 
unique tw: 344384
Out[2]:
text cnt
274256 RT @xoShakarra: Friendly reminder that it STILL takes one hour and 23 gallons of water to take a bath in Flint. #FlintWaterCrisis https://t… 7093
202990 RT @BernieSanders: How do we have so much money to go to war in Iraq but somehow not enough money to provide clean drinking water to Flint?… 5825
261826 RT @markmobility: #FlintWaterCrisis \n- 99,000 residents\n- 57% Black\n- 40% Poor\n- 9,000 kids with lead poisoning\nFlint HOSPITAL Water: https… 4710
265544 RT @opinionatedcxnt: Saw this on Tumblr & it made me cringe. The Flint crisis is a horrific nightmare\nhttps://t.co/j6sT5c5p3O 2354
204672 RT @BuzzFeedVideo: People See What Flint Water Looks Like\nhttps://t.co/3fV2EZFz21 1950
In [15]:
# the original timestamps are in UTC; convert them to US Eastern time.
# also, as given in footnote #4, report the missing date intervals

import pytz
eastern = pytz.timezone('US/Eastern')
# convert the timestamps from UTC to US Eastern
df.created_at = df.created_at.dt.tz_localize(pytz.utc).dt.tz_convert(eastern)

# group tweets by day
day = df.groupby(df.created_at.dt.strftime('%m-%d'))['created_at'].count()

# print missing date intervals in our dataset; all tweets are from 2016,
# so prepend the year and compare real dates
# (a plain month/day string comparison can miss gaps spanning a month boundary)
from datetime import datetime, timedelta
dates = [datetime.strptime('2016-'+d, '%Y-%m-%d') for d in day.index]
for d1, d2 in zip(dates, dates[1:]):
    if d2 - d1 > timedelta(days=1):
        print('('+d1.strftime('%m-%d')+','+d2.strftime('%m-%d')+')', end=' ')
(01-22,01-25) (02-13,02-15) (02-16,02-20) (02-20,02-29) (02-29,03-03) (04-27,05-04) (05-06,05-08) (05-12,05-26) 
In [210]:
#Figure 1
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import seaborn as sns
matplotlib.style.use('fivethirtyeight')
matplotlib.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
matplotlib.rcParams['font.size'] = 14
#plot daily activity
ax = day.plot(kind="bar",figsize=(18, 4)) #,title='#FlintWaterCrisis Activity on Twitter'
#ax.set_xlabel('Days After Flint Became a Federal State of Emergency on 2016-01-16', fontsize=14)
ax.set_ylabel('Tweets in the 1% sample', fontsize=14)

for label in ax.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
ax.annotate('Federal State of Emergency', xy=(0, 31000))
ax.annotate('Gov. Rick Snyder holds a news conf.\n'\
            'Groups file a federal lawsuit', xy=(12, 41000),ha='center')
ax.annotate('First Flint hearing in Congress\n'\
            'Hillary visits Flint', xy=(22, 9000),ha='center')
ax.annotate('GOP debate in Detroit, MI\nRubio defends MI governor', xy=(31, 23000),ha='center')
ax.annotate('DEM debate in Flint, MI\nBoth candidates call on Snyder to resign', xy=(36, 45000),ha='center')
ax.annotate('MI primaries\nfor both parties', xy=(40, 33000),ha='center')
ax.annotate('Gov Snyder & EPA admin McCarthy\ntestify before Congress', xy=(47, 11000),ha='center')
ax.annotate('A local\'s complaining tweet goes viral\nGov Snyder asks the lawsuit be dismissed', xy=(69, 9000),ha='center')
ax.annotate('Obama visits Flint', xy=(88, 10000),ha='center')
ax.set_xlim([-1, 93])
ax.set_xlabel('');
ax.get_figure().savefig('../figs/daily.pdf',dpi=150,bbox_inches='tight')
In [3]:
#Figure 2
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import seaborn as sns

#matplotlib.style.use('fivethirtyeight')
matplotlib.style.use('ggplot')

plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
matplotlib.rcParams['font.size'] = 14


l = pd.DataFrame()
for i in range(5):
    r = pd.read_csv('../data/training/Flint'+str(i+1)+'_train.csv')
    r['rater'] = i
    l = l.append(r)
l = l.fillna('missing')

l['label'] = l.c.replace({',.*':'','missing':10},regex=True).astype(int) # keep only the first label of multi-label tweets; code 'missing' as 10

#get pairwise kappas
from itertools import combinations
from statsmodels.stats.inter_rater import fleiss_kappa
from statsmodels.stats.inter_rater import aggregate_raters

kappa = []
for r1,r2 in combinations(range(5), 2):
    rr = l[l.rater==r1].merge(l[l.rater==r2],on='text')[['label_x','label_y']]
    k = fleiss_kappa(aggregate_raters(rr,n_cat=11)[0])
    kappa.append(('r'+str(r1),'r'+str(r2),k))
    kappa.append(('r'+str(r2),'r'+str(r1),k)) #(r2,r1,k)

a = pd.DataFrame(kappa).pivot(0,1,2) #pairwise inter-rater fleiss-kappa 
a.index.name = None
a.columns.name = None


plt.figure(num=None, figsize=(6, 4), facecolor='w', edgecolor='w')

labels = ['No blame','MI Governor','POTUS','Flint Mayor',
          'EPA','Emergency M.','Republicans','Democrats','Government','Other indiv.', 'Unsure']
cnt = [len(l[l.c.str.contains(str(i))]) for i in range(10)] #count of each label
cnt.append(len(l[l.c.str.contains('missing')]))
ax = plt.subplot()
ax.margins(0, 0)
colors = '#777777 #E24A33 #348ABD #348ABD #348ABD #E24A33 #E24A33 #348ABD #FBC15E #8EBA42 #FFB5B8'.split()
#[color['color'] for color in list(plt.rcParams['axes.prop_cycle'])]
ax.barh(range(len(cnt)),cnt,tick_label=labels,align='center',color=colors)
#ax.set(xlabel='Manually coded tweets'); #title='Attribution of Blame/Responsibility',
#ax.grid(color='grey', linestyle='dotted', linewidth=0.5)


plt.axes([.4, .33, .55, .55])
sns.heatmap(a,annot=True,vmin=0,vmax=1,cmap='RdBu_r',annot_kws={'size':12})
#ax.get_figure().savefig('../figs/coders.pdf',dpi=150,bbox_inches='tight')
Out[3]:
<matplotlib.axes._axes.Axes at 0x126bd5d68>
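For reference, the pairwise statistic computed above is Fleiss' kappa, which rescales the observed agreement between raters by the agreement expected under chance:

$$\kappa = \frac{\bar P - \bar P_e}{1 - \bar P_e}, \qquad \bar P_e = \sum_j p_j^2,$$

where $\bar P$ is the mean observed agreement across tweets and $p_j$ is the overall proportion of ratings assigned to label $j$; $\kappa = 1$ indicates perfect agreement and $\kappa \approx 0$ agreement no better than chance.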
In [34]:
#Table 1
df1 = pd.read_csv('../data/us-city-populations.csv',usecols=['CityST','2000','2010','LAT','LON','County_Name'])
df2 = pd.read_csv('../data/city_file.csv',dtype={'lat':str,'lon':str})
df2['CityST'] = df2.city + ', ' + df2.state
cities = df1.merge(df2, on = 'CityST', how = 'inner')
cities['latlon'] = cities[['lat','lon']].apply(tuple, axis=1)
cnt = pd.DataFrame(df.groupby(by='latlon').size().reset_index().rename(columns={0:'cnt'}))
cities = cities.merge(cnt,on='latlon',how='inner').rename(columns={'2010':'cpop','County_Name':'county'})
cities = cities[cities.cnt>=3]
cities.loc[cities.cpop.isnull(),'cpop'] = cities[cities.cpop.isnull()]['2000']
cities = cities.sort_values('cnt',ascending=False).reset_index().drop(['index','LAT','LON','2000'],1)
cities.cpop = cities.cpop.astype(int)
#cities.to_csv('data/cities.csv',index=False)
cities.head(10) #tweet counts without normalization

fil = cities[cities.cpop>88].copy()
fil['normalized'] = fil.cnt * 1000 / fil.cpop # tweets per 1,000 residents
fil = fil[fil.normalized>=1]
fil.sort_values('normalized',ascending=False).head(10) #normalized

city10 = fil.sort_values('normalized',ascending=False).head(10).reset_index()
city10 = city10.rename(columns={'CityST':'Cities'})

cofil = fil.groupby(['county','state']).sum()
cofil.normalized = cofil.cnt / np.sqrt(cofil.cpop) # normalize county counts by the square root of county population
county10 = cofil.sort_values(by='normalized',ascending=False).head(10).reset_index()
county10['Counties'] = county10.county +', '+county10.state

cc = pd.concat([city10.Cities,county10.Counties],axis=1)
cc.index += 1
print(cc.to_latex())
\begin{tabular}{lll}
\toprule
{} &                Cities &           Counties \\
\midrule
1  &             Flint, MI &        Genesee, MI \\
2  &           Gaylord, MI &  Dist Columbia, DC \\
3  &       Grand Blanc, MI &         Otsego, MI \\
4  &      Mount Morris, MI &          Wayne, MI \\
5  &  Bloomfield Hills, MI &         Ingham, MI \\
6  &           Lansing, MI &      Washtenaw, MI \\
7  &            Sedona, AZ &       Multiple, GA \\
8  &           Davison, MI &           Kent, MI \\
9  &     Traverse City, MI &       Coconino, AZ \\
10 &         Ann Arbor, MI &           Cook, IL \\
\bottomrule
\end{tabular}

In [4]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

snyder = df.text.str.contains('governor|nyder|onetoughnerd',case=False) # @onetoughnerd is Gov. Rick Snyder's Twitter handle
EM = df.text.str.contains('mgr|manager|Darnell|Earley|Kurtz',case=False) # Flint's state-appointed emergency managers (Darnell Earley, Ed Kurtz)
mayor = df.text.str.contains('Dayne|Walling|ayor',case=False) # Dayne Walling was Flint's mayor

print([len(df[x]) for x in [snyder,EM,mayor]]) #Footnote 10.
[97577, 6028, 11609]
In [3]:
from matplotlib import animation,font_manager
import matplotlib.pyplot as plt
from html import unescape
import os

plt.rcParams['savefig.dpi']=150
plt.rcParams['animation.html'] = 'html5'

fig, ax = plt.subplots(figsize=(6, 1))
ax.set_axis_off()

plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)
prop = font_manager.FontProperties(fname='Quivira.otf') # 'Symbola.ttf'
text = ax.text(.5, .5, '', fontsize=11, va='center', ha='center', wrap=True, fontproperties = prop)

txt = list(g.head(30).text) #g is a pandas dataframe

def animate(i):
    text.set_text('('+str(i+1)+') '+unescape(txt[i]))
    return (text,)

anim = animation.FuncAnimation(fig, animate, frames=len(txt), interval=2000, blit=True)
anim.save('top30.mp4') #matplotlib can save as mp4, but not as gif yet.
os.system("convert -delay 200 top30.mp4 top30.gif") #imagemagick's convert
anim #eye candy for the presentation :-)
Out[3]:
In [180]:
# Figure 3 (new)
c = pd.DataFrame()
for i in range(5):
    r = pd.read_csv('../data/training/Flint'+str(i+1)+'_train.csv')
    r['rater'] = i
    c = c.append(r)
c = c.dropna()
print(len(c[c.c.str.contains('6')]),len(c[c.c.str.contains('7')]))

r = df.screen_name[df.text.isin(c[c.c.str.contains('6')].text)]
d = df.screen_name[df.text.isin(c[c.c.str.contains('7')].text)]

m = mayoronly[mayoronly.cmpnd!=0] # mayoronly/snyderonly are computed in cell In [5] below
s = snyderonly[snyderonly.cmpnd!=0] # keep only tweets with a nonzero compound sentiment


#matplotlib.style.use('fivethirtyeight')
matplotlib.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
matplotlib.rcParams['xtick.labelsize'] = 16
matplotlib.rcParams['ytick.labelsize'] = 16
matplotlib.rcParams['axes.titlesize'] = 18

co = {'color':'black'}
ma = {'color':'black','linestyle':'-'}
boxprops = dict(linestyle='-', color='black')

f, ax = plt.subplots(1, 2, sharey=True,figsize=(8,3))
titles = ['Governor','Mayor']
for i,a in enumerate([s,m]):
    bp = ax[i].boxplot([a[a.screen_name.isin(d)].cmpnd,a[a.screen_name.isin(r)].cmpnd], patch_artist=True,
             whiskerprops=co,capprops=co,medianprops=ma,boxprops=boxprops,labels=['Blaming R','Blaming D'])
    for box, color in zip(bp['boxes'], ['#348ABD','#E24A33']):
        box.set_color('black')
        box.set_facecolor(color)

    ax[i].set_title(titles[i],y=.9)
    ax[i].yaxis.grid(True, linestyle='-', which='major', color='lightgrey', alpha=0.5)
ax[0].set_ylabel('Sentiment score',fontsize=18);
f.savefig('../figs/box-partisanship.pdf',dpi=150,bbox_inches='tight')
62 24
In [5]:
snyderonly = df[snyder&~mayor&~EM].copy()
mayoronly=df[mayor&~snyder&~EM].copy()
a = pd.DataFrame(list(snyderonly.text.apply(sid.polarity_scores)))
snyderonly = pd.concat([snyderonly.reset_index(),a.rename(columns={'compound':'cmpnd'})],axis=1)
a = pd.DataFrame(list(mayoronly.text.apply(sid.polarity_scores)))
mayoronly = pd.concat([mayoronly.reset_index(),a.rename(columns={'compound':'cmpnd'})],axis=1)
In [130]:
from scipy.stats import ks_2samp
from math import sqrt
c_a = 1.95 #coefficient c_a is 1.36 for alpha 0.05 and 1.95 for alpha 0.001
for i,a in enumerate([s,m]):
    print(ks_2samp(a[a.screen_name.isin(d)].cmpnd,a[a.screen_name.isin(r)].cmpnd))
    n1 = len(a[a.screen_name.isin(d)])
    n2 = len(a[a.screen_name.isin(r)])
    print('Critical value D_a:',c_a*sqrt((n1+n2)/(n1*n2)))
Ks_2sampResult(statistic=0.36016301579215487, pvalue=3.4967160958746161e-06)
Critical value D_a: 0.27806837264879597
Ks_2sampResult(statistic=0.33333333333333337, pvalue=0.71191004965601257)
Critical value D_a: 1.0148121747397396
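The critical-value check in this cell (and in the contagion cells below) is the standard large-sample rejection rule for the two-sample Kolmogorov-Smirnov test: the null hypothesis of equal distributions is rejected at level $\alpha$ when

$$D_{n_1,n_2} > c(\alpha)\sqrt{\frac{n_1+n_2}{n_1\,n_2}}, \qquad c(0.05)=1.36, \quad c(0.001)=1.95.$$

Here the two governor-tweet samples differ significantly even at $\alpha = 0.001$ ($D = 0.360 > 0.278$), while the mayor samples do not ($D = 0.333 < 1.015$).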
In [60]:
#now the contagion "experiment"
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import seaborn as sns
#matplotlib.style.use('fivethirtyeight')
matplotlib.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
plt.rcParams['savefig.dpi']=227 #DPI of my 13.3 MacBook Pro Retina
f = df[df.latlon == gc.geocode("Flint, MI")].groupby('screen_name').size()
f.sort_values(ascending=False).plot(ylim=(0,30),linestyle="None",marker='.',figsize=(10,5))
f2 = list(f[(f>2)&(f<20)].index.values)
print(len(f2)) # sample selection: Flint users who tweeted more than 2 and fewer than 20 times
262
In [90]:
import twitter as t
from functools import partial
import sys, time
#friends = {}
#auth = t.oauth.OAuth("", "", "", "")
#twitter_api = t.Twitter(auth=auth)
for i,u in enumerate(f2):
    print('trying',u) # get_friends and make_twitter_request are defined in cell In [66] below
    try:
        friends[u] = get_friends(u,twitter_api)
    except Exception as e:
        print(e)
        continue
trying EzE_2o11_
trying FGCofC
"['id_str' 'screen_name' 'name' 'location' 'description' 'created_at'\n 'friends_count' 'followers_count' 'statuses_count' 'favourites_count'] not in index"
trying FYINation
Encountered 404 Error (Not Found)
[crawl log truncated: ~250 further "trying <screen_name>" lines, interspersed with occasional 401 (Not Authorized) and 404 (Not Found) errors, a few "not in index" lookup failures like the one above, and periodic 429 (Rate Limit Exceeded) errors on which the crawler sleeps 15 minutes ("Retrying in 15 minutes...ZzZ...") before resuming]
In [111]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

ff = pd.concat(friends.values(), keys=friends.keys(),names=['flinter'])
#ff.to_csv('../data/ff.csv')
fr = ff.screen_name.unique()
sents = pd.DataFrame(list(df[df.screen_name.isin(fr)].text.apply(sid.polarity_scores)))['compound']
fs = pd.concat([df[df.screen_name.isin(fr)].screen_name.reset_index(), sents],axis=1,ignore_index=True)
fs.columns = ['twid','screen_name','cmpnd']
In [90]:
ffdf = pd.read_csv('../data/ffdf.csv')
ffdf.columns = ['screen_name'] + list(ffdf.columns[1:])
#ffdf.head() #fsent is missing
print(len(ffdf[ffdf.usent>0]),len(ffdf[ffdf.usent<0]))
101 115
In [152]:
udf = {}
for u in ffdf.screen_name: # iterate screen names (ffdf from In [90] keeps them as a column, not the index)
    usent = pd.DataFrame(list(df[df.screen_name == u].text.apply(sid.polarity_scores)))['compound']
    udf[u] = {'utwcnt':len(usent),'usent':usent.mean()}
usdf = pd.DataFrame.from_dict(udf,orient='index')
usdf
Out[152]:
utwcnt usent
1Goal1Passion 7 0.096586
1_namillionme37 10 -0.215550
810DIRTVILLE 4 0.000000
AD16Gaming 6 0.128550
AdamBiggers81 8 0.315200
AliciaRoose 6 0.078383
AllahnaSteve 3 -0.467833
AmandaEmeryNews 19 -0.415800
Americans4Flint 5 0.265980
AmourNiyy_ 4 0.411850
AmylHovey 8 0.001463
AnthonyWJRT 14 -0.036036
AqueousEye 6 0.552383
Area72ENT 5 0.362420
AsmoovMF 5 -0.255680
DJJayBig 5 0.123320
DaHoodsOrnament 12 -0.212075
DannyDeadhack 9 0.338244
DerekDohrman 3 -0.026333
DesireeDuell 5 0.478120
Domanyce_Da_1 16 0.069812
DortEventCenter 5 0.519600
DoubtingTomFYI 14 0.057836
DylanLuna1931 10 0.049650
EmilyDoerr 7 0.379657
EricRob229 5 -0.169580
EzE_2o11_ 3 0.031433
FYINation 10 -0.040650
FaithGiddings2 5 -0.126800
FiggaDaKID 14 -0.187557
... ... ...
mona_haydar 7 -0.082043
mottdean 4 -0.037375
najladw 4 0.075150
nienie_strangep 4 -0.157050
novaprime79 13 0.234992
orochiburenso 5 -0.125580
overall3171 8 -0.020588
p85rice 11 -0.166100
pageiv 3 0.277000
patandaj 13 -0.118046
peoplevssnyder 4 0.199250
planetofboom 19 0.049321
rappolee 9 -0.013789
ronfonger 16 -0.342650
seonthompson 3 0.743100
sexynacole 6 0.711350
sharrington2016 3 0.320367
smhodges4 9 -0.195844
standupflint 12 0.140400
stevemintline 8 0.102813
swaydalyricist 5 0.178440
tammy_loren 19 0.023889
tdgalbraith 5 0.000000
teenagev0w 3 -0.290533
tenacitybrewing 3 -0.058700
theTOMTOMSmusic 4 0.281325
whitingflint 4 0.463100
xojassx 6 -0.029650
yrnmg 3 0.323100
zachizz1476 5 0.066800

223 rows × 2 columns

In [158]:
fdf = {}
for k,v in friends.items():
    fsk = fs[fs.screen_name.isin(v.screen_name)]     # friends' tweets in our sample
    fsent = fsk.cmpnd.mean()                         # mean sentiment of the friends' tweets
    fdf[k] = {'ftwcnt':len(fsk),                     # number of tweets by friends
              'dfcnt':len(fsk.screen_name.unique()), # distinct friends who tweeted
              'tfcnt':len(v),                        # total friends crawled
              'fsent':fsent}
ffdf = pd.DataFrame.from_dict(fdf,orient='index')
ffdf = ffdf.join(usdf)
#ffdf.to_csv('../data/ffdf.csv')
ffdf
Out[158]:
fsent tfcnt ftwcnt dfcnt utwcnt usent
1Goal1Passion -0.118901 505 2330 124 7 0.096586
1_namillionme37 -0.154041 634 458 37 10 -0.215550
810DIRTVILLE -0.050515 987 2607 86 4 0.000000
AD16Gaming 0.000000 53 1 1 6 0.128550
AdamBiggers81 -0.082383 1327 1219 154 8 0.315200
AliciaRoose -0.096432 918 319 44 6 0.078383
AllahnaSteve -0.081462 70 50 19 3 -0.467833
AmandaEmeryNews -0.103015 468 3682 197 19 -0.415800
Americans4Flint -0.097908 354 3374 90 5 0.265980
AmourNiyy_ -0.043152 265 148 41 4 0.411850
AmylHovey -0.073058 422 2684 166 8 0.001463
AnthonyWJRT -0.127652 260 2542 94 14 -0.036036
AqueousEye -0.116753 85 514 24 6 0.552383
Area72ENT -0.137371 343 214 39 5 0.362420
AsmoovMF -0.068034 191 282 57 5 -0.255680
DJJayBig -0.045407 4900 943 343 5 0.123320
DaHoodsOrnament -0.081913 3396 12086 157 12 -0.212075
DannyDeadhack -0.060800 1078 1591 146 9 0.338244
DerekDohrman -0.097318 335 579 48 3 -0.026333
DesireeDuell -0.103413 2000 21260 364 5 0.478120
Domanyce_Da_1 -0.072672 1090 521 98 16 0.069812
DortEventCenter -0.070818 511 2093 127 5 0.519600
DoubtingTomFYI -0.189894 1107 6066 454 14 0.057836
DylanLuna1931 -0.068016 766 1890 139 10 0.049650
EmilyDoerr -0.067301 1139 3705 263 7 0.379657
EricRob229 -0.171226 205 325 18 5 -0.169580
EzE_2o11_ -0.109453 446 327 23 3 0.031433
FYINation -0.128731 51 193 25 10 -0.040650
FaithGiddings2 -0.092667 246 1734 60 5 -0.126800
FiggaDaKID -0.055347 752 600 112 14 -0.187557
... ... ... ... ... ... ...
mona_haydar -0.093202 414 1482 107 7 -0.082043
mottdean -0.097917 363 669 34 4 -0.037375
najladw -0.121085 904 1378 117 4 0.075150
nienie_strangep -0.222324 419 715 67 4 -0.157050
novaprime79 -0.291021 63 62 8 13 0.234992
orochiburenso -0.020238 344 8 7 5 -0.125580
overall3171 -0.168117 448 407 65 8 -0.020588
p85rice -0.085477 1108 305 60 11 -0.166100
pageiv -0.114743 613 3945 150 3 0.277000
patandaj -0.061967 944 803 85 13 -0.118046
peoplevssnyder -0.029003 34 749 17 4 0.199250
planetofboom -0.155146 161 383 23 19 0.049321
rappolee -0.170460 432 283 36 9 -0.013789
ronfonger -0.098598 876 6638 263 16 -0.342650
seonthompson -0.140791 1381 811 109 3 0.743100
sexynacole -0.083383 243 116 23 6 0.711350
sharrington2016 -0.059061 104 1031 33 3 0.320367
smhodges4 -0.105832 1143 6310 269 9 -0.195844
standupflint -0.086806 442 8344 285 12 0.140400
stevemintline -0.088749 270 2122 104 8 0.102813
swaydalyricist -0.137180 822 86 22 5 0.178440
tammy_loren -0.074358 257 4216 113 19 0.023889
tdgalbraith -0.113630 62 326 24 5 0.000000
teenagev0w -0.064114 748 450 30 3 -0.290533
tenacitybrewing -0.061528 1202 2968 258 3 -0.058700
theTOMTOMSmusic -0.007381 595 296 59 4 0.281325
whitingflint -0.040434 436 3165 151 4 0.463100
xojassx -0.068348 1269 972 334 6 -0.029650
yrnmg -0.097200 1146 736 132 3 0.323100
zachizz1476 0.139836 27 25 5 5 0.066800

223 rows × 6 columns

In [207]:
len(fr),df[df.screen_name.isin(fr)].screen_name.nunique()
Out[207]:
(122953, 8339)
In [205]:
print(len(ffdf[ffdf.usent>0]),len(ffdf[ffdf.usent<0]),len(ffdf[ffdf.fsent>0]),len(ffdf[ffdf.fsent<0]))
101 115 15 206
In [167]:
ffdf.corr(method='pearson') #.loc['fsent','usent'] = .16
Out[167]:
fsent tfcnt ftwcnt dfcnt utwcnt usent
fsent 1.000000 -0.041997 -0.011399 0.044034 -0.128727 0.161092
tfcnt -0.041997 1.000000 0.150289 0.588166 0.023542 -0.076660
ftwcnt -0.011399 0.150289 1.000000 0.487535 0.185268 -0.001390
dfcnt 0.044034 0.588166 0.487535 1.000000 0.094961 0.030336
utwcnt -0.128727 0.023542 0.185268 0.094961 1.000000 -0.108274
usent 0.161092 -0.076660 -0.001390 0.030336 -0.108274 1.000000
In [211]:
ffdf[ffdf.usent>0].fsent.mean(),ffdf[ffdf.usent<0].fsent.mean()
Out[211]:
(-0.074941076829484934, -0.10539607280382769)
In [228]:
colorm = dict(boxes='lightgreen', whiskers='black', medians='black', caps='black')
#ax=compare[['followers','population']].plot(kind='box', patch_artist=True, showfliers=False)
boxprops = dict(linestyle='-', color='black')

matplotlib.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
matplotlib.rcParams['xtick.labelsize'] = 20
matplotlib.rcParams['ytick.labelsize'] = 18
matplotlib.rcParams['axes.titlesize'] = 14

co = {'color':'black'}
ma = {'color':'black','linestyle':'-'}

plt.figure(figsize=(9,3))
cohort = ffdf[ffdf.usent<0].fsent #ffdf[ffdf.fsent<0].usent 
control= ffdf[ffdf.usent>0].fsent.dropna() #ffdf[ffdf.fsent>0].usent
print(cohort.mean(),control.mean())
bp = plt.boxplot([cohort,control],patch_artist=True, showfliers=False,
                 whiskerprops=co,capprops=co,medianprops=ma,boxprops=boxprops,labels=['Friends of the cohort','Friends of the control'])

ax = plt.gca()
for patch, color in zip(bp['boxes'], ['magenta','lightgreen']):
    patch.set_facecolor(color)
ax.set_ylabel('Sentiment score',fontsize=22)
ax.set_ylim(-.23,.08)
#plt.yticks(np.arange(-.6, .6, .1))
ax.yaxis.grid(True, linestyle='-', which='major', color='lightgrey', alpha=0.5)
ax.get_figure().savefig('../figs/contagion-exp2.pdf', bbox_inches='tight')
-0.105396072804 -0.0749410768295
In [193]:
from scipy.stats import ks_2samp
from math import sqrt
c_a = 1.36 #coefficient c_a is 1.36 for alpha 0.05 and 1.95 for alpha 0.001
print(ks_2samp(ffdf[ffdf.fsent<0].usent,ffdf[ffdf.fsent>0].usent))
n1 = len(ffdf[ffdf.fsent<0])
n2 = len(ffdf[ffdf.fsent>0])
print('Critical value D_a (ks statistic (D) should be greater than this):',c_a*sqrt((n1+n2)/(n1*n2)))
#the statistic exceeds the critical value, i.e., significant at the 95% confidence level: https://daithiocrualaoich.github.io/kolmogorov_smirnov/
Ks_2sampResult(statistic=0.37184466019417473, pvalue=0.030545168312102647)
Critical value D_a (ks statistic (D) should be greater than this): 0.3637104720012413
In [214]:
from scipy.stats import ks_2samp
from math import sqrt
c_a = 1.36 #coefficient c_a is 1.36 for alpha 0.05 and 1.95 for alpha 0.001
print(ks_2samp(ffdf[ffdf.usent<0].fsent,ffdf[ffdf.usent>0].fsent))
n1 = len(ffdf[ffdf.usent<0])
n2 = len(ffdf[ffdf.usent>0])
print('Critical value D_a (ks statistic (D) should be greater than this):',c_a*sqrt((n1+n2)/(n1*n2)))
#the statistic exceeds the critical value, i.e., significant at the 95% confidence level: https://daithiocrualaoich.github.io/kolmogorov_smirnov/
Ks_2sampResult(statistic=0.20619888075764092, pvalue=0.01743249763074449)
Critical value D_a (ks statistic (D) should be greater than this): 0.1854625286897552
In [66]:
from functools import partial
from urllib.error import URLError
from http.client import BadStatusLine
import sys, time

def get_friends(screen_name,twitter_api,limit=5000):
    get_friends_ids = partial(make_twitter_request,twitter_api.friends.ids, count=5000) # friends/ids endpoint, 5000 ids per page
    ids = []

    cursor = -1
    while cursor != 0:
        response = get_friends_ids(screen_name=screen_name, cursor=cursor)

        if response is not None:
            ids += response['ids']
            cursor = response['next_cursor']

        # print('Fetched {0} total {1} ids for {2}. next_cursor: {3}'.format(
        #    len(ids), label, screen_name, cursor))

        if len(ids) >= limit or response is None:
            break

    return ids_to_snames(twitter_api,ids[:limit],screen_name=screen_name)


def ids_to_snames(twitter_api,fids,screen_name='tozcss'):
    get_snames = partial(make_twitter_request,twitter_api.users.lookup)
    resp = []
    for i in range(1+(len(fids)-1)//100): # users/lookup accepts up to 100 ids per call
        # the endpoint expects a comma-separated string of ids
        resp.extend(get_snames(user_id=','.join(str(fid) for fid in fids[100*i:100*(i+1)])))
    header = ['id_str','screen_name', 'name', 'location', 'description', 'created_at', \
        'friends_count','followers_count','statuses_count','favourites_count']
    df = pd.DataFrame.from_dict(resp)[header].set_index('id_str')
    return df

def make_twitter_request(twitter_api_func, max_errors=10, *args, **kw):

    # A nested helper function that handles common HTTPErrors. Return an updated
    # value for wait_period if the problem is a 500 level error. Block until the
    # rate limit is reset if it's a rate limiting issue (429 error). Returns None
    # for 401 and 404 errors, which requires special handling by the caller.
    def handle_twitter_http_error(e, wait_period=2, sleep_when_rate_limited=True):

        if wait_period > 3600: # Seconds
            print ('Too many retries. Quitting.',file=sys.stderr)
            raise e

        # See https://dev.twitter.com/docs/error-codes-responses for common codes

        if e.e.code == 401:
            print ('Encountered 401 Error (Not Authorized)',file=sys.stderr)
            return None
        elif e.e.code == 404:
            print ('Encountered 404 Error (Not Found)',file=sys.stderr)
            return None
        elif e.e.code == 429:
            print ('Encountered 429 Error (Rate Limit Exceeded)',file=sys.stderr)
            if sleep_when_rate_limited:
                print ("Retrying in 15 minutes...ZzZ...",file=sys.stderr)
                sys.stderr.flush()
                time.sleep(60*15 + 5)
                print ('...ZzZ...Awake now and trying again.',file=sys.stderr)
                return 2
            else:
                raise e # Caller must handle the rate limiting issue
        elif e.e.code in (500, 502, 503, 504):
            print ('Encountered',e.e.code,'Error. Retrying in',wait_period,'seconds',file=sys.stderr)
            time.sleep(wait_period)
            wait_period *= 1.5
            return wait_period
        else:
            raise e

    # End of nested helper function

    wait_period = 2
    error_count = 0

    while True:
        try:
            return twitter_api_func(*args, **kw)
        except t.api.TwitterHTTPError as e:
            error_count = 0
            wait_period = handle_twitter_http_error(e, wait_period)
            if wait_period is None:
                return
        except URLError as e:
            error_count += 1
            time.sleep(wait_period)
            wait_period *= 1.5
            print ("URLError encountered. Continuing.",file = sys.stderr)
            if error_count > max_errors:
                print ("Too many consecutive errors...bailing out.",file=sys.stderr)
                raise
        except BadStatusLine as e:
            error_count += 1
            time.sleep(wait_period)
            wait_period *= 1.5
            print ("BadStatusLine encountered. Continuing.",file=sys.stderr)
            if error_count > max_errors:
                print ("Too many consecutive errors...bailing out.",file=sys.stderr)
                raise
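For completeness, a hypothetical one-off call through the retry wrapper above; users.show is chosen only for illustration (the crawl itself goes through get_friends), and the screen name is one from the crawl log:

# hypothetical usage of make_twitter_request; returns None after a 401/404
profile = make_twitter_request(twitter_api.users.show, screen_name='UMFlint')
if profile is not None:
    print(profile['followers_count'])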
In [56]:
import subprocess #the table that went into the presentation
template = r'''\documentclass[preview]{{standalone}}
\usepackage{{booktabs}}
\usepackage[vcentering,dvips]{{geometry}}
\geometry{{total={{3.05in}}}}
\begin{{document}}
{}
\end{{document}}
'''
filename="../figs/concerned_geo.tex"
with open(filename, 'w') as f:
    f.write(template.format(cc.to_latex()))
subprocess.call(['pdflatex', filename],cwd=r'../figs');

The remaining cells are from earlier versions of the analysis and are not used in the paper.

In [160]:
# Figure 4 of version1
compare = pd.read_table('../data/popVSfollower1000.txt',header=0, sep="\t")
colorm = dict(boxes='lightgreen', whiskers='black', medians='black', caps='black')
#ax=compare[['followers','population']].plot(kind='box', patch_artist=True, showfliers=False)
boxprops = dict(linestyle='-', color='black')

matplotlib.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
matplotlib.rcParams['xtick.labelsize'] = 20
matplotlib.rcParams['ytick.labelsize'] = 18
matplotlib.rcParams['axes.titlesize'] = 14

co = {'color':'black'}
ma = {'color':'black','linestyle':'-'}

plt.figure(figsize=(9,3))
bp = plt.boxplot([compare.followers,compare.population],patch_artist=True, showfliers=False,
                 whiskerprops=co,capprops=co,medianprops=ma,boxprops=boxprops,labels=['mayor','governor'])

ax = plt.gca()
for patch, color in zip(bp['boxes'], ['magenta','lightgreen']):
    patch.set_facecolor(color)
ax.xaxis.set_ticklabels(['Cohort','Control'])
ax.set_ylabel('Sentiment score',fontsize=22)
#ax.set_ylim([-.23, -.12])
ax.yaxis.grid(True, linestyle='-', which='major', color='lightgrey', alpha=0.5)
ax.get_figure().savefig('../figs/contagion-exp.pdf', bbox_inches='tight')
In [214]:
# Figure 3 (from an earlier version; 'sent' here is the sentiment column of that
# pipeline, the counterpart of 'cmpnd' above)
mayonly_avgsent=pd.DataFrame(mayoronly.groupby(['screen_name'],as_index=False).mean()['sent'])
snyderonly_avgsent=pd.DataFrame(snyderonly.groupby(['screen_name'],as_index=False).mean()['sent'])


print(mayonly_avgsent.sent.mean())
print(snyderonly_avgsent.sent.mean())

# fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True)
colorm = dict(boxes='magenta', whiskers='black', medians='black', caps='black')
colorg = dict(boxes='lightgreen', whiskers='black', medians='black', caps='black')

#matplotlib.style.use('fivethirtyeight')
matplotlib.style.use('ggplot')
plt.rcParams['axes.facecolor']='w'
plt.rcParams['savefig.facecolor']='w'
matplotlib.rcParams['font.size'] = 14
#plt.figure(num=None, figsize=(12, 8), facecolor='w', edgecolor='w')

c = {'color':'black'}
m = {'color':'black','linestyle':'-'}
boxprops = dict(linestyle='-', color='black')
bp = plt.boxplot([mayonly_avgsent,snyderonly_avgsent], patch_artist=True,
                 whiskerprops=c,capprops=c,medianprops=m,boxprops=boxprops,labels=['Mayor','Governor'])

for patch, color in zip(bp['boxes'], ['#348ABD','#E24A33']):
    patch.set_facecolor(color)

ax = plt.gca()
ax.set_ylabel('Sentiment score')
ax.yaxis.grid(True, linestyle='-', which='major', color='lightgrey', alpha=0.5)
ax.get_figure().savefig('../figs/box-mayor-gov.pdf',dpi=150,bbox_inches='tight')
-0.122567031844
-0.312806395394
In [217]:
# Figure 4
mayonly_avgsent=pd.DataFrame(mayoronly.groupby(['screen_name'],as_index=False)['sent'].mean())
snyderonly_avgsent=pd.DataFrame(snyderonly.groupby(['screen_name'],as_index=False)['sent'].mean())

pro_may_avgent = mayonly_avgsent[mayonly_avgsent.sent>0].screen_name.unique()
comment_both = snyderonly_avgsent[snyderonly_avgsent.screen_name.isin(pro_may_avgent)].screen_name.unique()

ax=mayonly_avgsent[mayonly_avgsent.screen_name.isin(comment_both) & (mayonly_avgsent.sent!=0)].sent.plot(kind='density', xlim=(-1,1),color='#348ABD')
snyderonly_avgsent[snyderonly_avgsent.screen_name.isin(comment_both) & (snyderonly_avgsent.sent!=0)].sent.plot(kind='density', ax=ax, xlim=(-1,1), color = '#E24A33')
ax.legend(['Mayor','Governor'],loc=2)
ax.set_xlabel('Sentiment score')
ax.get_figure().savefig('../figs/pro_mayors_gov.pdf', bbox_inches='tight')
In [137]:
plt.figure(num=None, figsize=(12, 6), facecolor='w', edgecolor='w')
fil.cnt.plot(loglog=True,linestyle='',marker='.')
fil.cpop.plot(loglog=True,linestyle='',marker='.')
#fil.normalized.plot(loglog=True,linestyle='',marker='.')
plt.legend(['tweet count','city population']);