Computational Social Science: A Brief Introduction

Cara Messina, NULab Coordinator
Alexis Yohros, Digital Teaching Integration Research Assistant
Northeastern University

Prepared for Steven Vallas' Sociology 1101 Course
September 24, 2018

Outline

  1. Small group activity
  2. Introduction to computational social science
  3. “Big Data” and Data Collection
    • What can Facebook tell you about yourself?
    • What can Google tell you about yourself?
    • Why is this useful?
  4. How will data analytics impact our future?
    • Activity tracking products for health insurance
    • Digital "citizenship" in China
  5. Examples of data analytics:
    • #WhyIStayed
    • Tracing racial bias
  6. Live Example: Using computational methods to examine school shooting statistics and narratives
  7. Ethical Implications
    • Machine Learning Perpetuating Racial Bias
  8. Wrap-up: Small group discussion

Activity

Small Group Discussion:

In groups of 2-3 people, take 5 minutes to discuss some (or all) of these questions:

  • What data do you think is being collected about you?
  • Where and how do you think this data is being collected?
  • Where is it stored? Who sees it?
  • How do you imagine your data is being used?
  • What is your reaction to this information being collected, stored, and potentially used?

Introduction

For this short lecture, we will discuss some methodologies in Computational Social Science and the influence of Data Analytics in Sociology and other Social Science disciplines. Computational Social Science applies computational methods to social science questions; in data analytics, data is collected, stored, analyzed, and visualized to better understand patterns in human behavior and action.

Learning Outcomes:

  • Introduce data collection and analytics
  • Familiarize yourself with what Python can do
  • Explore how data is collected through applications like Facebook and Google and how these could be applied to the social sciences
  • Provide examples for how data can reinforce or challenge beliefs and narratives
  • Discuss how this data can represent patterns in human behavior and how this data can reflect problematic power structures and biases
  • Examine the moral and ethical implications of “big data”

“Big Data” and Data Collection

What do we mean when we say “Big Data?” How can “big data” be used in computational social science (CSS)? What can “big data” tell us about ourselves?

  • In recent years, a variety of novel digital data sources, colloquially referred to as “big data,” have taken the popular imagination by storm.
  • These data sources include, but are not limited to: digitized administrative records; activity on and contents of social media and internet platforms; and readings from sensors that track physical and environmental conditions.
  • Some have argued that such data sets have the potential to transform our understanding of human behavior and society, giving rise to a meta-field known as computational social science.
  • Lying at the intersection of computer science, statistics and the social sciences, the emerging field of computational social science uses large-scale demographic, behavioral and network data to investigate human activity and relationships.

Facebook Ad Preferences

Facebook collects and stores all the information about you; it not only collects what you write or enter in the "about me" information, but also where you click, who you talk to, and more. This data is collected to create a profile that describes and categorizes who "you" are according to Facebook. Part of the reason this data is collected is to create targeted, personal ads.

Activity: What does Facebook tell you about yourself?

Once you have found the categories Facebook has placed you in based on the data it has collected, turn to your partner and discuss any interesting, accurate, or inaccurate categories. Facebook tells you why it places you in some categories, but other information is less transparent. If you don't have a Facebook account, feel free to listen in on another conversation. We will show you our results shortly.

If you have Facebook, feel free to follow these steps to see what Facebook can tell you about yourself.

    Settings > Ads > Your information > Categories


Step 1: Settings

Step 2: Ads

Step 3: Your Ad Profile Information

Step 4: Look at Your Categories


Discussion

After following these steps and reading the results, turn to your partner and briefly discuss (2 minutes):

  • interesting & surprising (or inaccurate) results
  • where these results might have come from
  • a comparison between how you view yourself and how Facebook categorizes you

Cara's Categories According to Facebook

Alexis' Categories According to Facebook

Google Timeline

Just as Facebook keeps track of the clicks you make, Google keeps track of the steps you take (as well as many other points of data). By using Google Timeline, you can trace the routes you took. The example we are using is a trip Alexis took.

So what?

The data that is collected and analyzed creates categories and profiles by tracing patterns in the links we click, the advertisements we click, the information we fill out, the content we post, etc. The ways that our information is tracked, categorized, and represented hold power: our patterns are compared against larger categories that both Facebook and Google algorithms have created. It's crucial that we be critical of how these categories are chosen, what data influences which categories, and more.

Downloading your data

Although most data is not publicly available, there are ways you can collect your own data. You can also scrape data from Twitter using Twitter APIs. If you're interested in downloading your data from Facebook and Google, follow these steps and links:

Facebook: Settings > Your Facebook Information > Download your Information

Google: https://support.google.com/accounts/answer/3024190?hl=en. Here's a shot of what the downloaded data looks like:

Ethical Implications

While this is interesting, we encourage you to also begin to see the potential problems with this. The recent controversy over Facebook selling our data shows that our profiles and information have become currency; it also raises problems about autonomy, anonymity, and privacy. How are we being represented and to whom? In what ways is our information being used? These are concerns you should think about not only in your personal lives, but also in your research; how will you collect data in an ethical and responsible manner?

How can collecting data on us impact us in the future?

Activity tracker like iPhone or Apple Watch now mandatory for John Hancock life insurance

  • Policyholders score premium discounts for hitting exercise targets tracked on wearable devices such as a Fitbit or Apple Watch
  • Raised questions about whether insurers may eventually use data to select the most profitable customers
  • Will policyholders be penalised for walking through a sketchy area, logged by the GPS in their device? What about an activity tracker logging a strenuous hike as a risk factor? Or deciding that someone is cycling or skiing dangerously fast?

China is building a digital dictatorship to exert control over its 1.4 billion citizens. For some, “social credit” will bring privileges — for others, punishment.

In [1]:
from IPython.display import YouTubeVideo

# embed the video in the notebook (watch until 2:25)
YouTubeVideo('eViswN602_k')
Out[1]:

Examples of Data Analytics, or how to use data to inform social science work

Think about the larger ideas or narratives about the state of America that we hear passed around on social media, in news outlets, in conversations at dinner, and more. Part of the goal of using data is to test these narratives and see whether or not the data actually reflects what we believe, and to do so in a way that is ethical, critical, and statistically sound.

#WhyIStayed: Survivors of Domestic Violence Relationships

  • Analyzing a Twitter trend to determine the reasons victims of domestic abuse give for staying in and leaving their abusive relationships.
  • Victims report staying in abusive relationships due to cognitive manipulation, as indicated by a predominance of verbs including manipulate, isolate, convince, think, believe, felt.
  • They report leaving when experiencing or fearing physical violence, via predicates such as kill and kick.
  • They also report staying when in dire financial straits (money), when attempting to keep the nuclear family united (family, marriage), or when experiencing shame about their situation (ashamed, shame).
  • They report leaving when threats are made towards loved ones (son, daughter), when they gain agency (choose, decide), when they realize their situation or self-worth (realize, learn, worth, deserve, finally, better), or when they gain support from friends or family (courage, support, help).

Online Dating Apps:

  • Survey data: a large majority of users of online dating services say they would reject racists as partners
  • 84% say “no” (as opposed to “it depends” or “yes”) when asked
  • Yet users' “attractiveness” ratings typically value white appearance over black.
  • This holds for all ethnic groups save blacks
  • But it’s more pronounced among men than among women

Big data in Healthcare: From Opioid Crisis to Tracking the Flu

  • An app called Flu Near You crowdsources reports of when people feel flu symptoms, letting you track the flu in real time
  • Using years of insurance and pharmacy data, data scientists at Blue Cross Blue Shield have been able to identify 742 risk factors that predict with a high degree of accuracy whether someone is at risk for abusing opioids.
  • Threshold of different factors that can establish someone as "high risk"
  • Ethical implications: privacy, labeling

Using Computational Methods to Examine School Shootings and the Media

Grounded in theory and empirical background, the research examines several research questions:

  • Has the media coverage of school shootings increased over time?
  • Has the media coverage of school shootings increased over time relative to the number of school shootings over time?
  • Has the framing of school shootings changed over time?
    • Specifically, has the mention of mass shootings as a gun control issue versus a mental health issue increased, stayed the same, or decreased over time?

Measures of Interest:

  • Fatal School Shootings: at least 1 individual killed by firearms at a school
  • Multiple-Victim School Shooting: 4 or more victims and at least 2 killed, excluding the assailant, by firearms at a school
  • NYT Articles mentioning "school shooting" per year
  • Proportion (Prop): NYT articles mentioning "school shooting" per year divided by the total number of articles published per year
  • Mental Health: key words "mental health", "illness"
  • Gun Control: key words "gun control", "gun rights"
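
The article-based measures above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the article snippets, the yearly totals, and the helper names `mentions` and `yearly_measures` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical (year, text) records standing in for NYT articles.
articles = [
    (1999, "School shooting renews debate over gun control"),
    (1999, "After school shooting, questions about mental health"),
    (2012, "School shooting prompts calls for gun control and mental health funding"),
]

# Hypothetical total number of articles published in each year.
total_articles = {1999: 1000, 2012: 2000}

# Key words for the two frames, taken from the measures listed above.
KEYWORDS = {
    "mental_health": ["mental health", "illness"],
    "gun_control": ["gun control", "gun rights"],
}

def mentions(text, phrases):
    """Return True if any phrase occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)

def yearly_measures(articles, total_articles):
    """Per year: matching articles, keyword mentions, and share of all articles."""
    counts = defaultdict(lambda: {"articles": 0, "mental_health": 0, "gun_control": 0})
    for year, text in articles:
        if not mentions(text, ["school shooting"]):
            continue  # only articles that mention "school shooting" are counted
        counts[year]["articles"] += 1
        for frame, phrases in KEYWORDS.items():
            if mentions(text, phrases):
                counts[year][frame] += 1
    # Prop = matching articles / all articles published that year
    return {
        year: dict(row, prop=row["articles"] / total_articles[year])
        for year, row in counts.items()
    }

measures = yearly_measures(articles, total_articles)
print(measures[1999])
```

Dividing by the yearly total matters because a newspaper that publishes more articles overall will mention almost any topic more often, even if its relative attention has not changed.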

NOTE: the data about fatal school shootings is not publicly available. Therefore, we show only images of the code, because it cannot be run here.

Summary:

  • Schools are safer now than they were in the 1990s.
    • Four times as many children were killed in schools in the early 1990s as today (Fox & Fridel 2018)
  • Fatal school shootings are negatively correlated with time: they have decreased over the years
  • Media has shown a different trend with an increase in articles covering school shootings over time
    • Spikes for high victim, high profile shootings
  • Mentions of mental health and gun control appear only in coverage of high-profile school shootings: Sandy Hook and Columbine
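
To make the idea of a "negative correlation" over time concrete, here is a minimal sketch using NumPy. Because the real data is not public, the yearly counts below are invented purely for illustration and should not be read as results.

```python
import numpy as np

# Invented yearly counts, for illustration only (not the study's data).
years = np.array([1992, 1997, 2002, 2007, 2012, 2017])
counts = np.array([30, 26, 20, 17, 12, 9])

# Pearson correlation between year and count; a value near -1 means
# the counts fall steadily as the years go by.
r = np.corrcoef(years, counts)[0, 1]

# Slope of a straight-line fit: the average change in the count per year.
# Years are shifted to start at 0 to keep the fit numerically well behaved.
slope, intercept = np.polyfit(years - years.min(), counts, 1)

print(f"correlation: {r:.2f}, change per year: {slope:.2f}")
```

Running the same two calculations on yearly article counts would give the media trend that the summary contrasts with this one.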

Using Network Analysis to Trace Relationships

Network analysis can show relationships among people based on who they know, where they are located, and other data points. Using data on the people who belonged to different political parties during the Patriot Movement, we can better see who is at the "center" of this network.

Example provided by Laura Nelson, Assistant Professor of Sociology at Northeastern University
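
The notebook cells that follow turn a table of group memberships into a person-to-person network. The core step is a matrix product, which is easier to see on a toy example. The three people and two groups here are placeholders, not rows from the dataset.

```python
import numpy as np

# Toy membership matrix: rows are people, columns are groups (1 = member).
people = ["A", "B", "C"]
membership = np.array([
    [1, 1],  # A belongs to both groups
    [1, 0],  # B belongs to the first group only
    [0, 1],  # C belongs to the second group only
])

# Multiplying the matrix by its transpose gives a person-by-person matrix.
# Entry [i, j] counts the groups that person i and person j share.
shared = membership @ membership.T
print(shared)
# [[2 1 1]
#  [1 1 0]
#  [1 0 1]]
```

A and B share one group, so they are connected; B and C share none, so they are not. The diagonal counts each person's own memberships, so a graph built directly from this matrix will also contain a link from each person to themselves.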

In [1]:
import pandas
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords
import string
In [2]:
df = pandas.read_csv("./data/PatriotMovementData.csv", index_col = 0)
df.head(10)
Out[2]:
StAndrewsLodge LoyalNine NorthCaucus LongRoomClub TeaParty BostonCommittee LondonEnemies
Adams.John 0 0 1 1 0 0 0
Adams.Samuel 0 0 1 1 0 1 1
Allen.Dr 0 0 1 0 0 0 0
Appleton.Nathaniel 0 0 1 0 0 1 0
Ash.Gilbert 1 0 0 0 0 0 0
Austin.Benjamin 0 0 0 0 0 0 1
Austin.Samuel 0 0 0 0 0 0 1
Avery.John 0 1 0 0 0 0 1
Baldwin.Cyrus 0 0 0 0 0 0 1
Ballard.John 0 0 1 0 0 0 0
In [3]:
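#Matrix manipulation!

dfT = df.T
# as_matrix() no longer exists in current pandas; to_numpy() returns the same array
df_matrix = df.to_numpy()
dfT_matrix = dfT.to_numpy()
# person-by-person matrix: entry [i, j] counts the groups persons i and j share
names_adj = np.dot(df_matrix, dfT_matrix)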

#dictionary to label the person nodes

names = list(dfT.columns.values) #list of person names

labels_names = {}

for n in range(0,np.shape(names_adj)[0]):
    labels_names[n] = names[n]

#create graph object
G_names = nx.to_networkx_graph(names_adj,create_using=nx.DiGraph())

#add node labels
nx.relabel_nodes(G_names, labels_names,copy=False)

G_names
Out[3]:
<networkx.classes.digraph.DiGraph at 0x1a15951b38>
In [4]:
#Network statistics

betweeness_names = nx.betweenness_centrality(G_names, seed = 123)
betweeness_names
Out[4]:
{'Adams.John': 0.002973404661896023,
 'Adams.Samuel': 0.010304529532216724,
 'Allen.Dr': 0.0,
 'Appleton.Nathaniel': 0.0012845225389882728,
 'Ash.Gilbert': 0.0,
 'Austin.Benjamin': 0.0,
 'Austin.Samuel': 0.0,
 'Avery.John': 0.0032467305369798527,
 'Baldwin.Cyrus': 0.0,
 'Ballard.John': 0.0,
 'Barber.Nathaniel': 0.03263572308701835,
 'Barnard.Samuel': 0.0,
 'Barrett.Samuel': 0.026484900197685703,
 'Bass.Henry': 0.0382628231249812,
 'Bell.William': 0.0,
 'Bewer.James': 0.0,
 'Blake.Increase': 0.0,
 'Boit.John': 0.0,
 'Bolter.Thomas': 0.0,
 'Boyer.Peter': 0.0,
 'Boynton.Richard': 0.0010239405475069559,
 'Brackett.Jos': 0.0,
 'Bradford.John': 0.0010239405475069559,
 'Bradlee.David': 0.0,
 'Bradlee.Josiah': 0.0,
 'Bradlee.Nathaniel': 0.0,
 'Bradlee.Thomas': 0.0,
 'Bray.George': 0.0,
 'Breck.William': 0.0,
 'Brimmer.Herman': 0.0,
 'Brimmer.Martin': 0.0,
 'Broomfield.Henry': 0.0,
 'Brown.Enoch': 0.0,
 'Brown.Hugh': 0.0,
 'Brown.John': 0.0,
 'Bruce.Stephen': 0.0,
 'Burbeck.Edward': 0.0,
 'Burbeck.William': 0.0,
 'Burt.Benjamin': 0.0,
 'Burton.Benjamin': 0.0,
 'Cailleteau.Edward': 0.0,
 'Callendar.Elisha': 0.0,
 'Campbell.Nicholas': 0.0,
 'Cazneau.Capt': 0.0,
 'Chadwell.Mr': 0.0,
 'Champney.Caleb': 0.0,
 'Chase.Thomas': 0.0382628231249812,
 'Cheever.Ezekiel': 0.0035631475169868644,
 'Chipman.Seth': 0.0,
 'Chrysty.Thomas': 0.0,
 'Church.Benjamin': 0.010304529532216724,
 'Clarke.Benjamin': 0.0,
 'Cleverly.Stephen': 0.0,
 'Cochran.John': 0.0,
 'Colesworthy.Gilbert': 0.0,
 'Collier.Gershom': 0.0,
 'Collins.Ezra': 0.0,
 'Collson.Adam': 0.009030196989535035,
 'Condy.JamesFoster': 0.026599004548872127,
 'Cooper.Samuel': 0.012854101098996706,
 'Cooper.William': 0.0,
 'Crafts.Thomas': 0.004095472541971693,
 'Crane.John': 0.0,
 'Davis.Caleb': 0.0010239405475069559,
 'Davis.Edward': 0.0,
 'Davis.Robert': 0.0,
 'Davis.William': 0.0,
 'Dawes.Thomas': 0.0,
 'Dennie.William': 0.0012845225389882728,
 'Deshon.Moses': 0.0,
 'Dexter.Samuel': 0.0,
 'Dolbear.Edward': 0.0,
 'Doyle.Peter': 0.0,
 'Eaton.Joseph': 0.0,
 'Eayres.Joseph': 0.016090719076755012,
 'Eckley.Unknown': 0.0,
 'Edes.Benjamin': 0.00305275166979773,
 'Emmes.Samuel': 0.0,
 'Etheridge.William': 0.0,
 'Fenno.Samuel': 0.0,
 'Ferrell.Ambrose': 0.0,
 'Field.Joseph': 0.0,
 'Flagg.Josiah': 0.0,
 'Fleet.Thomas': 0.0,
 'Foster.Bos': 0.0,
 'Foster.Samuel': 0.0,
 'Frothingham.Nathaniel': 0.0,
 'Gammell.John': 0.0,
 'Gill.Moses': 0.0,
 'Gore.Samuel': 0.0,
 'Gould.William': 0.0,
 'Graham.James': 0.0,
 'Grant.Moses': 0.026599004548872127,
 'Gray.Wait': 0.0,
 'Greene.Nathaniel': 0.0,
 'Greenleaf.Joseph': 0.005138063767243812,
 'Greenleaf.William': 0.0010239405475069559,
 'Greenough.Newn': 0.0,
 'Ham.William': 0.0,
 'Hammond.Samuel': 0.0,
 'Hancock.Eben': 0.0,
 'Hancock.John': 0.0031519349925542657,
 'Hendley.William': 0.0,
 'Hewes.George': 0.0,
 'Hickling.William': 0.0,
 'Hicks.John': 0.0,
 'Hill.Alexander': 0.0,
 'Hitchborn.Nathaniel': 0.0,
 'Hitchborn.Thomas': 0.0,
 'Hobbs.Samuel': 0.0,
 'Hoffins.John': 0.0,
 'Holmes.Nathaniel': 0.0,
 'Hooton.John': 0.0,
 'Hopkins.Caleb': 0.0,
 'Hoskins.William': 0.0,
 'Howard.Samuel': 0.0,
 'Howe.Edward': 0.0,
 'Hunnewell.Jonathan': 0.0,
 'Hunnewell.Richard': 0.0,
 'Hunstable.Thomas': 0.0,
 'Hunt.Abraham': 0.0,
 'Ingersoll.Daniel': 0.0,
 'Inglish.Alexander': 0.0,
 'Isaac.Pierce': 0.0,
 'Ivers.James': 0.0,
 'Jarvis.Charles': 0.0,
 'Jarvis.Edward': 0.0,
 'Jefferds.Unknown': 0.0,
 'Jenkins.John': 0.0,
 'Johnston.Eben': 0.0,
 'Johonnott.Gabriel': 0.0,
 'Kent.Benjamin': 0.0,
 'Kerr.Walter': 0.0,
 'Kimball.Thomas': 0.0,
 'Kinnison.David': 0.0,
 'Lambert.John': 0.0,
 'Lee.Joseph': 0.0,
 'Lewis.Phillip': 0.0,
 'Lincoln.Amos': 0.0,
 'Loring.Matthew': 0.0,
 'Lowell.John': 0.0,
 'MacKintosh.Capt': 0.0,
 'MacNeil.Archibald': 0.0,
 'Machin.Thomas': 0.0,
 'Mackay.William': 0.0,
 'Marett.Phillip': 0.0,
 'Marlton.John': 0.0,
 'Marshall.Thomas': 0.0,
 'Marson.John': 0.0,
 'Mason.Jonathan': 0.0,
 'Matchett.John': 0.0,
 'May.John': 0.0,
 'McAlpine.William': 0.0,
 'Melville.Thomas': 0.0,
 'Merrit.John': 0.0,
 'Milliken.Thomas': 0.0,
 'Molineux.William': 0.01634210599069235,
 'Moody.Samuel': 0.0,
 'Moore.Thomas': 0.0,
 'Morse.Anthony': 0.0,
 'Morton.Perez': 0.0,
 'Mountford.Joseph': 0.0,
 'Newell.Eliphelet': 0.0,
 'Nicholls.Unknown': 0.0,
 'Noyces.Nat': 0.0,
 'Obear.Israel': 0.0,
 'Otis.James': 0.0006832672898402347,
 'Palfrey.William': 0.0,
 'Palmer.Joseph': 0.0,
 'Palms.Richard': 0.0,
 'Parker.Jonathan': 0.0,
 'Parkman.Elias': 0.0035631475169868644,
 'Partridge.Sam': 0.0,
 'Payson.Joseph': 0.0,
 'Pearce.Isaac': 0.0,
 'Pearce.IsaacJun': 0.0,
 'Peck.Samuel': 0.04451080601361194,
 'Peck.Thomas': 0.0,
 'Peters.John': 0.0,
 'Phillips.John': 0.0,
 'Phillips.Samuel': 0.0,
 'Phillips.William': 0.0,
 'Pierce.William': 0.0,
 'Pierpont.Robert': 0.0,
 'Pitts.John': 0.0,
 'Pitts.Lendall': 0.0,
 'Pitts.Samuel': 0.0,
 'Porter.Thomas': 0.0,
 'Potter.Edward': 0.0,
 'Powell.William': 0.0010239405475069559,
 'Prentiss.Henry': 0.0,
 'Prince.Job': 0.0,
 'Prince.John': 0.0,
 'Proctor.Edward': 0.026599004548872127,
 'Pulling.John': 0.0035631475169868644,
 'Pulling.Richard': 0.0,
 'Purkitt.Henry': 0.0,
 'Quincy.Josiah': 0.0006832672898402347,
 'Randall.John': 0.0,
 'Revere.Paul': 0.13713528688037896,
 'Roby.Joseph': 0.0,
 'Roylson.Thomas': 0.0,
 'Ruddock.Abiel': 0.0035631475169868644,
 'Russell.John': 0.0,
 'Russell.William': 0.0,
 'Sessions.Robert': 0.0,
 'Seward.James': 0.0,
 'Sharp.Gibbens': 0.0,
 'Shed.Joseph': 0.0,
 'Sigourney.John': 0.0,
 'Simpson.Benjamin': 0.0,
 'Slater.Peter': 0.0,
 'Sloper.Ambrose': 0.0,
 'Smith.John': 0.0,
 'Spear.Thomas': 0.0,
 'Sprague.Samuel': 0.0,
 'Spurr.John': 0.0,
 'Stanbridge.Henry': 0.0,
 'Starr.James': 0.0,
 'Stearns.Phineas': 0.0,
 'Stevens.Ebenezer': 0.0,
 'Stoddard.Asa': 0.0,
 'Stoddard.Jonathan': 0.0,
 'Story.Elisha': 0.009030196989535035,
 'Swan.James': 0.009030196989535035,
 'Sweetser.John': 0.0,
 'Symmes.Eben': 0.0,
 'Symmes.John': 0.0,
 'Tabor.Philip': 0.0,
 'Tileston.Thomas': 0.0,
 'Trott.George': 0.0,
 'Tyler.Royall': 0.0,
 'Urann.Thomas': 0.0743789222122532,
 'Vernon.Fortesque': 0.0,
 'Waldo.Benjamin': 0.0,
 'Warren.Joseph': 0.07077412915432525,
 'Webb.Joseph': 0.0,
 'Webster.Thomas': 0.0,
 'Welles.Henry': 0.004095472541971693,
 'Wendell.Oliver': 0.0010239405475069559,
 'Wheeler.Josiah': 0.0,
 'White.Samuel': 0.0,
 'Whitten.John': 0.0,
 'Whitwell.Samuel': 0.0,
 'Whitwell.William': 0.0,
 'Williams.Jeremiah': 0.0,
 'Williams.Jonathan': 0.0,
 'Williams.Thomas': 0.0,
 'Willis.Nathaniel': 0.0,
 'Wingfield.William': 0.0,
 'Winslow.John': 0.0,
 'Winthrop.John': 0.0035631475169868644,
 'Wyeth.Joshua': 0.0,
 'Young.Thomas': 0.01634210599069235}
In [5]:
#set figure size using matplotlib
plt.figure(figsize=(10,12))

# position nodes with a force-directed (spring) layout;
# seed fixes the layout so the figure is reproducible
nx.draw(G_names,
    with_labels = True,
    node_color = 'black',
    node_size = 100,
    edge_color = 'grey',
    linewidths = 0,
    width = 0.1,
    font_size = 12,
    pos = nx.spring_layout(G_names, k=0.35, iterations=50, seed=1234),
    )
plt.show()
In [15]:
#sort by value to find the most central person.
sorted(betweeness_names, key=betweeness_names.get, reverse=True)
Out[15]:
['Revere.Paul',
 'Urann.Thomas',
 'Warren.Joseph',
 'Peck.Samuel',
 'Bass.Henry',
 'Chase.Thomas',
 'Barber.Nathaniel',
 'Condy.JamesFoster',
 'Grant.Moses',
 'Proctor.Edward',
 'Barrett.Samuel',
 'Molineux.William',
 'Young.Thomas',
 'Eayres.Joseph',
 'Cooper.Samuel',
 'Adams.Samuel',
 'Church.Benjamin',
 'Collson.Adam',
 'Story.Elisha',
 'Swan.James',
 'Greenleaf.Joseph',
 'Crafts.Thomas',
 'Welles.Henry',
 'Cheever.Ezekiel',
 'Parkman.Elias',
 'Pulling.John',
 'Ruddock.Abiel',
 'Winthrop.John',
 'Avery.John',
 'Hancock.John',
 'Edes.Benjamin',
 'Adams.John',
 'Appleton.Nathaniel',
 'Dennie.William',
 'Boynton.Richard',
 'Bradford.John',
 'Davis.Caleb',
 'Greenleaf.William',
 'Powell.William',
 'Wendell.Oliver',
 'Otis.James',
 'Quincy.Josiah',
 'Allen.Dr',
 'Ash.Gilbert',
 'Austin.Benjamin',
 'Austin.Samuel',
 'Baldwin.Cyrus',
 'Ballard.John',
 'Barnard.Samuel',
 'Bell.William',
 'Blake.Increase',
 'Boit.John',
 'Bolter.Thomas',
 'Boyer.Peter',
 'Brackett.Jos',
 'Bradlee.David',
 'Bradlee.Josiah',
 'Bradlee.Nathaniel',
 'Bradlee.Thomas',
 'Bray.George',
 'Breck.William',
 'Bewer.James',
 'Brimmer.Herman',
 'Brimmer.Martin',
 'Broomfield.Henry',
 'Brown.Hugh',
 'Brown.Enoch',
 'Brown.John',
 'Bruce.Stephen',
 'Burbeck.Edward',
 'Burbeck.William',
 'Burt.Benjamin',
 'Burton.Benjamin',
 'Cailleteau.Edward',
 'Callendar.Elisha',
 'Campbell.Nicholas',
 'Cazneau.Capt',
 'Chadwell.Mr',
 'Champney.Caleb',
 'Chipman.Seth',
 'Chrysty.Thomas',
 'Clarke.Benjamin',
 'Cleverly.Stephen',
 'Cochran.John',
 'Colesworthy.Gilbert',
 'Collier.Gershom',
 'Collins.Ezra',
 'Cooper.William',
 'Crane.John',
 'Davis.Edward',
 'Davis.Robert',
 'Davis.William',
 'Dawes.Thomas',
 'Deshon.Moses',
 'Dexter.Samuel',
 'Dolbear.Edward',
 'Doyle.Peter',
 'Eaton.Joseph',
 'Eckley.Unknown',
 'Emmes.Samuel',
 'Etheridge.William',
 'Fenno.Samuel',
 'Ferrell.Ambrose',
 'Field.Joseph',
 'Flagg.Josiah',
 'Fleet.Thomas',
 'Foster.Bos',
 'Foster.Samuel',
 'Frothingham.Nathaniel',
 'Gammell.John',
 'Gill.Moses',
 'Gore.Samuel',
 'Gould.William',
 'Graham.James',
 'Gray.Wait',
 'Greene.Nathaniel',
 'Greenough.Newn',
 'Ham.William',
 'Hammond.Samuel',
 'Hancock.Eben',
 'Hendley.William',
 'Hewes.George',
 'Hickling.William',
 'Hicks.John',
 'Hill.Alexander',
 'Hitchborn.Nathaniel',
 'Hitchborn.Thomas',
 'Hobbs.Samuel',
 'Hoffins.John',
 'Holmes.Nathaniel',
 'Hooton.John',
 'Hopkins.Caleb',
 'Hoskins.William',
 'Howard.Samuel',
 'Howe.Edward',
 'Hunnewell.Jonathan',
 'Hunnewell.Richard',
 'Hunstable.Thomas',
 'Hunt.Abraham',
 'Ingersoll.Daniel',
 'Inglish.Alexander',
 'Isaac.Pierce',
 'Ivers.James',
 'Jarvis.Edward',
 'Jarvis.Charles',
 'Jefferds.Unknown',
 'Jenkins.John',
 'Johnston.Eben',
 'Johonnott.Gabriel',
 'Kent.Benjamin',
 'Kerr.Walter',
 'Kimball.Thomas',
 'Kinnison.David',
 'Lambert.John',
 'Lee.Joseph',
 'Lewis.Phillip',
 'Lincoln.Amos',
 'Loring.Matthew',
 'Lowell.John',
 'Machin.Thomas',
 'Mackay.William',
 'MacKintosh.Capt',
 'MacNeil.Archibald',
 'Marett.Phillip',
 'Marlton.John',
 'Marshall.Thomas',
 'Marson.John',
 'Mason.Jonathan',
 'Matchett.John',
 'May.John',
 'McAlpine.William',
 'Melville.Thomas',
 'Merrit.John',
 'Milliken.Thomas',
 'Moody.Samuel',
 'Moore.Thomas',
 'Morse.Anthony',
 'Morton.Perez',
 'Mountford.Joseph',
 'Newell.Eliphelet',
 'Nicholls.Unknown',
 'Noyces.Nat',
 'Obear.Israel',
 'Palfrey.William',
 'Palmer.Joseph',
 'Palms.Richard',
 'Parker.Jonathan',
 'Partridge.Sam',
 'Payson.Joseph',
 'Pearce.IsaacJun',
 'Pearce.Isaac',
 'Peck.Thomas',
 'Peters.John',
 'Phillips.John',
 'Phillips.Samuel',
 'Phillips.William',
 'Pierce.William',
 'Pierpont.Robert',
 'Pitts.John',
 'Pitts.Lendall',
 'Pitts.Samuel',
 'Porter.Thomas',
 'Potter.Edward',
 'Prentiss.Henry',
 'Prince.John',
 'Prince.Job',
 'Pulling.Richard',
 'Purkitt.Henry',
 'Randall.John',
 'Roby.Joseph',
 'Roylson.Thomas',
 'Russell.John',
 'Russell.William',
 'Sessions.Robert',
 'Seward.James',
 'Sharp.Gibbens',
 'Shed.Joseph',
 'Sigourney.John',
 'Simpson.Benjamin',
 'Slater.Peter',
 'Sloper.Ambrose',
 'Smith.John',
 'Spear.Thomas',
 'Sprague.Samuel',
 'Spurr.John',
 'Stanbridge.Henry',
 'Starr.James',
 'Stearns.Phineas',
 'Stevens.Ebenezer',
 'Stoddard.Asa',
 'Stoddard.Jonathan',
 'Sweetser.John',
 'Symmes.Eben',
 'Symmes.John',
 'Tabor.Philip',
 'Tileston.Thomas',
 'Trott.George',
 'Tyler.Royall',
 'Vernon.Fortesque',
 'Waldo.Benjamin',
 'Webb.Joseph',
 'Webster.Thomas',
 'Wheeler.Josiah',
 'White.Samuel',
 'Whitten.John',
 'Whitwell.Samuel',
 'Whitwell.William',
 'Williams.Jeremiah',
 'Williams.Jonathan',
 'Williams.Thomas',
 'Willis.Nathaniel',
 'Wingfield.William',
 'Winslow.John',
 'Wyeth.Joshua']
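
Betweenness centrality, the statistic ranked above, measures how often a node sits on the shortest paths between other nodes. A minimal sketch on a toy three-person network shows the idea, and also how to keep each score next to its name when sorting. The names "A", "B", and "C" are placeholders, not people from the dataset.

```python
import networkx as nx

# Toy undirected network: "B" is the only link between "A" and "C".
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C")])

# Normalized betweenness: "B" lies on the only path from "A" to "C".
scores = nx.betweenness_centrality(G)

# Pair each name with its score and sort from most to least central.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(name, round(score, 3))
```

Here "B" scores 1.0 and the other two score 0.0. Python's sort is stable, so people with equal scores keep the order they had in the dictionary, which is why the long run of zero-score names in the ranking carries no information about who is more central.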

Moral and Ethical Implications

Issues of privacy, agency, and ethical data collection: Even if we can collect it, should we collect it? What kind of data should remain private? How can we protect particular vulnerable populations? How can our data collection potentially harm vulnerable populations?

  • For example, researching a particular hashtag that a vulnerable population uses to communicate may make the community that relies on it more visible; the more visible the hashtag becomes, the more likely this population is to be exposed and potentially harassed or doxxed. Even if something is on the internet and is technically "public," some information (such as a specific hashtag) is less accessible for a reason.

Issues of relational data: Our data is wrapped up in each other's data. Even if one person gives permission to use their data, you might be exposing more information about your friends than you realize. For example, there is an algorithm that can identify whether or not someone is gay based on their friends' data. How far does the permission circle go? If I give permission, do my friends need to give permission, their friends, their friends' friends?

Reinforcing harmful biases: While using this data can help us better understand human behavior, we also have to understand that this data is reflective of human biases; if we are not careful when collecting, analyzing, and using this data, the data can wind up reinforcing particular power structures and social biases.

Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.

  • This is the name of the investigative piece written by ProPublica.
  • Scores — known as risk assessments — are increasingly common in courtrooms across the nation.
  • They are used to inform decisions about who can be set free at every stage of the criminal justice system, from assigning bond amounts — as is the case in Fort Lauderdale — to even more fundamental decisions about defendants’ freedom
  • The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.
  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.
  • It is difficult to construct a score that doesn’t include items that can be correlated with race, such as poverty, joblessness and social marginalization. “If those are omitted from your risk assessment, accuracy goes down.”
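
The comparison of "falsely flagged" rates between groups is a comparison of false positive rates. The sketch below shows the calculation with hypothetical counts invented for illustration; the group names, the numbers, and the helper `false_positive_rate` are assumptions, not figures from the ProPublica analysis.

```python
def false_positive_rate(false_positives, true_negatives):
    """Share of people who did NOT reoffend but were still flagged high risk."""
    return false_positives / (false_positives + true_negatives)

# Hypothetical counts for two groups, invented purely for illustration.
# false_positives: flagged high risk, did not reoffend
# true_negatives: rated low risk, did not reoffend
groups = {
    "group_1": {"false_positives": 40, "true_negatives": 60},
    "group_2": {"false_positives": 20, "true_negatives": 80},
}

rates = {name: false_positive_rate(**counts) for name, counts in groups.items()}

# How many times more often group_1 is wrongly flagged than group_2.
ratio = rates["group_1"] / rates["group_2"]
print(rates, ratio)
```

With these made-up counts, group_1 is wrongly flagged at twice the rate of group_2. A score can look accurate overall and still distribute its mistakes this unevenly, which is why error rates should be checked separately for each group.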



Wrap-Up Activity

Turn back to your partner from the beginning of the discussion. Now that you’ve learned some of the uses of computational social science and data analytics, how have your original thoughts changed? What else would you want to know? What questions would you ask if you had access to all of these data?

Want to learn more?

If you are interested in Computational Social Science, data analytics, ethical implications, or any of the topics we covered today, we encourage you to begin looking at potential courses or minors you might pursue!

  • Computational Social Science minor
  • Digital Minor
  • Combined major in Computer Science and CSSH
  • Other courses you might take: DS 2000/DS 2001 (Data Science)

Follow the NULab for workshops, events, potential courses, and more!

Our Contact Information

  • Cara Messina (messina.c@husky.neu.edu), NULab Coordinator. Office hours: 409 Nightingale Hall, Tuesdays 12-1
  • Alexis Yohros (yohros.a@husky.neu.edu), Digital Teaching Integration RA