It is a dark time for the Star Wars fans. Although the franchise has been renewed, the plot holes in the new stories have driven the fans from their fantasy land and pursued them across the galaxy...
But a long time ago in a galaxy far, far away... there were still fans of Star Wars, and this is a notebook about them.
While waiting for Star Wars: The Force Awakens to come out, the team at FiveThirtyEight became interested in answering some questions about Star Wars fans. They surveyed Star Wars fans and received 835 total responses, which you can download from their GitHub repository.
Dictionary:
RespondentID
- An anonymized ID for the respondent (person taking the survey)
Gender
- The respondent's gender
Age
- The respondent's age
Household Income
- The respondent's income
Education
- The respondent's education level
Location (Census Region)
- The respondent's location
Have you seen any of the 6 films in the Star Wars franchise?
- Has a Yes or No response
Do you consider yourself to be a fan of the Star Wars film franchise?
- Has a Yes or No response
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
star_wars = pd.read_csv("star_wars.csv", encoding="ISO-8859-1")
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 rows × 38 columns
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?Âæ', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
first_row = pd.Series(star_wars.iloc[0])
first_row
RespondentID    NaN
Have you seen any of the 6 films in the Star Wars franchise?    Response
Do you consider yourself to be a fan of the Star Wars film franchise?    Response
Which of the following Star Wars films have you seen? Please select all that apply.    Star Wars: Episode I The Phantom Menace
Unnamed: 4    Star Wars: Episode II Attack of the Clones
Unnamed: 5    Star Wars: Episode III Revenge of the Sith
Unnamed: 6    Star Wars: Episode IV A New Hope
Unnamed: 7    Star Wars: Episode V The Empire Strikes Back
Unnamed: 8    Star Wars: Episode VI Return of the Jedi
Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.    Star Wars: Episode I The Phantom Menace
Unnamed: 10    Star Wars: Episode II Attack of the Clones
Unnamed: 11    Star Wars: Episode III Revenge of the Sith
Unnamed: 12    Star Wars: Episode IV A New Hope
Unnamed: 13    Star Wars: Episode V The Empire Strikes Back
Unnamed: 14    Star Wars: Episode VI Return of the Jedi
Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.    Han Solo
Unnamed: 16    Luke Skywalker
Unnamed: 17    Princess Leia Organa
Unnamed: 18    Anakin Skywalker
Unnamed: 19    Obi Wan Kenobi
Unnamed: 20    Emperor Palpatine
Unnamed: 21    Darth Vader
Unnamed: 22    Lando Calrissian
Unnamed: 23    Boba Fett
Unnamed: 24    C-3P0
Unnamed: 25    R2 D2
Unnamed: 26    Jar Jar Binks
Unnamed: 27    Padme Amidala
Unnamed: 28    Yoda
Which character shot first?    Response
Are you familiar with the Expanded Universe?    Response
Do you consider yourself to be a fan of the Expanded Universe?Âæ    Response
Do you consider yourself to be a fan of the Star Trek franchise?    Response
Gender    Response
Age    Response
Household Income    Response
Education    Response
Location (Census Region)    Response
Name: 0, dtype: object
star_wars.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1187 entries, 0 to 1186
Data columns (total 38 columns):
RespondentID    1186 non-null float64
Have you seen any of the 6 films in the Star Wars franchise?    1187 non-null object
Do you consider yourself to be a fan of the Star Wars film franchise?    837 non-null object
Which of the following Star Wars films have you seen? Please select all that apply.    674 non-null object
Unnamed: 4    572 non-null object
Unnamed: 5    551 non-null object
Unnamed: 6    608 non-null object
Unnamed: 7    759 non-null object
Unnamed: 8    739 non-null object
Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.    836 non-null object
Unnamed: 10    837 non-null object
Unnamed: 11    836 non-null object
Unnamed: 12    837 non-null object
Unnamed: 13    837 non-null object
Unnamed: 14    837 non-null object
Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.    830 non-null object
Unnamed: 16    832 non-null object
Unnamed: 17    832 non-null object
Unnamed: 18    824 non-null object
Unnamed: 19    826 non-null object
Unnamed: 20    815 non-null object
Unnamed: 21    827 non-null object
Unnamed: 22    821 non-null object
Unnamed: 23    813 non-null object
Unnamed: 24    828 non-null object
Unnamed: 25    831 non-null object
Unnamed: 26    822 non-null object
Unnamed: 27    815 non-null object
Unnamed: 28    827 non-null object
Which character shot first?    829 non-null object
Are you familiar with the Expanded Universe?    829 non-null object
Do you consider yourself to be a fan of the Expanded Universe?Âæ    214 non-null object
Do you consider yourself to be a fan of the Star Trek franchise?    1069 non-null object
Gender    1047 non-null object
Age    1047 non-null object
Household Income    859 non-null object
Education    1037 non-null object
Location (Census Region)    1044 non-null object
dtypes: float64(1), object(37)
memory usage: 352.5+ KB
star_wars.dropna(subset = ['RespondentID'], inplace = True)
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].value_counts(dropna = False)
Yes    936
No     250
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].value_counts(dropna = False)
Yes    552
NaN    350
No     284
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: int64
# Map 'Yes' & 'No' answers to bools
yes_no_map = {'Yes':True, 'No':False}
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'] = star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].map(yes_no_map)
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] = star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].map(yes_no_map)
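The Yes/No mapping above can be checked on a small made-up Series. This is only a sketch: `answers` is a hypothetical stand-in for one survey column, not the real data. It shows that `Series.map` with a dict leaves missing answers as NaN rather than turning them into False.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a Yes/No survey column (not the real data)
answers = pd.Series(["Yes", "No", np.nan, "Yes"])

# Same mapping as in the notebook
yes_no_map = {"Yes": True, "No": False}
mapped = answers.map(yes_no_map)

# Values without a matching key (here the NaN) stay missing
print(mapped.tolist())
print("missing answers:", mapped.isna().sum())
```

Keeping NaN distinct from False matters for the fan question, where respondents who have not seen any film were never asked and so have no answer.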
# Map movie titles to bools
for i in range(3, 9):
    # first_row holds the film title that marks "seen" in column i
    ansr_map = {first_row.iloc[i]: True,
                np.nan: False}
    star_wars.iloc[:, i] = star_wars.iloc[:, i].map(ansr_map)
# Rename columns
cols = star_wars.columns[3:9]
for i in range(6):
    star_wars.rename(columns = {cols[i]:'seen_{}'.format(i+1)}, inplace = True)
# Convert column values to type float
rank_cols = star_wars.columns[9:15]
# Assign by column label so the columns actually take the float dtype
star_wars[rank_cols] = star_wars[rank_cols].astype(float)
for i in range(6):
    star_wars.rename(columns = {rank_cols[i]:'ranking_{}'.format(i+1)}, inplace = True)
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_1 | seen_2 | seen_3 | seen_4 | seen_5 | seen_6 | ranking_1 | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | True | True | True | True | True | True | True | True | 3.0 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | False | NaN | False | False | False | False | False | False | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | True | False | True | True | True | False | False | False | 1.0 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | True | True | True | True | True | True | True | True | 5.0 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | True | True | True | True | True | True | True | True | 5.0 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 rows × 38 columns
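Each "seen" column in the raw file holds the film's title when the respondent ticked it and NaN otherwise. Assuming that layout, a shorter alternative to the title-to-True mapping is `notna()`. The sketch below uses a made-up two-film frame (`toy` is hypothetical) and is not the approach taken in the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of two checkbox columns (not the real data)
toy = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan,
               "Star Wars: Episode I The Phantom Menace"],
    "seen_2": [np.nan, np.nan,
               "Star Wars: Episode II Attack of the Clones"],
})

# A cell is True exactly when a title is present, False when it is NaN
seen = toy.notna()

# Summing booleans counts the respondents who saw each film
print(seen.sum())
```

One caveat: `notna()` would also mark any stray non-title text as True, whereas the explicit mapping turns unexpected values into NaN, which makes data problems easier to spot.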
ranks = star_wars.iloc[:,9:15]
rank_mean = ranks.mean()
rank_mean
ranking_1    3.732934
ranking_2    4.087321
ranking_3    4.341317
ranking_4    3.272727
ranking_5    2.513158
ranking_6    3.047847
dtype: float64
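Because 1 means favorite and 6 means least favorite, a lower mean is better. As a sketch, the means printed above can be relabeled with film titles and sorted so the order is easier to read. The numbers are copied from the output above; the shortened title labels are taken from the first row of the raw file.

```python
import pandas as pd

# Mean rankings copied from the output above (1 = favorite, 6 = least favorite)
rank_mean = pd.Series({
    "ranking_1": 3.732934,
    "ranking_2": 4.087321,
    "ranking_3": 4.341317,
    "ranking_4": 3.272727,
    "ranking_5": 2.513158,
    "ranking_6": 3.047847,
})

# Shortened film titles, in episode order
titles = {
    "ranking_1": "Episode I The Phantom Menace",
    "ranking_2": "Episode II Attack of the Clones",
    "ranking_3": "Episode III Revenge of the Sith",
    "ranking_4": "Episode IV A New Hope",
    "ranking_5": "Episode V The Empire Strikes Back",
    "ranking_6": "Episode VI Return of the Jedi",
}

# Ascending sort puts the best-ranked film first
ordered = rank_mean.rename(index=titles).sort_values()
print(ordered)
```

Sorted this way, Episode V comes first and Episode III last, with the three original films ahead of the three prequels.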
# Distribution of the rankings given to Episode I (all respondents)
ranks['ranking_1'].hist(color = 'yellowgreen')
# Distribution of the rankings given to Episode VI (all respondents)
ranks['ranking_6'].hist(color = 'coral')
# Number of people that have seen each movie
star_wars.iloc[:,3:9].sum().plot.bar()
male = star_wars[star_wars['Gender'] == 'Male']
female = star_wars[star_wars['Gender'] == 'Female']
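The two filtered frames above can also be produced in one step with `groupby`. The sketch below runs on a made-up frame (`toy` and its values are hypothetical) and is offered as an alternative, not as the method used in this notebook.

```python
import pandas as pd

# Hypothetical miniature of the cleaned survey (not the real data)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", None],
    "seen_1": [True, False, True, True, True],
    "ranking_1": [3.0, None, 1.0, 5.0, 2.0],
})

# One row per gender: how many saw the film, and its mean ranking
by_gender = toy.groupby("Gender").agg(
    seen_1=("seen_1", "sum"),
    ranking_1=("ranking_1", "mean"),
)
print(by_gender)
```

By default `groupby` drops rows whose key is missing, so respondents without a recorded gender are left out, just as they are by the `== 'Male'` and `== 'Female'` filters.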
# Most viewed movie by men & women
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(1,1,1) # Create matplotlib axes
ax2 = ax.twinx() # Create another axes that shares the same x-axis as ax.
male.iloc[:,3:9].sum().plot(kind='bar', color= 'coral', ax=ax, width=0.2, position=0, label = 'male', rot = 0)
female.iloc[:,3:9].sum().plot(kind='bar', color= 'yellowgreen', ax=ax2, width=0.2, position=1, label = 'female')
ax.legend(bbox_to_anchor=(1.4, 0.6))
ax2.legend(bbox_to_anchor=(1.4, 0.4))
# Rankings among men & women
male.iloc[:,9:15].mean().plot.bar(title = 'Star Wars male ranking', color = 'coral')
female.iloc[:,9:15].mean().plot.bar(title = 'Star Wars female ranking', color = 'yellowgreen')
# Distribution of the rankings men gave to Episodes I, II, III and VI
male[['ranking_1', 'ranking_2','ranking_3','ranking_6']].hist(figsize = (15,10), color = 'coral')
# Distribution of the rankings women gave to Episodes I, II, III and VI
female[['ranking_1', 'ranking_2','ranking_3','ranking_6']].hist(figsize = (15,10), color = 'yellowgreen')