While waiting for Star Wars: The Force Awakens, the team at FiveThirtyEight became interested in answering some questions about Star Wars fans, e.g. does the rest of America realize that "The Empire Strikes Back" is clearly the best of the bunch? The team collected 835 responses using SurveyMonkey; the data can be downloaded from https://github.com/fivethirtyeight/data/tree/master/star-wars-survey.
This project aims to find out which Star Wars film the respondents like best. Along the way, we clean and explore the data set so that the conclusions about the fans' favourite film rest on reliable data.
# Import the libraries needed for data handling and plotting
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# Read the data set
star_wars = pd.read_csv("star_wars.csv", encoding="ISO-8859-1")
# Explore data set
star_wars.head(10)
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
6 | 3.292719e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 1 | ... | Very favorably | Han | Yes | No | Yes | Male | 18-29 | $25,000 - $49,999 | Bachelor degree | Middle Atlantic |
7 | 3.292685e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 6 | ... | Very favorably | Han | Yes | No | No | Male | 18-29 | NaN | High school degree | East North Central |
8 | 3.292664e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | Han | No | NaN | Yes | Male | 18-29 | NaN | High school degree | South Atlantic |
9 | 3.292654e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Han | No | NaN | No | Male | 18-29 | $0 - $24,999 | Some college or Associate degree | South Atlantic |
10 rows × 38 columns
# review column names
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?Âæ', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
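Preliminary observation: one column name ('Do you consider yourself to be a fan of the Expanded Universe?Âæ') ends with stray, mis-decoded characters, so we rename it.
# Rename the column whose name ends with mis-decoded characters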
star_wars = star_wars.rename(columns = {'Do you consider yourself to be a fan of the Expanded Universe?Âæ': 'Do you consider yourself to be a fan of the Expanded Universe?'})
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
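The rename above targets one known column. As a minimal sketch of a more general clean-up, the snippet below strips every non-ASCII character from all column names of a toy frame. The frame, its column names, and the helper `strip_non_ascii` are hypothetical and are not part of the survey data; the approach would also remove legitimate non-ASCII characters, so it only suits headers that are expected to be plain ASCII.

```python
import pandas as pd

# Toy frame with hypothetical column names; the second one carries the same
# kind of mis-decoded trailing characters seen in the survey header.
toy = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "Fan of the Expanded Universe?\u00c2\u00e6": ["Yes", "No"],
})


def strip_non_ascii(name: str) -> str:
    """Drop every non-ASCII character and any surrounding whitespace."""
    return name.encode("ascii", errors="ignore").decode("ascii").strip()


# rename() accepts a function and applies it to every column name
toy = toy.rename(columns=strip_non_ascii)
print(list(toy.columns))
```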
# Check the number of rows and columns to get a clear picture of the data set
star_wars.shape
(1187, 38)
# Let's display the whole data set to check whether RespondentID is missing only in the first row
star_wars.iloc[0:,]
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1182 | 3.288389e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | Han | No | NaN | Yes | Female | 18-29 | $0 - $24,999 | Some college or Associate degree | East North Central |
1183 | 3.288379e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Female | 30-44 | $50,000 - $99,999 | Bachelor degree | Mountain |
1184 | 3.288375e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | No | Female | 30-44 | $50,000 - $99,999 | Bachelor degree | Middle Atlantic |
1185 | 3.288373e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | Han | No | NaN | Yes | Female | 45-60 | $100,000 - $149,999 | Some college or Associate degree | East North Central |
1186 | 3.288373e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | NaN | NaN | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 6 | ... | Very unfavorably | I don't understand this question | No | NaN | No | Female | > 60 | $50,000 - $99,999 | Graduate degree | Pacific |
1187 rows × 38 columns
# RespondentID is missing only in the first row, which holds sub-headers (e.g. film titles) rather than a response.
# Save that row as a DataFrame for later reference before dropping it.
column_name_new = star_wars.iloc[0, :].to_frame()
# Removing invalid rows with a missing 'RespondentID'
star_wars = star_wars[star_wars['RespondentID'].notnull()].reset_index(drop=True)
star_wars.shape
(1186, 38)
It is now clear that only one row was removed from the data set: it went from 1187 rows to 1186 rows.
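The same clean-up step can be sketched on a toy frame. The column names and values below are made up for illustration, and `dropna(subset=...)` is used here as an equivalent alternative to the `notnull()` filter above, not as the method this notebook uses.

```python
import numpy as np
import pandas as pd

# Toy frame: the first row is a sub-header row without a RespondentID
toy = pd.DataFrame({
    "RespondentID": [np.nan, 101.0, 102.0],
    "Seen any film?": ["Response", "Yes", "No"],
})

# Keep the sub-header row for reference before removing it
sub_headers = toy.iloc[0]

# Drop every row whose RespondentID is missing and renumber the index
cleaned = toy.dropna(subset=["RespondentID"]).reset_index(drop=True)

print(len(toy), "->", len(cleaned))
```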
# Create a dictionary for the Yes/No columns
yes_no = {'Yes': True,
          'No': False
          }
# Convert the 'Have you seen any of the 6 films in the Star Wars franchise?' column to Boolean type
star_wars.iloc[:,1] = star_wars.iloc[:,1].map(yes_no)
# Convert the 'Do you consider yourself to be a fan of the Star Wars film franchise?' column to Boolean type
star_wars.iloc[:,2] = star_wars.iloc[:,2].map(yes_no)
# Check that all values in 'Have you seen any of the 6 films in the Star Wars franchise?' are now Boolean
print(star_wars.iloc[:,1].value_counts(dropna=False))
True     936
False    250
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64
# Check that all values in 'Do you consider yourself to be a fan of the Star Wars film franchise?' are now Boolean (NaN marks unanswered rows)
print(star_wars.iloc[:,2].value_counts(dropna=False))
True     552
NaN      350
False    284
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: int64
# Re-check that the Boolean values are displayed properly in the two columns above
star_wars.head(3)
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3.292880e+09 | True | True | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
1 | 3.292880e+09 | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
2 | 3.292765e+09 | True | False | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
3 rows × 38 columns
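The NaN count in the second column follows from how `Series.map()` treats values that are not keys of the dictionary: they become NaN rather than False. A minimal sketch on a made-up series (the name `answers` and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical answers, including one respondent who skipped the question
answers = pd.Series(["Yes", "No", np.nan, "Yes"], name="fan")

yes_no = {"Yes": True, "No": False}

# Values that are not keys of the dictionary (here NaN) stay missing
as_bool = answers.map(yes_no)

print(as_bool.value_counts(dropna=False))
```

Keeping the missing answers as NaN, instead of forcing them to False, preserves the difference between "not a fan" and "did not answer".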
Next, we convert the six "seen" columns (columns 3 to 8) to Boolean type. To do this, we first collect the six film titles in a list, then build a dictionary that maps each title to True and NaN to False, and finally apply the map() method to each column.
# Collect the six film titles from the data itself to avoid typing the long names by hand
episodes = []
for col in star_wars.columns[3:9]:
    # Each column holds either the film title or NaN, so take the first non-missing value
    episodes.append(star_wars[col].dropna().unique()[0])
print(episodes)
['Star Wars: Episode I The Phantom Menace', 'Star Wars: Episode II Attack of the Clones', 'Star Wars: Episode III Revenge of the Sith', 'Star Wars: Episode IV A New Hope', 'Star Wars: Episode V The Empire Strikes Back', 'Star Wars: Episode VI Return of the Jedi']
# Create an empty dictionary and loop over the film titles
episode_yes_no = {}
for episode in episodes:
    # A film title in a cell means the respondent has seen that film
    episode_yes_no[episode] = True
# A missing value means the respondent has not seen the film
episode_yes_no[np.nan] = False
episode_yes_no
{'Star Wars: Episode I The Phantom Menace': True, 'Star Wars: Episode II Attack of the Clones': True, 'Star Wars: Episode III Revenge of the Sith': True, 'Star Wars: Episode IV A New Hope': True, 'Star Wars: Episode V The Empire Strikes Back': True, 'Star Wars: Episode VI Return of the Jedi': True, nan: False}
# Convert all six columns to Boolean type using the map() method
for col in star_wars.columns[3:9]:
    star_wars[col] = star_wars[col].map(episode_yes_no)
for col in star_wars.columns[3:9]:
    print(star_wars[col].value_counts(dropna=False))
True     673
False    513
Name: Which of the following Star Wars films have you seen? Please select all that apply., dtype: int64
False    615
True     571
Name: Unnamed: 4, dtype: int64
False    636
True     550
Name: Unnamed: 5, dtype: int64
True     607
False    579
Name: Unnamed: 6, dtype: int64
True     758
False    428
Name: Unnamed: 7, dtype: int64
True     738
False    448
Name: Unnamed: 8, dtype: int64
star_wars.head(5)  # Re-confirm that the columns are now Boolean
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3.292880e+09 | True | True | True | True | True | True | True | True | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
1 | 3.292880e+09 | False | NaN | False | False | False | False | False | False | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
2 | 3.292765e+09 | True | False | True | True | True | False | False | False | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
3 | 3.292763e+09 | True | True | True | True | True | True | True | True | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
4 | 3.292731e+09 | True | True | True | True | True | True | True | True | 5 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 rows × 38 columns
All values found in the episodes list are now set to True, and NaN values are set to False.
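Because each of these columns holds either a film title or NaN, the same True/False result can also be obtained without a dictionary. The sketch below shows this alternative on a toy frame; the frame is hypothetical, and `notna()` is a swapped-in technique, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: a cell holds the film title if the film was seen, NaN otherwise
toy = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan,
               "Star Wars: Episode I The Phantom Menace"],
    "seen_2": [np.nan, np.nan, "Star Wars: Episode II Attack of the Clones"],
})

# notna() is True wherever a value is present, i.e. wherever the film was seen
seen = toy.notna()

# Summing Boolean columns counts the True values
print(seen.sum())
```

This shortcut assumes that any non-missing value means "seen", so the dictionary approach remains the safer choice if a column might contain unexpected text.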
# Rename the columns using a dictionary and the DataFrame.rename() method
star_wars = star_wars.rename(columns={
    'Which of the following Star Wars films have you seen? Please select all that apply.': 'seen_1',
    'Unnamed: 4': 'seen_2',
    'Unnamed: 5': 'seen_3',
    'Unnamed: 6': 'seen_4',
    'Unnamed: 7': 'seen_5',
    'Unnamed: 8': 'seen_6',
})
star_wars.columns  # check that the column names were changed properly
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'seen_1', 'seen_2', 'seen_3', 'seen_4', 'seen_5', 'seen_6', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
column_name_new[9:15]
0 | |
---|---|
Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | Star Wars: Episode I The Phantom Menace |
Unnamed: 10 | Star Wars: Episode II Attack of the Clones |
Unnamed: 11 | Star Wars: Episode III Revenge of the Sith |
Unnamed: 12 | Star Wars: Episode IV A New Hope |
Unnamed: 13 | Star Wars: Episode V The Empire Strikes Back |
Unnamed: 14 | Star Wars: Episode VI Return of the Jedi |
# Convert each ranking column to a numeric type (float)
star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)
# The unnamed columns look clean, so we rename them with more descriptive names such as ranking_1
star_wars = star_wars.rename(columns={
'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.' : 'ranking_1',
'Unnamed: 10' : 'ranking_2',
'Unnamed: 11' : 'ranking_3',
'Unnamed: 12' : 'ranking_4',
'Unnamed: 13' : 'ranking_5',
'Unnamed: 14' : 'ranking_6'
})
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_1 | seen_2 | seen_3 | seen_4 | seen_5 | seen_6 | ranking_1 | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3.292880e+09 | True | True | True | True | True | True | True | True | 3.0 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
1 | 3.292880e+09 | False | NaN | False | False | False | False | False | False | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
2 | 3.292765e+09 | True | False | True | True | True | False | False | False | 1.0 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
3 | 3.292763e+09 | True | True | True | True | True | True | True | True | 5.0 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
4 | 3.292731e+09 | True | True | True | True | True | True | True | True | 5.0 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 rows × 38 columns
# Compute the average rank of each film; since 1 = favourite, the lowest mean marks the best-ranked film
episode_average_ranking = star_wars[star_wars.columns[9:15]].mean()
%matplotlib inline
ax = episode_average_ranking.plot(kind='bar', rot=90)
ax.set_ylim(0, 5)
ax.set_title("Star Wars films in order of preference", fontsize=12)
ax.set_ylabel("Average Viewer Preference", fontsize=10)
ax.tick_params(bottom=False, left=False, labelleft=False)
sns.despine(bottom=True)
# Offsets that place each label just above its bar
x_offset = -0.18
y_offset = 0.06
for p in ax.patches:
    b = p.get_bbox()
    val = "{:.2f}".format(b.y1 + b.y0)  # bar height, since each bar starts at 0
    ax.annotate(val, ((b.x0 + b.x1)/2 + x_offset, b.y1 + y_offset))
sns.set_style("darkgrid")  # applies to the plots created after this call
plt.show()
Because 1 means "favourite", a lower average means a better ranking. Based on the bar plot above, the best average rank, about 2.51, belongs to ranking_5, that is, "Star Wars: Episode V The Empire Strikes Back". "Star Wars: Episode IV A New Hope" and "Star Wars: Episode VI Return of the Jedi" are also ranked well (both around 3.0). "Episode I The Phantom Menace" and "Episode II Attack of the Clones" are ranked poorly, and "Episode III Revenge of the Sith" is ranked worst of all. Why did respondents rank the films this way? Episodes IV, V and VI were released earlier than the other three episodes, while Episode III, released in 2005, got the worst ranking. In conclusion, respondents tend to be fonder of the older episodes than of the newer releases.
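To make the direction of the scale explicit, the sketch below computes mean ranks on a tiny made-up table and picks the film with the lowest mean. The numbers are illustrative only and are not taken from the survey.

```python
import pandas as pd

# Made-up rankings from three respondents (1 = favourite)
rankings = pd.DataFrame({
    "ranking_4": [2.0, 1.0, 3.0],
    "ranking_5": [1.0, 2.0, 1.0],
    "ranking_6": [3.0, 3.0, 2.0],
})

# Sort ascending so that the best-ranked film comes first
mean_rank = rankings.mean().sort_values()

# The lowest mean rank identifies the favourite film
best = mean_rank.idxmin()

print(mean_rank)
print("Best ranked:", best)
```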
# Calculating how many people have watched each movie using df.sum()method
sum_seen = star_wars[star_wars.columns[3:9]].sum().reset_index()
sum_seen
index | 0 | |
---|---|---|
0 | seen_1 | 673 |
1 | seen_2 | 571 |
2 | seen_3 | 550 |
3 | seen_4 | 607 |
4 | seen_5 | 758 |
5 | seen_6 | 738 |
#sum all rows in seen columns
movie_views = star_wars[star_wars.columns[3:9]].sum()
#create bar plot
movie_views.plot.bar(rot=45)
The most viewed episode is "Star Wars: Episode V The Empire Strikes Back" (758 viewers, seen_5), followed by "Star Wars: Episode VI Return of the Jedi" (738 viewers, seen_6). The number of viewers appears to go hand in hand with the ranking: the better-ranked films are also the most watched. There is one break in the trend, however: Episode I (673 viewers, seen_1) is also popular even though it is one of the newer releases. In my opinion, this suggests that viewers enjoyed the early episodes but did not keep up with every later one, perhaps because the new episodes took a long time to be released, or because they were satisfied with the older episodes and did not find the newer ones worth following. Their interest nevertheless picked up again for Episode I (seen_1).
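Raw counts can be easier to compare as shares. The sketch below reuses the viewer counts printed above and divides them by the 936 respondents who answered that they have seen at least one film. The variable names are assumptions made for this example.

```python
import pandas as pd

# Viewer counts per film, copied from the output above
views = pd.Series({
    "seen_1": 673, "seen_2": 571, "seen_3": 550,
    "seen_4": 607, "seen_5": 758, "seen_6": 738,
})

# Respondents who have seen at least one film (from the earlier value_counts output)
seen_any = 936

# Share of those respondents who have seen each film, in percent
share = (views / seen_any * 100).round(1)

print(share.sort_values(ascending=False))
```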
# Select the data with Male only
males_gender = star_wars[star_wars["Gender"] == "Male"]
# get the avg ranking for male
episode_average_ranking_male =males_gender[males_gender.columns[9:15]].mean()
# count of viewers by males
movie_views_male = males_gender[males_gender.columns[3:9]].sum()
print(movie_views_male)
seen_1    361
seen_2    323
seen_3    317
seen_4    342
seen_5    392
seen_6    387
dtype: int64
# Select the data with female only
females_gender = star_wars[star_wars["Gender"] == "Female"]
# get the avg ranking for female
episode_average_ranking_female = females_gender[females_gender.columns[9:15]].mean()
# counts of viewers by females
movie_view_female = females_gender[females_gender.columns[3:9]].sum()
print(movie_view_female)
seen_1    298
seen_2    237
seen_3    222
seen_4    255
seen_5    353
seen_6    338
dtype: int64
# Combine the average rankings for males, females and all respondents
males_vs_females_ranking = (pd.concat([episode_average_ranking_male, episode_average_ranking_female, episode_average_ranking], axis=1)
                            .rename({0: 'Males_Avg', 1: 'Females_Avg', 2: 'Total'}, axis=1))  # rename the default column names
# Create the bar plot
males_vs_females_ranking.plot.bar(rot=45)
plt.title('Males_Avg vs Females_Avg with average Total ranking score')
plt.show()
Based on the bar plot above, there is no notable difference between males and females; in both groups, Episodes IV, V and VI are ranked better than the others.
# combine male_gender and female_gender views
males_vs_females_views = (pd.concat([movie_views_male,movie_view_female,movie_views],axis=1).
rename({0: 'Male_Views', 1: 'Female_Views', 2: 'Total'}, axis = 1))
# creating bar plot
males_vs_females_views.plot.bar(rot=45)
plt.title('Male_Views vs Female_Views')
plt.show()
In the bar plot of Male_Views vs Female_Views, male viewers outnumber female viewers for every Star Wars episode. The highest numbers of views are for Episodes V and VI, followed by Episode I (361 male and 298 female viewers) and Episode IV.
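The male and female subsets above are built with two separate filters. As a hedged alternative, a single `groupby` can produce both the counts and the within-group shares in one step. The toy frame below is hypothetical and only illustrates the call pattern.

```python
import pandas as pd

# Toy data: one row per respondent; the last respondent gave no gender
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", None],
    "seen_1": [True, True, False, True, True],
    "seen_5": [True, False, True, True, False],
})
seen_cols = ["seen_1", "seen_5"]

# Number of viewers per gender (rows with a missing gender are left out by default)
views_by_gender = toy.groupby("Gender")[seen_cols].sum()

# Share of each gender that has seen the film; the mean of Booleans is a proportion
share_by_gender = toy.groupby("Gender")[seen_cols].mean()

print(views_by_gender)
print(share_by_gender)
```

Comparing shares rather than counts avoids mistaking a larger group for a keener one.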
# Segment the data based on the Education column
# First check the value counts in the Education column using df['column'].value_counts()
star_wars['Education'].value_counts(dropna = False)
Some college or Associate degree    328
Bachelor degree                     321
Graduate degree                     275
NaN                                 150
High school degree                  105
Less than high school degree          7
Name: Education, dtype: int64
#create dictionary for NaN values and replacing it with unknown
nan = {np.nan: 'Unknown'}
star_wars['Education'] = star_wars['Education'].replace(nan)
print(star_wars['Education'].value_counts(dropna = False))
Some college or Associate degree    328
Bachelor degree                     321
Graduate degree                     275
Unknown                             150
High school degree                  105
Less than high school degree          7
Name: Education, dtype: int64
# Plot the number of respondents who have seen any film, per education level
education = pd.pivot_table(star_wars, index = 'Education',
                           values = 'Have you seen any of the 6 films in the Star Wars franchise?',
                           aggfunc = np.sum  # aggregate by summing the True values
                           )
education.plot.barh(legend = False)
plt.title('Episodes of Star Wars watched per education level')
plt.show()
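Based on the education-level bar plot above, most respondents who have watched Star Wars films are well educated: those with some college or an Associate degree, a Bachelor degree or a Graduate degree account for far more viewers than those with less than a high school degree. Note that these are raw counts, so they partly reflect how many respondents fall into each group; only 7 respondents reported less than a high school degree.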
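One way to separate group size from viewing behaviour is to report the number of viewers, the number of respondents and the share side by side. The sketch below does this on made-up data; the column name `seen_any` and all values are hypothetical.

```python
import pandas as pd

# Made-up respondents: a large group and a small group
toy = pd.DataFrame({
    "Education": ["Graduate degree"] * 4 + ["High school degree"] * 2,
    "seen_any": [True, True, True, False, True, True],
})

# Named aggregation: viewers (sum of True), group size, and share of viewers
summary = toy.groupby("Education")["seen_any"].agg(
    viewers="sum", respondents="count", share="mean"
)

print(summary)
```

In this toy example the larger group has more viewers but a smaller share of viewers, which is exactly the kind of effect a plot of raw sums cannot show.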
# Segment the data based on the Household Income column
# First check the value counts in the Household Income column using df['column'].value_counts()
star_wars['Household Income'].value_counts(dropna = False)
NaN                    328
$50,000 - $99,999      298
$25,000 - $49,999      186
$100,000 - $149,999    141
$0 - $24,999           138
$150,000+               95
Name: Household Income, dtype: int64
# Segment the data by household income
# Plot the number of respondents who have seen any film, per income bracket
income_per_house = pd.pivot_table(star_wars, index = ['Household Income'],
                                  values = ['Have you seen any of the 6 films in the Star Wars franchise?'],
                                  aggfunc = np.sum  # aggregate by summing the True values
                                  )
income_per_house.plot.barh(legend = False)
plt.title('Episodes of Star Wars watched per income level')
plt.show()
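The plot shows fewer viewers in the highest income bracket: households earning $150,000+ account for fewer viewers than moderate earners. These are raw counts, however, and only 95 respondents fall into the $150,000+ bracket, so the plot by itself does not show that high earners are less likely to watch the films.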
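The income brackets are strings, so they sort alphabetically ("$0", "$100,000", "$150,000+", "$25,000", "$50,000") rather than by amount. A minimal sketch of putting them in their natural order with `reindex`, using the respondent counts printed above; the list `income_order` is written by hand for this example.

```python
import pandas as pd

# Respondent counts per bracket, copied from the value_counts output above
counts = pd.Series({
    "$50,000 - $99,999": 298,
    "$25,000 - $49,999": 186,
    "$100,000 - $149,999": 141,
    "$0 - $24,999": 138,
    "$150,000+": 95,
})

# Hand-written order from the lowest to the highest bracket
income_order = [
    "$0 - $24,999",
    "$25,000 - $49,999",
    "$50,000 - $99,999",
    "$100,000 - $149,999",
    "$150,000+",
]

# reindex() rearranges the rows to follow the given label order
ordered = counts.reindex(income_order)

print(ordered)
```

The same `reindex(income_order)` call could be applied to a pivot table indexed by income before plotting, so that the bars appear from the lowest to the highest bracket.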
# First check the value counts in the location column using df['column'].value_counts()
star_wars['Location (Census Region)'].value_counts(dropna = False)
East North Central    181
Pacific               175
South Atlantic        170
NaN                   143
Middle Atlantic       122
West South Central    110
West North Central     93
Mountain               79
New England            75
East South Central     38
Name: Location (Census Region), dtype: int64
#create dictionary for NaN values and replacing it with unknown
nan = {np.nan: 'Unknown'}
star_wars['Location (Census Region)'] = star_wars['Location (Census Region)'].replace(nan)
print(star_wars['Location (Census Region)'].value_counts(dropna = False))
East North Central 181
Pacific 175
South Atlantic 170
Unknown 143
Middle Atlantic 122
West South Central 110
West North Central 93
Mountain 79
New England 75
East South Central 38
Name: Location (Census Region), dtype: int64
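As an alternative to the dictionary-based `replace` above, `fillna` expresses the same intent more directly. This is a short sketch on a made-up series (`region` is illustrative, not the survey column):

```python
import numpy as np
import pandas as pd

# Hypothetical region values with two missing entries.
region = pd.Series(["Pacific", np.nan, "Mountain", np.nan])

# fillna replaces only the missing values and leaves the rest untouched.
filled = region.fillna("Unknown")
print(filled.value_counts())
```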
# Generate a plot per census region
location = pd.pivot_table(star_wars, index = 'Location (Census Region)',
values = 'Have you seen any of the 6 films in the Star Wars franchise?',
aggfunc = np.sum # aggregate based on sum
)
location.plot.barh(legend = False)
plt.title('Episode of Star Wars watched per USA Region')
plt.show()
In the USA, the South Atlantic, Pacific and East North Central regions account for the most respondents who have seen a Star Wars film. These are also the three regions with the most respondents overall (170, 175 and 181).
# First check value counts in the age column using df['column'].value_counts()
star_wars['Age'].value_counts(dropna = False)
45-60 291
> 60 269
30-44 268
18-29 218
NaN 140
Name: Age, dtype: int64
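# group the data by age and take the mean of the numeric columns
# (numeric_only=True is needed on pandas 2.x, where non-numeric columns are no longer dropped silently)
age_group = star_wars.groupby(['Age']).mean(numeric_only=True)
age_group = age_group[age_group.columns[8:15]]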
# generate a bar plot
age_group.plot.bar(rot = 45).legend(loc='center left',bbox_to_anchor=(1.0, 0.5)) #set legend outside the plot
plt.title('Episode of Star Wars Ranked per Age Group')
plt.show()
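The plot suggests that respondents aged 45-60 rank the Star Wars movies more favorably than younger respondents do. One possible explanation is that the franchise started decades ago, so older respondents may be more attached to it. Note that on this survey's scale a rank of 1 is the favorite film, so a lower mean rank indicates a stronger preference.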
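Because a rank of 1 marks the favorite film, the preferred episode in each age group is the one with the lowest mean rank. Below is a minimal sketch of that lookup on made-up data; the column names `ranking_1` and `ranking_5` and all values are hypothetical stand-ins for the survey's ranking columns.

```python
import pandas as pd

# Hypothetical rankings for two films across two age groups.
toy = pd.DataFrame({
    "Age": ["18-29", "18-29", "45-60", "45-60"],
    "ranking_1": [5, 6, 4, 5],
    "ranking_5": [1, 2, 2, 1],
})

# Mean rank per age group; lower means more preferred.
mean_rank = toy.groupby("Age")[["ranking_1", "ranking_5"]].mean()

# idxmin(axis=1) returns, for each age group, the column with the lowest mean rank.
favorite = mean_rank.idxmin(axis=1)
print(mean_rank)
print(favorite)
```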
# select the age column and the seen_* columns
age_group_views = star_wars[['Age', 'seen_1', 'seen_2', 'seen_3', 'seen_4', 'seen_5', 'seen_6']]
# group by age and sum the views
age_group_views = age_group_views.groupby(['Age']).sum()
# create bar plot
age_group_views.plot.bar(rot=45).legend(loc='center left',bbox_to_anchor=(1.0, 0.5))
plt.title('Episode of Star Wars Viewed per Age Group')
plt.show()
As with the rankings, the majority of respondents who have seen the Star Wars movies are in the 45-60 and 30-44 age groups. The plot also shows that Episode V has been watched more than the other episodes.
# First display column[15:29] to show all the characters within selected columns
star_wars[star_wars.columns[15:29]]
Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her. | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 | Unnamed: 20 | Unnamed: 21 | Unnamed: 22 | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 | Unnamed: 26 | Unnamed: 27 | Unnamed: 28 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) |
3 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably |
4 | Very favorably | Somewhat favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very unfavorably | Somewhat favorably | Neither favorably nor unfavorably (neutral) | Very favorably | Somewhat favorably | Somewhat favorably | Very unfavorably | Somewhat favorably | Somewhat favorably |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1181 | Very favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Very favorably | Very favorably | Somewhat favorably | Somewhat favorably | Very favorably |
1182 | Very favorably | Somewhat favorably | Very favorably | Somewhat unfavorably | Very favorably | Neither favorably nor unfavorably (neutral) | Very unfavorably | Somewhat favorably | Unfamiliar (N/A) | Somewhat favorably | Very favorably | Somewhat unfavorably | Somewhat unfavorably | Very favorably |
1183 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1184 | Very favorably | Neither favorably nor unfavorably (neutral) | Very favorably | Very favorably | Very favorably | Neither favorably nor unfavorably (neutral) | Very favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat favorably | Very favorably | Somewhat favorably | Very favorably |
1185 | Very favorably | Very favorably | Very favorably | Very unfavorably | Very favorably | Very unfavorably | Very favorably | Very unfavorably | Unfamiliar (N/A) | Somewhat favorably | Somewhat favorably | Very unfavorably | Neither favorably nor unfavorably (neutral) | Very unfavorably |
1186 rows × 14 columns
# select the character columns (positions 15 to 28) and summarize each one
col_character = list(range(15, 29))
for char in col_character:
    print(star_wars.columns[char])
    print(star_wars[star_wars.columns[char]].value_counts(dropna=False))
Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.
Very favorably 610
NaN 357
Somewhat favorably 151
Neither favorably nor unfavorably (neutral) 44
Unfamiliar (N/A) 15
Somewhat unfavorably 8
Very unfavorably 1
Name: Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her., dtype: int64
Unnamed: 16
Very favorably 552
NaN 355
Somewhat favorably 219
Neither favorably nor unfavorably (neutral) 38
Somewhat unfavorably 13
Unfamiliar (N/A) 6
Very unfavorably 3
Name: Unnamed: 16, dtype: int64
Unnamed: 17
Very favorably 547
NaN 355
Somewhat favorably 210
Neither favorably nor unfavorably (neutral) 48
Somewhat unfavorably 12
Unfamiliar (N/A) 8
Very unfavorably 6
Name: Unnamed: 17, dtype: int64
Unnamed: 18
NaN 363
Somewhat favorably 269
Very favorably 245
Neither favorably nor unfavorably (neutral) 135
Somewhat unfavorably 83
Unfamiliar (N/A) 52
Very unfavorably 39
Name: Unnamed: 18, dtype: int64
Unnamed: 19
Very favorably 591
NaN 361
Somewhat favorably 159
Neither favorably nor unfavorably (neutral) 43
Unfamiliar (N/A) 17
Somewhat unfavorably 8
Very unfavorably 7
Name: Unnamed: 19, dtype: int64
Unnamed: 20
NaN 372
Neither favorably nor unfavorably (neutral) 213
Unfamiliar (N/A) 156
Somewhat favorably 143
Very unfavorably 124
Very favorably 110
Somewhat unfavorably 68
Name: Unnamed: 20, dtype: int64
Unnamed: 21
NaN 360
Very favorably 310
Somewhat favorably 171
Very unfavorably 149
Somewhat unfavorably 102
Neither favorably nor unfavorably (neutral) 84
Unfamiliar (N/A) 10
Name: Unnamed: 21, dtype: int64
Unnamed: 22
NaN 366
Neither favorably nor unfavorably (neutral) 236
Somewhat favorably 223
Unfamiliar (N/A) 148
Very favorably 142
Somewhat unfavorably 63
Very unfavorably 8
Name: Unnamed: 22, dtype: int64
Unnamed: 23
NaN 374
Neither favorably nor unfavorably (neutral) 248
Somewhat favorably 153
Very favorably 138
Unfamiliar (N/A) 132
Somewhat unfavorably 96
Very unfavorably 45
Name: Unnamed: 23, dtype: int64
Unnamed: 24
Very favorably 474
NaN 359
Somewhat favorably 229
Neither favorably nor unfavorably (neutral) 79
Somewhat unfavorably 23
Unfamiliar (N/A) 15
Very unfavorably 7
Name: Unnamed: 24, dtype: int64
Unnamed: 25
Very favorably 562
NaN 356
Somewhat favorably 185
Neither favorably nor unfavorably (neutral) 57
Somewhat unfavorably 10
Unfamiliar (N/A) 10
Very unfavorably 6
Name: Unnamed: 25, dtype: int64
Unnamed: 26
NaN 365
Very unfavorably 204
Neither favorably nor unfavorably (neutral) 164
Somewhat favorably 130
Very favorably 112
Unfamiliar (N/A) 109
Somewhat unfavorably 102
Name: Unnamed: 26, dtype: int64
Unnamed: 27
NaN 372
Neither favorably nor unfavorably (neutral) 207
Somewhat favorably 183
Very favorably 168
Unfamiliar (N/A) 164
Somewhat unfavorably 58
Very unfavorably 34
Name: Unnamed: 27, dtype: int64
Unnamed: 28
Very favorably 605
NaN 360
Somewhat favorably 144
Neither favorably nor unfavorably (neutral) 51
Unfamiliar (N/A) 10
Somewhat unfavorably 8
Very unfavorably 8
Name: Unnamed: 28, dtype: int64
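# assign each response to like_most, dislike_most, Neutral or Unfamiliar
# NOTE: the key 'Very unfavorably ' ends with a space and 'NaN' is a string, not np.nan,
# so neither matches the data; those responses become NaN after .map() and are left out of the counts
assign_char_name = {
    'Very favorably': 'like_most',
    'Somewhat favorably' : 'like_most',
    'Very unfavorably ': 'dislike_most',
    'Somewhat unfavorably': 'dislike_most',
    'NaN': 'Neutral',
    'Neither favorably nor unfavorably (neutral)': 'Neutral',
    'Unfamiliar (N/A)': 'Unfamiliar'
    }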
# rename columns names
cols_name = {'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.':'Han Solo',
'Unnamed: 16': 'Luke Skywalker',
'Unnamed: 17': 'Princess Leia Organa',
'Unnamed: 18': 'Anakin Skywalker',
'Unnamed: 19': 'Obi Wan Kenobi',
'Unnamed: 20': 'Emperor Palpatine',
'Unnamed: 21': 'Darth Vader',
'Unnamed: 22': 'Lando Calrissian',
'Unnamed: 23': 'Boba Fett',
'Unnamed: 24': 'C-3P0',
'Unnamed: 25': 'R2 D2',
'Unnamed: 26': 'Jar Jar Binks',
'Unnamed: 27': 'Padme Amidala',
'Unnamed: 28': 'Yoda'
}
# rename the character columns once, then map each column and check the results
star_wars.rename(mapper=cols_name, axis=1, inplace=True)
for char in col_character:
    col = star_wars.columns[char]
    star_wars[col] = star_wars[col].map(assign_char_name)
    print(star_wars[col].value_counts())
like_most 761
Neutral 44
Unfamiliar 15
dislike_most 8
Name: Han Solo, dtype: int64
like_most 771
Neutral 38
dislike_most 13
Unfamiliar 6
Name: Luke Skywalker, dtype: int64
like_most 757
Neutral 48
dislike_most 12
Unfamiliar 8
Name: Princess Leia Organa, dtype: int64
like_most 514
Neutral 135
dislike_most 83
Unfamiliar 52
Name: Anakin Skywalker, dtype: int64
like_most 750
Neutral 43
Unfamiliar 17
dislike_most 8
Name: Obi Wan Kenobi, dtype: int64
like_most 253
Neutral 213
Unfamiliar 156
dislike_most 68
Name: Emperor Palpatine, dtype: int64
like_most 481
dislike_most 102
Neutral 84
Unfamiliar 10
Name: Darth Vader, dtype: int64
like_most 365
Neutral 236
Unfamiliar 148
dislike_most 63
Name: Lando Calrissian, dtype: int64
like_most 291
Neutral 248
Unfamiliar 132
dislike_most 96
Name: Boba Fett, dtype: int64
like_most 703
Neutral 79
dislike_most 23
Unfamiliar 15
Name: C-3P0, dtype: int64
like_most 747
Neutral 57
Unfamiliar 10
dislike_most 10
Name: R2 D2, dtype: int64
like_most 242
Neutral 164
Unfamiliar 109
dislike_most 102
Name: Jar Jar Binks, dtype: int64
like_most 351
Neutral 207
Unfamiliar 164
dislike_most 58
Name: Padme Amidala, dtype: int64
like_most 749
Neutral 51
Unfamiliar 10
dislike_most 8
Name: Yoda, dtype: int64
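The `dislike_most` totals above equal the "Somewhat unfavorably" counts alone (8 for Han Solo, for example) because the mapping key `'Very unfavorably '` carries a trailing space, and the string key `'NaN'` never matches a real missing value. The sketch below shows one way the mapping could be corrected; the name `assign_char_name_fixed` and the sample series are introduced here for illustration, and labeling missing answers as `Neutral` is an assumption about what the `'NaN'` entry was meant to do.

```python
import numpy as np
import pandas as pd

# Hypothetical corrected mapping: no trailing space, no string 'NaN' key.
assign_char_name_fixed = {
    "Very favorably": "like_most",
    "Somewhat favorably": "like_most",
    "Very unfavorably": "dislike_most",
    "Somewhat unfavorably": "dislike_most",
    "Neither favorably nor unfavorably (neutral)": "Neutral",
    "Unfamiliar (N/A)": "Unfamiliar",
}

# Made-up sample of responses, including one missing answer.
responses = pd.Series(["Very favorably", "Very unfavorably",
                       "Somewhat unfavorably", np.nan,
                       "Unfamiliar (N/A)"])

# map() leaves unmatched values (including real NaN) as NaN;
# fillna then applies the assumed 'Neutral' label to missing answers.
mapped = responses.map(assign_char_name_fixed).fillna("Neutral")
print(mapped.value_counts())
```

Applying this version to the survey would raise every `dislike_most` count by the number of "Very unfavorably" responses, and would raise the `Neutral` counts if missing answers are filled in.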
# create an empty dict and count the like_most values in each character column
_like_most_char = {}
for col in star_wars.columns[15:29]:
    _like_most_char[col] = len(star_wars[star_wars[col] == 'like_most']) # count all like_most values
print(_like_most_char)
{'Han Solo': 761, 'Luke Skywalker': 771, 'Princess Leia Organa': 757, 'Anakin Skywalker': 514, 'Obi Wan Kenobi': 750, 'Emperor Palpatine': 253, 'Darth Vader': 481, 'Lando Calrissian': 365, 'Boba Fett': 291, 'C-3P0': 703, 'R2 D2': 747, 'Jar Jar Binks': 242, 'Padme Amidala': 351, 'Yoda': 749}
# short display names for the plot, in the same order as the columns
like_list = ['Solo', 'Luke', 'Leia', 'Anakin',
'Obi Wan', 'Emperor', 'Darth Vader',
'Lando', 'Boba', 'C-3P0', 'R2 D2',
'Jar', 'Amidala', 'Yoda']
# combine the short names with the values from _like_most_char to create final_like_most
final_like_most = dict(zip(like_list , list(_like_most_char.values())))
final_like_most
{'Solo': 761, 'Luke': 771, 'Leia': 757, 'Anakin': 514, 'Obi Wan': 750, 'Emperor': 253, 'Darth Vader': 481, 'Lando': 365, 'Boba': 291, 'C-3P0': 703, 'R2 D2': 747, 'Jar': 242, 'Amidala': 351, 'Yoda': 749}
# select keys from final like most
keys = final_like_most.keys()
# select values from final most like
values = final_like_most.values()
#generate a bar plot
plt.bar(keys,values)
plt.xticks(rotation = 90)
plt.title('Which character do you like_most?')
plt.show()
Han Solo, Luke Skywalker, Princess Leia Organa, Obi Wan Kenobi, C-3P0, R2 D2 and Yoda each received more than 700 favorable responses, making them the most liked characters in the Star Wars movies.
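The per-column loop used for the counts can also be written as a single vectorized comparison. This is a minimal sketch on a made-up frame (`toy` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical mapped responses for two characters.
toy = pd.DataFrame({
    "Han Solo": ["like_most", "like_most", "Neutral"],
    "Yoda": ["like_most", "dislike_most", None],
})

# Comparing the whole frame yields booleans; summing them counts matches per column.
like_counts = (toy == "like_most").sum()
print(like_counts.sort_values(ascending=False))
```

Calling `like_counts.to_dict()` gives the same kind of dictionary the loop builds.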
# create an empty dict and count the dislike_most values in each character column
_dislike_most_char = {}
for col in star_wars.columns[15:29]:
    _dislike_most_char[col] = len(star_wars[star_wars[col] == 'dislike_most']) # count all dislike_most values
print(_dislike_most_char)
{'Han Solo': 8, 'Luke Skywalker': 13, 'Princess Leia Organa': 12, 'Anakin Skywalker': 83, 'Obi Wan Kenobi': 8, 'Emperor Palpatine': 68, 'Darth Vader': 102, 'Lando Calrissian': 63, 'Boba Fett': 96, 'C-3P0': 23, 'R2 D2': 10, 'Jar Jar Binks': 102, 'Padme Amidala': 58, 'Yoda': 8}
# short display names for the plot, in the same order as the columns
dislike_list = ['Solo', 'Luke', 'Leia', 'Anakin',
'Obi Wan', 'Emperor', 'Darth Vader',
'Lando', 'Boba', 'C-3P0', 'R2 D2',
'Jar', 'Amidala', 'Yoda']
# combine the short names with the values from _dislike_most_char to create final_dislike_most
final_dislike_most = dict(zip(dislike_list , list(_dislike_most_char.values())))
final_dislike_most
{'Solo': 8, 'Luke': 13, 'Leia': 12, 'Anakin': 83, 'Obi Wan': 8, 'Emperor': 68, 'Darth Vader': 102, 'Lando': 63, 'Boba': 96, 'C-3P0': 23, 'R2 D2': 10, 'Jar': 102, 'Amidala': 58, 'Yoda': 8}
# select keys from final dislike most
keys = final_dislike_most.keys()
# select values from final dislike most
values = final_dislike_most.values()
#generate a bar plot
plt.bar(keys,values)
plt.xticks(rotation = 90)
plt.title('Which character do you dislike_most?')
plt.show()
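Jar Jar Binks and Darth Vader, with 102 responses each, are the most disliked characters in the Star Wars movies. These totals match the "Somewhat unfavorably" counts alone, because the "Very unfavorably" responses were not matched by the mapping; adding them from the value counts above gives 306 for Jar Jar Binks and 251 for Darth Vader, so the same two characters remain at the top.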
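To read the ranking without a plot, the dictionary printed above can be sorted by value. The sketch below copies `final_dislike_most` from that output; `ranked` and `top3` are names introduced here.

```python
# Counts copied from the final_dislike_most output above.
final_dislike_most = {'Solo': 8, 'Luke': 13, 'Leia': 12, 'Anakin': 83,
                      'Obi Wan': 8, 'Emperor': 68, 'Darth Vader': 102,
                      'Lando': 63, 'Boba': 96, 'C-3P0': 23, 'R2 D2': 10,
                      'Jar': 102, 'Amidala': 58, 'Yoda': 8}

# Sort (name, count) pairs from most to least disliked.
# Python's sort is stable, so tied entries keep their original order.
ranked = sorted(final_dislike_most.items(), key=lambda kv: kv[1], reverse=True)
top3 = ranked[:3]
print(top3)
```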