import pandas as pd
import numpy as np
# Reading into a Dataframe the Star Wars Dataset:
star_wars = pd.read_csv('star_wars.csv', encoding='ISO-8859-1')
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 rows × 38 columns
# Extracting and checking the Dataset columns:
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?Âæ', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
# Removing all the NaN rows in the RespondentID col of our Dataset, using
# the notna() method:
star_wars = star_wars[star_wars['RespondentID'].notna()].copy()
# Validating the operation:
star_wars['RespondentID'].isnull().value_counts()
False    1186
Name: RespondentID, dtype: int64
# Converting the Yes's and No's in the 'Have you seen any of the 6 films
# in the Star Wars franchise?' col, to True's and False's. Transforming that
# way the entire col from a string type to a Boolean type:
star_wars[
'Have you seen any of the 6 films in the Star Wars franchise?'
] = star_wars[
'Have you seen any of the 6 films in the Star Wars franchise?'
].map({'Yes': True, 'No': False}).copy()
# Checking the results:
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].value_counts(
dropna=False)
True     936
False    250
Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64
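The same Yes/No conversion is applied to more than one column, so it can be sketched once as a shared mapping and a short loop. The toy frame and its column names (`seen_any`, `is_fan`) are hypothetical stand-ins for the long survey questions, not part of the Dataset:

```python
import pandas as pd

# Toy stand-in for the survey; column names are hypothetical.
toy = pd.DataFrame({
    'seen_any': ['Yes', 'No', 'Yes'],
    'is_fan': ['Yes', None, 'No'],
})

# One mapping shared by every Yes/No column.
yes_no = {'Yes': True, 'No': False}

# Values that are not keys of the mapping (missing answers included)
# come back as NaN, so unanswered questions stay visible.
for col in ['seen_any', 'is_fan']:
    toy[col] = toy[col].map(yes_no)

print(toy)
```

A column with missing answers keeps them as NaN after the mapping, which is why `value_counts(dropna=False)` is a useful check afterwards.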
# Converting the Yes's and No's in the 'Do you consider yourself to be a
# fan of the Star Wars film franchise?' col, to True's and False's.
# Transforming that way the entire col from a string type to a Boolean type:
star_wars[
'Do you consider yourself to be a fan of the Star Wars film franchise?'
] = star_wars[
'Do you consider yourself to be a fan of the Star Wars film franchise?'
].map({'Yes': True, 'No': False})
# Verifying the changes:
star_wars[
'Do you consider yourself to be a fan of the Star Wars film franchise?'
].value_counts(dropna=False)
# Renaming all the cols that correspond to the movies seen by the Star Wars
# fans from Unnamed to seen, adding to it a numeric value that corresponds
# to each of the 6 Episodes from the saga. E.g., seen_Ep.1 == seen Episode I,
# seen_Ep.4 == seen Episode IV, seen_Ep.6 == seen Episode VI, etc:
star_wars = star_wars.rename(columns={
'Which of the following Star Wars films have you seen? Please select all that apply.': 'seen_Ep.1',
'Unnamed: 4': 'seen_Ep.2',
'Unnamed: 5': 'seen_Ep.3',
'Unnamed: 6': 'seen_Ep.4',
'Unnamed: 7': 'seen_Ep.5',
'Unnamed: 8': 'seen_Ep.6'
}).copy()
# Changing the content of the cols from str to boolean type.
# First we create a list with the names of the movies that populate
# our seen cols. Then, with a for loop, we change the format of those
# same cols: an entry holding the movie name becomes True, and a null
# value is assumed to mean False, i.e. the respondent has not seen
# that particular movie:
movies = [
'Star Wars: Episode I The Phantom Menace',
'Star Wars: Episode II Attack of the Clones',
'Star Wars: Episode III Revenge of the Sith',
'Star Wars: Episode IV A New Hope',
'Star Wars: Episode V The Empire Strikes Back',
'Star Wars: Episode VI Return of the Jedi'
]
for n in np.arange(6):
    col = 'seen_Ep.{}'.format(n+1)
    star_wars[col] = star_wars[col].map({
        movies[n]: True, np.nan: False})
# Checking and validating the previous changes in the Dataset:
for n in np.arange(6):
    col = 'seen_Ep.{}'.format(n+1)
    print(star_wars[col].value_counts().sum() == star_wars.shape[0])
    print(star_wars[col].value_counts())
True
True     673
False    513
Name: seen_Ep.1, dtype: int64
True
False    615
True     571
Name: seen_Ep.2, dtype: int64
True
False    636
True     550
Name: seen_Ep.3, dtype: int64
True
True     607
False    579
Name: seen_Ep.4, dtype: int64
True
True     758
False    428
Name: seen_Ep.5, dtype: int64
True
True     738
False    448
Name: seen_Ep.6, dtype: int64
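An alternative to mapping each movie title by hand is to treat any non-null entry as "seen". This is only a sketch on a toy column, and it assumes that a seen cell never holds anything other than the movie title or NaN:

```python
import numpy as np
import pandas as pd

# Toy column: the movie title when seen, NaN otherwise.
title = 'Star Wars: Episode I The Phantom Menace'
toy = pd.DataFrame({'seen_Ep.1': [title, np.nan, title, np.nan]})

# Any non-null answer counts as "seen"; nulls become False.
toy['seen_Ep.1'] = toy['seen_Ep.1'].notna()

print(toy['seen_Ep.1'].value_counts())
```

The explicit `map()` is stricter, since an unexpected string would surface as NaN instead of silently counting as True.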
# Converting the ranking cols from str to float. Note that the 3:15 slice
# also covers the six seen_Ep cols, so their booleans become 1.0/0.0:
star_wars[star_wars.columns[3:15]] = star_wars[
star_wars.columns[3:15]].astype(float).copy()
# Checking the outcome of the operation:
print(star_wars[star_wars.columns[9:15]].dtypes)
Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.    float64
Unnamed: 10    float64
Unnamed: 11    float64
Unnamed: 12    float64
Unnamed: 13    float64
Unnamed: 14    float64
dtype: object
# Renaming those very same cols to a more descriptive name,
# resorting to the rename() method from pandas:
star_wars = star_wars.rename(columns={
'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.': 'ranking_Ep.1',
'Unnamed: 10': 'ranking_Ep.2',
'Unnamed: 11': 'ranking_Ep.3',
'Unnamed: 12': 'ranking_Ep.4',
'Unnamed: 13': 'ranking_Ep.5',
'Unnamed: 14': 'ranking_Ep.6'}).copy()
# Verifying the changes:
print(star_wars.columns[9:15])
Index(['ranking_Ep.1', 'ranking_Ep.2', 'ranking_Ep.3', 'ranking_Ep.4', 'ranking_Ep.5', 'ranking_Ep.6'], dtype='object')
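Because the six ranking columns sit next to each other, the rename mapping could also be built from their position instead of being typed out. A minimal sketch on an empty toy frame; `rank_question` is a hypothetical short stand-in for the long question text:

```python
import pandas as pd

# Empty toy frame with six columns in the survey's order.
toy = pd.DataFrame(columns=[
    'rank_question', 'Unnamed: 10', 'Unnamed: 11',
    'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14'])

# Pair each old name with 'ranking_Ep.<n>', counting from 1.
mapping = {old: 'ranking_Ep.{}'.format(n)
           for n, old in enumerate(toy.columns, start=1)}

toy = toy.rename(columns=mapping)
print(list(toy.columns))
```

This relies on the column order matching the Episode order, which the first row of the raw Dataset suggests but which is worth checking before use.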
import matplotlib.pyplot as plt
%matplotlib inline
# Calculating the mean of all 6 ranking cols.
# The values range from 1 to 6, with 1 being the favorite and
# 6 the least favorite movie. So the movies with a higher mean
# ranking value are the least favorite ones from the
# Star Wars saga, and vice versa:
ranking_means = star_wars[
star_wars.columns[9:15]].mean().sort_values(
ascending=False)
ranking_means.head()
# Plotting those means in a bar graph:
rank_graph = ranking_means.plot.barh(
edgecolor='none',
color= [(255/255,188/255,121/255),
(162/255,200/255, 236/255),
(207/255,207/255,207/255),
(200/255,82/255,0/255),
(255/255,194/255,10/255),
(212/255,17/255,89/255)])
# ENHANCING PLOT AESTHETICS:
# Removing all the 4 spines with a for loop from our graph figure:
for key, spine in rank_graph.spines.items():
    spine.set_visible(False)
# Removing the ticks:
rank_graph.tick_params(
    bottom=False, top=False, left=False, right=False)
# Setting a graph title:
rank_graph.set_title('Average Star Wars Movies Ranking')
# Setting an average graph line:
rank_graph.axvline(ranking_means.mean(),
alpha=.8, linestyle='--', color='grey')
# Displaying the graph:
plt.show()
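Since a lower mean ranking means a more liked movie, reading the chart can be cross-checked numerically. The sketch below uses made-up rankings for three Episodes only, purely to show the direction of the scale:

```python
import pandas as pd

# Made-up rankings from three respondents (1 = favorite).
toy = pd.DataFrame({
    'ranking_Ep.4': [2.0, 1.0, 3.0],
    'ranking_Ep.5': [1.0, 2.0, 1.0],
    'ranking_Ep.6': [3.0, 3.0, 2.0],
})

# Ascending sort puts the best-liked movie first.
means = toy.mean().sort_values()

# idxmin() returns the label of the lowest (best) mean ranking.
favorite = means.idxmin()
print(means)
print(favorite)
```

With `ascending=False`, as used for the bar graph, the same series runs from the least to the most liked movie.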
# Analysing the Age column from our Dataset:
star_wars['Age'].value_counts().sort_values()
18-29    218
30-44    268
> 60     269
45-60    291
Name: Age, dtype: int64
# Indexing, calculating and sorting the data that corresponds to the six
# different movies seen by the fans:
star_wars_seen_total = star_wars[star_wars.columns[3:9]].sum().copy().sort_values(
ascending=False)
# Plotting a bar graph of the total of movies seen per Episode of the Star
# Wars saga:
star_wars_seen_total_graph = star_wars_seen_total.plot.bar(
edgecolor='none',
color=[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
# Setting an average graph line:
star_wars_seen_total_graph.axhline(
star_wars_seen_total.mean(),
color='grey', alpha=.8, linestyle=':')
# ENHANCING PLOT AESTHETICS:
# Removing the ticks:
star_wars_seen_total_graph.tick_params(
    bottom=False, top=False, left=False, right=False)
# Removing all the 4 spines with a for loop from our graph figure:
for key, spine in star_wars_seen_total_graph.spines.items():
    spine.set_visible(False)
# Setting a graph title:
star_wars_seen_total_graph.set_title('Most seen Episode movie')
# Rotating the xtick labels:
plt.xticks(rotation='horizontal')
# Displaying the graph:
plt.show()
# Dividing our Dataset into two new ones. One for Female respondents and
# another one for Male ones:
female = star_wars[star_wars['Gender'] == 'Female'].copy()
male = star_wars[star_wars['Gender'] == 'Male'].copy()
# Displaying the first 5 rows of the female Dataset:
female.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_Ep.1 | seen_Ep.2 | seen_Ep.3 | seen_Ep.4 | seen_Ep.5 | seen_Ep.6 | ranking_Ep.1 | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
112 | 3.291440e+09 | True | True | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | ... | Somewhat favorably | Greedo | Yes | No | Yes | Female | > 60 | $50,000 - $99,999 | Bachelor degree | South Atlantic |
113 | 3.291439e+09 | True | False | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | Somewhat favorably | I don't understand this question | No | NaN | No | Female | > 60 | $25,000 - $49,999 | Graduate degree | Pacific |
115 | 3.291436e+09 | True | False | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 3.0 | ... | Somewhat favorably | I don't understand this question | No | NaN | No | Female | 30-44 | $50,000 - $99,999 | Graduate degree | South Atlantic |
117 | 3.291434e+09 | True | False | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 3.0 | ... | Somewhat unfavorably | I don't understand this question | No | NaN | No | Female | 30-44 | $25,000 - $49,999 | Bachelor degree | West South Central |
118 | 3.291432e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | ... | Very favorably | Han | No | NaN | Yes | Female | 30-44 | $25,000 - $49,999 | Graduate degree | South Atlantic |
5 rows × 38 columns
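Splitting into two Datasets works well for plotting each gender separately. When only the summary numbers are needed, a `groupby()` gives both genders in one table. A sketch on invented rows, not on the survey data:

```python
import pandas as pd

# Invented rows; the last respondent did not state a gender.
toy = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Male', None],
    'ranking_Ep.1': [1.0, 4.0, 3.0, 2.0, 5.0],
    'seen_Ep.1': [True, True, False, True, True],
})

# Mean ranking and number of viewers per gender.
# groupby() leaves out rows whose key is missing by default.
by_gender = toy.groupby('Gender').agg(
    {'ranking_Ep.1': 'mean', 'seen_Ep.1': 'sum'})
print(by_gender)
```

Respondents without a stated gender are left out here, just as they are by the two boolean filters above.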
# Redoing the two previous analyses per gender, plotting one bar graph for
# the average ranking of the Star Wars movies and one for the most seen ones.
# Ladies first.
# Indexing, calculating and sorting the data that corresponds to the
# Female gender:
ranking_means_female = female[
female.columns[9:15]].mean().sort_values(
ascending=False)
ranking_means_female.head()
# Plotting those means in a bar graph:
rank_graph_female = ranking_means_female.plot.barh(
edgecolor='none',
color= [(255/255,188/255,121/255),
(162/255,200/255, 236/255),
(207/255,207/255,207/255),
(200/255,82/255,0/255),
(255/255,194/255,10/255),
(212/255,17/255,89/255)])
# ENHANCING PLOT AESTHETICS:
# Removing all the 4 spines with a for loop from our graph figure:
for key, spine in rank_graph_female.spines.items():
    spine.set_visible(False)
# Removing the ticks:
rank_graph_female.tick_params(
    bottom=False, top=False, left=False, right=False)
# Setting a graph title:
rank_graph_female.set_title('Average Star Wars Movies Ranking (Female_Respondents)')
# Setting an average graph line:
rank_graph_female.axvline(ranking_means_female.mean(),
alpha=.8, linestyle='--', color='grey')
# Displaying the graph:
plt.show()
# Doing the same process as previous only this time for the male Gender.
# Indexing, calculating and sorting the data that corresponds to the
# Male Gender:
ranking_means_male = male[
male.columns[9:15]].mean().sort_values(
ascending=False)
ranking_means_male.head()
# Plotting those means in a bar graph:
rank_graph_male = ranking_means_male.plot.barh(
edgecolor='none',
color= [(255/255,188/255,121/255),
(162/255,200/255, 236/255),
(207/255,207/255,207/255),
(200/255,82/255,0/255),
(255/255,193/255,7/255),
(216/255,27/255,96/255)])
# ENHANCING PLOT AESTHETICS:
# Removing all the 4 spines with a for loop from our graph figure:
for key, spine in rank_graph_male.spines.items():
    spine.set_visible(False)
# Removing the ticks:
rank_graph_male.tick_params(
    bottom=False, top=False, left=False, right=False)
# Setting a graph title:
rank_graph_male.set_title('Average Star Wars Movies Ranking (Male_Respondents)')
# Setting an average graph line:
rank_graph_male.axvline(ranking_means_male.mean(),
alpha=.8, linestyle='--', color='grey')
# Displaying the graph:
plt.show()
# Now let's deal with the number of views per movie, for all the six Episodes
# of the Star Wars saga that the female respondents have seen.
# As usual, ladies first.
# Indexing, calculating and sorting the data that corresponds to the
# Female Gender:
female_seen_total = female[female.columns[3:9]].sum().copy().sort_values(
ascending=False)
# Plotting a bar graph of the Female_seen_total Dataset:
female_seen_total_graph = female_seen_total.plot.bar(
edgecolor='none',
color=[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
# ENHANCING PLOT AESTHETICS:
# Setting an average graph line:
female_seen_total_graph.axhline(
female_seen_total.mean(),
color='grey', alpha=.8, linestyle=':')
# Turning off all the ticks:
female_seen_total_graph.tick_params(
    bottom=False, top=False, left=False, right=False)
# Cleaning out all the spines from our graph figure:
for key, spine in female_seen_total_graph.spines.items():
    spine.set_visible(False)
# Setting a graph title:
female_seen_total_graph.set_title('Star Wars most seen Episodes (Female_Respondents)')
# Rotating the xtick labels:
plt.xticks(rotation='horizontal')
# Displaying the graph:
plt.show()
# Doing the same process as previous only this time for the male Gender.
# Indexing, calculating and sorting the data that corresponds to the
# Male Gender:
male_seen_total = male[male.columns[3:9]].sum().copy().sort_values(
ascending=False)
# Plotting a bar graph of the male_seen_total Dataset:
male_seen_total_graph = male_seen_total.plot.bar(
edgecolor='none',
color=[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
# ENHANCING PLOT AESTHETICS:
# Setting an average graph line:
male_seen_total_graph.axhline(
male_seen_total.mean(),
color='grey', alpha=.8, linestyle=':')
# Turning off all the ticks:
male_seen_total_graph.tick_params(
    bottom=False, top=False, left=False, right=False)
# Cleaning out all the spines from our graph figure:
for key, spine in male_seen_total_graph.spines.items():
    spine.set_visible(False)
# Setting a graph title:
male_seen_total_graph.set_title('Star Wars most seen Episodes (Male_Respondents)')
# Rotating the xtick labels:
plt.xticks(rotation='horizontal')
# Displaying the graph:
plt.show()
# Figuring out how many Education levels there are in the Dataset:
star_wars['Education'].value_counts(dropna=False)
Some college or Associate degree    328
Bachelor degree                     321
Graduate degree                     275
NaN                                 150
High school degree                  105
Less than high school degree          7
Name: Education, dtype: int64
# Creating our pivot table based on those 5 levels of Education:
education_pivot = star_wars.pivot_table(
index='Education',
values=['ranking_Ep.1',
'ranking_Ep.2',
'ranking_Ep.3',
'ranking_Ep.4',
'ranking_Ep.5',
'ranking_Ep.6'],
aggfunc='mean',
dropna=True
)
# Resetting the Dataframe (DF) index and turning Education into a regular column:
education_pivot = education_pivot.reset_index().copy()
# Displaying the first five rows of our newly created education_pivot DF:
education_pivot.head()
Education | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | |
---|---|---|---|---|---|---|---|
0 | Bachelor degree | 3.828244 | 4.290076 | 4.521073 | 3.114504 | 2.309160 | 2.931298 |
1 | Graduate degree | 3.822222 | 4.225664 | 4.500000 | 3.199115 | 2.323009 | 2.920354 |
2 | High school degree | 3.802817 | 3.746479 | 4.126761 | 3.211268 | 2.873239 | 3.239437 |
3 | Less than high school degree | 5.000000 | 5.333333 | 3.666667 | 2.666667 | 1.000000 | 3.333333 |
4 | Some college or Associate degree | 3.551181 | 3.885827 | 4.102362 | 3.503937 | 2.783465 | 3.173228 |
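
A pivot table with `aggfunc='mean'` is, for this use, a grouped mean. The sketch below compares both on invented rows with two ranking columns, assuming the same column names as above:

```python
import pandas as pd

# Invented rows; the last respondent gave no Education level.
toy = pd.DataFrame({
    'Education': ['Bachelor degree', 'Bachelor degree',
                  'Graduate degree', None],
    'ranking_Ep.1': [2.0, 4.0, 5.0, 1.0],
    'ranking_Ep.2': [3.0, 5.0, 6.0, 2.0],
})
cols = ['ranking_Ep.1', 'ranking_Ep.2']

# Mean ranking per Education level, computed two ways.
pivot = toy.pivot_table(index='Education', values=cols, aggfunc='mean')
grouped = toy.groupby('Education')[cols].mean()

print(pivot)
print(grouped)
```

Both versions leave out the respondent with no Education level, so the 150 NaN answers counted above do not take part in the means.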
# Plotting 6 pie charts of the education_pivot Dataset, one for each movie,
# with the rankings segmented by the Star Wars fans' Education levels.
# These levels are divided into 5 categories:
# Bachelor degree (B); Graduate degree (G); High school degree (HS);
# Less than high school degree (<HS); Some college or Associate degree (CAD):
edu = education_pivot[[
'ranking_Ep.1',
'ranking_Ep.2',
'ranking_Ep.3',
'ranking_Ep.4',
'ranking_Ep.5',
'ranking_Ep.6']].plot.pie(subplots=True,
figsize=(18, 3),
legend=False,
labels=[
'B',
'G',
'HS',
'<HS',
'CAD'
],
colors=[(100/255,143/255,255/255),
(120/255,95/255,240/255),
(220/255,38/255,127/255),
(254/255,97/255,0/255),
(255/255,176/255,0/255)])
# ENHANCING PLOT AESTHETICS:
# Removing the edgecolor black from all the pie charts:
plt.rcParams['patch.edgecolor'] = 'white'
# Displaying the graph:
plt.show()
# Returning the necessary cols for this specific analysis:
print(star_wars.columns[15:29])
Index(['Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28'], dtype='object')
# Renaming all the cols that hold the fans' ratings of the
# characters from the first six Episodes of the Star Wars saga.
# Each column is renamed after the corresponding character:
star_wars = star_wars.rename(columns={
'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.': 'Han Solo',
'Unnamed: 16': 'Luke Skywalker',
'Unnamed: 17': 'Princess Leia Organa',
'Unnamed: 18': 'Anakin Skywalker',
'Unnamed: 19': 'Obi Wan Kenobi',
'Unnamed: 20': 'Emperor Palpatine',
'Unnamed: 21': 'Darth Vader',
'Unnamed: 22': 'Lando Calrissian',
'Unnamed: 23': 'Boba Fett',
'Unnamed: 24': 'C-3P0',
'Unnamed: 25': 'R2 D2',
'Unnamed: 26': 'Jar Jar Binks',
'Unnamed: 27': 'Padme Amidala',
'Unnamed: 28': 'Yoda'}).copy()
# Validating the previous changes:
star_wars.columns[15:29]
Index(['Han Solo', 'Luke Skywalker', 'Princess Leia Organa', 'Anakin Skywalker', 'Obi Wan Kenobi', 'Emperor Palpatine', 'Darth Vader', 'Lando Calrissian', 'Boba Fett', 'C-3P0', 'R2 D2', 'Jar Jar Binks', 'Padme Amidala', 'Yoda'], dtype='object')
# Checking, resorting to the value_counts() method, how many rating types
# there are to rank our characters:
star_wars['Luke Skywalker'].value_counts(dropna=False).sort_values()
Very unfavorably                                 3
Unfamiliar (N/A)                                 6
Somewhat unfavorably                            13
Neither favorably nor unfavorably (neutral)     38
Somewhat favorably                             219
NaN                                            355
Very favorably                                 552
Name: Luke Skywalker, dtype: int64
star_wars['Darth Vader'].value_counts(dropna=False).sort_values()
Unfamiliar (N/A)                                10
Neither favorably nor unfavorably (neutral)     84
Somewhat unfavorably                           102
Very unfavorably                               149
Somewhat favorably                             171
Very favorably                                 310
NaN                                            360
Name: Darth Vader, dtype: int64
star_wars['Yoda'].value_counts(dropna=False).sort_values()
Very unfavorably                                 8
Somewhat unfavorably                             8
Unfamiliar (N/A)                                10
Neither favorably nor unfavorably (neutral)     51
Somewhat favorably                             144
NaN                                            360
Very favorably                                 605
Name: Yoda, dtype: int64
# The few samples analyzed above already show that the number
# of null values is material in all of them.
# Let's check all of our character cols with the isna() method:
star_wars[star_wars.columns[15:29]].isna().sum()
Han Solo                357
Luke Skywalker          355
Princess Leia Organa    355
Anakin Skywalker        363
Obi Wan Kenobi          361
Emperor Palpatine       372
Darth Vader             360
Lando Calrissian        366
Boba Fett               374
C-3P0                   359
R2 D2                   356
Jar Jar Binks           365
Padme Amidala           372
Yoda                    360
dtype: int64
# Dropping all the rows with nulls from our characters rating using the
# dropna() method, and assigning the result to a new variable:
star_wars_characters_rating = star_wars[star_wars.columns[
15:29]].dropna(axis=0).copy()
# Verifying the previous method:
star_wars_characters_rating.isna().sum()
Han Solo                0
Luke Skywalker          0
Princess Leia Organa    0
Anakin Skywalker        0
Obi Wan Kenobi          0
Emperor Palpatine       0
Darth Vader             0
Lando Calrissian        0
Boba Fett               0
C-3P0                   0
R2 D2                   0
Jar Jar Binks           0
Padme Amidala           0
Yoda                    0
dtype: int64
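Note that `dropna(axis=0)` removes a row as soon as a single character is unrated. How much that costs compared with dropping only fully empty rows can be sketched on a toy frame (invented answers, two characters only):

```python
import numpy as np
import pandas as pd

# Invented answers: one complete, one empty, one partial row.
toy = pd.DataFrame({
    'Han Solo': ['Very favorably', np.nan, 'Somewhat favorably'],
    'Yoda': ['Very favorably', np.nan, np.nan],
})

# Default how='any': keep only rows with no null at all.
complete = toy.dropna(axis=0)

# how='all': drop a row only when every rating is missing.
answered = toy.dropna(axis=0, how='all')

print(len(complete), len(answered))
```

The analysis here keeps the stricter default, so every character is rated by the same set of respondents.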
# Now let's combine the several types of answers that the respondents gave
# into 4 major groups: Favorably; Unfavorably; Neutral; Unknown. We do this
# to structure our analysis around bigger groups:
star_wars_characters_rating = star_wars_characters_rating.replace([
'Somewhat unfavorably',
'Very unfavorably',
'Somewhat favorably',
'Very favorably',
'Unfamiliar (N/A)',
'Neither favorably nor unfavorably (neutral)'],
['Unfavorably',
'Unfavorably',
'Favorably',
'Favorably',
'Unknown',
'Neutral']).copy()
# Shortening some cols names, thus improving their readability:
star_wars_characters_rating = star_wars_characters_rating.rename(columns={
'Princess Leia Organa': 'Princess Leia',
'Anakin Skywalker': 'Anakin'}).copy()
# Displaying the first five rows
star_wars_characters_rating.head()
Han Solo | Luke Skywalker | Princess Leia | Anakin | Obi Wan Kenobi | Emperor Palpatine | Darth Vader | Lando Calrissian | Boba Fett | C-3P0 | R2 D2 | Jar Jar Binks | Padme Amidala | Yoda | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Favorably | Favorably | Favorably | Favorably | Favorably | Favorably | Favorably | Unknown | Unknown | Favorably | Favorably | Favorably | Favorably | Favorably |
3 | Favorably | Favorably | Favorably | Favorably | Favorably | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
4 | Favorably | Favorably | Favorably | Favorably | Favorably | Favorably | Favorably | Favorably | Unfavorably | Favorably | Favorably | Favorably | Favorably | Favorably |
5 | Favorably | Favorably | Favorably | Unfavorably | Favorably | Unfavorably | Favorably | Neutral | Favorably | Favorably | Favorably | Unfavorably | Favorably | Favorably |
6 | Favorably | Favorably | Favorably | Favorably | Favorably | Neutral | Favorably | Neutral | Favorably | Favorably | Favorably | Favorably | Neutral | Favorably |
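
The grouping of answers can also be written as a single dictionary, which keeps each original answer next to its group. A sketch on one toy column with invented answers; the group names are the ones used above:

```python
import pandas as pd

# Original answer -> major group.
groups = {
    'Somewhat unfavorably': 'Unfavorably',
    'Very unfavorably': 'Unfavorably',
    'Somewhat favorably': 'Favorably',
    'Very favorably': 'Favorably',
    'Unfamiliar (N/A)': 'Unknown',
    'Neither favorably nor unfavorably (neutral)': 'Neutral',
}

# Invented answers for one character.
toy = pd.DataFrame({'Yoda': [
    'Very favorably', 'Somewhat favorably',
    'Very unfavorably', 'Unfamiliar (N/A)']})

grouped = toy.replace(groups)

# Share of each group, in percent.
shares = grouped['Yoda'].value_counts(normalize=True) * 100
print(shares)
```

Compared with two parallel lists, the dictionary makes it harder to shift one list against the other by accident.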
# Creating all our 4 Bars, based on the 4 major ratings each respondent
# gave to the main Star Wars characters, stacking them up accordingly to
# that very same rating group, and distributing them along each of the main
# characters:
fig, ax = plt.subplots(figsize=(22,10))
characters = star_wars_characters_rating.columns.values
r = np.arange(len(characters))
bar_w = .85
# Calculating the percentages of the 4 major character rating groups:
Favorably = ((star_wars_characters_rating ==
'Favorably').sum()/star_wars_characters_rating.shape[0]) * 100
Unfavorably = ((star_wars_characters_rating ==
'Unfavorably').sum()/star_wars_characters_rating.shape[0]) * 100
Neutral = (star_wars_characters_rating ==
'Neutral').sum()/star_wars_characters_rating.shape[0] * 100
Unknown = (star_wars_characters_rating ==
'Unknown').sum()/star_wars_characters_rating.shape[0] * 100
# Plotting a stacked bar graph for each of the previous variables:
plt.bar(r, Favorably.values, color=(26/255,255/255,26/255), edgecolor='white', width=bar_w)
plt.bar(r, Neutral.values, bottom=Favorably.values, color=(254/255,254/255,98/255), edgecolor='white', width=bar_w)
plt.bar(r, Unknown.values, bottom=(Favorably.values + Neutral.values), color=(195/255,186/255,164/255), edgecolor='white', width=bar_w)
plt.bar(r, Unfavorably.values, bottom=(Favorably.values + Neutral.values + Unknown.values), color=(254/255,58/255,0/255), edgecolor='white', width=bar_w)
# Bars are centered on r by default, so the tick labels go at r as well:
plt.xticks(r, characters)
plt.yticks(np.arange(20, 120, 20))
# ENHANCING PLOT AESTHETICS:
# Removing the ticks from the graph:
ax.tick_params(bottom=False,
               top=False,
               left=False,
               right=False)
# Removing the spines from our graph:
for key, spine in ax.spines.items():
    spine.set_visible(False)
# Setting up a legend box for our bar graph:
plt.legend(
loc='upper right',
labels=('Favorably', 'Neutral', 'Unknown', 'Unfavorably'),
ncol=1, fancybox=True, framealpha=.6)
# Displaying the graph:
plt.show()
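Since every answer falls into exactly one of the 4 groups, the stacked bars should reach 100% for each character. A quick way to check that is sketched below on invented ratings for two characters:

```python
import pandas as pd

# Invented, already grouped ratings from four respondents.
toy = pd.DataFrame({
    'Han Solo': ['Favorably', 'Favorably', 'Neutral', 'Unfavorably'],
    'Jar Jar Binks': ['Unfavorably', 'Unknown', 'Unfavorably', 'Favorably'],
})
levels = ['Favorably', 'Neutral', 'Unknown', 'Unfavorably']

# One column per group, one row per character, values in percent.
# The mean of a boolean column is the fraction of True values.
shares = pd.DataFrame(
    {level: toy.eq(level).mean() * 100 for level in levels})

print(shares)
print(shares.sum(axis=1))
```

If a row of `shares` does not add up to 100, some answer was left out of the grouping step.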
1. Star Wars fans are well aware of the importance of Episode 5 - The
Empire Strikes Back. Not only was it the most seen movie, it was also
the one they ranked first among all the 6 Episodes of the saga;
2. The movies the fans rank highest are also the ones with the highest
ratings at IMDb. The same holds on the other side of the Force, for
the lowest ranked ones;
3. Bad boys will go to hell, and good boys will top the preference
choices of the fans;
4. Better to be a bad boy than a bad actor or a bad role. And Jar
Jar Binks is a very good example of the latter two;
5. Marketing expenditures with a pinch of growing anxiety create
the ideal recipe to bring massive flows of fans to the movie
theaters. Well done 20th Century Fox, better late than never;
6. The previous does not mean that you can apply the same recipe
over and over to boost movie theater ticket sales. A schmuck will
remain a schmuck no matter how well portrayed he is. Better luck
next time 20th Century Fox. Maybe channelling more money into the
movies per se, and improving their quality, instead of masking them
with tons of marketing wouldn't be such a bad idea;
7. The education level of the Star Wars fans doesn't play a significant
role in their opinions and how they perceive the quality of the
movies;
8. What worries most true Star Wars fans, now that the whole Star
Wars saga and its future productions have been sold to the
orchestrator of Mickey Mouse, Mulan and Maleficent: Mistress of Evil,
is: Will we ever again in our lifetime have another opportunity
to contemplate a movie such as The Empire Strikes Back?