The goal of this project is to analyze survey data on the Star Wars movies.
While waiting for Star Wars: The Force Awakens to come out, the team at FiveThirtyEight became interested in answering some questions about Star Wars fans. In particular, they wondered: does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch?
The team needed to collect data addressing this question. To do this, they surveyed Star Wars fans using the online tool SurveyMonkey. They received 835 total responses, which you can download from their GitHub repository.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
star_wars = pd.read_csv("star_wars.csv", encoding="ISO-8859-1")
# View the dataset
star_wars
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1182 | 3.288389e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | Han | No | NaN | Yes | Female | 18-29 | $0 - $24,999 | Some college or Associate degree | East North Central |
1183 | 3.288379e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Female | 30-44 | $50,000 - $99,999 | Bachelor degree | Mountain |
1184 | 3.288375e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | No | Female | 30-44 | $50,000 - $99,999 | Bachelor degree | Middle Atlantic |
1185 | 3.288373e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | Han | No | NaN | Yes | Female | 45-60 | $100,000 - $149,999 | Some college or Associate degree | East North Central |
1186 | 3.288373e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | NaN | NaN | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 6 | ... | Very unfavorably | I don't understand this question | No | NaN | No | Female | > 60 | $50,000 - $99,999 | Graduate degree | Pacific |
1187 rows × 38 columns
# View the list of columns
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?Âæ', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
# Drop NaN rows in column 'RespondentID'
star_wars.dropna(axis = 0, subset = ['RespondentID'], inplace=True)
# Check the number of rows in the dataset
star_wars
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?Âæ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1182 | 3.288389e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | Han | No | NaN | Yes | Female | 18-29 | $0 - $24,999 | Some college or Associate degree | East North Central |
1183 | 3.288379e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Female | 30-44 | $50,000 - $99,999 | Bachelor degree | Mountain |
1184 | 3.288375e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | No | Female | 30-44 | $50,000 - $99,999 | Bachelor degree | Middle Atlantic |
1185 | 3.288373e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 4 | ... | Very favorably | Han | No | NaN | Yes | Female | 45-60 | $100,000 - $149,999 | Some college or Associate degree | East North Central |
1186 | 3.288373e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | NaN | NaN | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 6 | ... | Very unfavorably | I don't understand this question | No | NaN | No | Female | > 60 | $50,000 - $99,999 | Graduate degree | Pacific |
1186 rows × 38 columns
As we can see, the one row with a NaN value in the 'RespondentID' column was deleted.
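Before dropping rows, it can help to count how many IDs are missing, so the change in row count is not a surprise. A minimal sketch on a toy dataframe (the data below is invented for illustration and only mimics the extra header row in the survey file):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: the first row mimics the extra header row
toy = pd.DataFrame({
    "RespondentID": [np.nan, 1.0, 2.0],
    "Gender": ["Response", "Male", "Female"],
})

# Count the rows that have no RespondentID
missing = toy["RespondentID"].isna().sum()

# Drop those rows and compare the row counts
cleaned = toy.dropna(subset=["RespondentID"])
print(missing, len(toy) - len(cleaned))
```

If the two printed numbers agree, only the rows without an ID were removed.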
Convert the "Have you seen any of the 6 films in the Star Wars franchise?" and "Do you consider yourself to be a fan of the Star Wars film franchise?" columns to the Boolean type.
Before converting, check which values these columns contain:
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].unique()
array(['Yes', 'No'], dtype=object)
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].unique()
array(['Yes', nan, 'No'], dtype=object)
# Define the conversion dictionary
yes_no = {
    "Yes": True,
    "No": False
}
# Define the list of columns to convert
columns = ['Have you seen any of the 6 films in the Star Wars franchise?',
           'Do you consider yourself to be a fan of the Star Wars film franchise?']
for col in columns:
    star_wars[col] = star_wars[col].map(yes_no)
# Check conversion
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].unique()
array([ True, False])
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].unique()
array([True, nan, False], dtype=object)
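The `nan` that remains in the second column follows from how `Series.map` handles values that are not keys of the dictionary. A minimal sketch on a toy Series (the answers below are made up, not taken from the survey):

```python
import numpy as np
import pandas as pd

# Toy answers, invented for illustration only
answers = pd.Series(["Yes", "No", np.nan, "Yes"])
yes_no = {"Yes": True, "No": False}

# Values that are not keys of the dictionary (here NaN) come out as NaN
converted = answers.map(yes_no)
print(converted.tolist())
print(converted.isna().sum())
```

This is why the column keeps the `object` dtype: it now mixes Python booleans with a floating-point NaN.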
Let's look at the checkbox columns we are interested in:
star_wars.iloc[:, 3:9].head(10)
Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | |
---|---|---|---|---|---|---|
1 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
2 | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN |
4 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
5 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
6 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
7 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
8 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
9 | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi |
10 | NaN | Star Wars: Episode II Attack of the Clones | NaN | NaN | NaN | NaN |
First, rename the hard-to-read columns so that each one carries the name of its movie.
# Define the rename dictionary
movies_dic = {'Which of the following Star Wars films have you seen? Please select all that apply.':
'Star Wars: Episode I The Phantom Menace',
'Unnamed: 4' : 'Star Wars: Episode II Attack of the Clones',
'Unnamed: 5' : 'Star Wars: Episode III Revenge of the Sith',
'Unnamed: 6' : 'Star Wars: Episode IV A New Hope',
'Unnamed: 7' : 'Star Wars: Episode V The Empire Strikes Back',
'Unnamed: 8' : 'Star Wars: Episode VI Return of the Jedi'
}
# Rename columns
star_wars = star_wars.rename(columns = movies_dic)
Second, convert the values in the renamed columns to True and False.
# Define the list of columns to apply the map to
movies_col = ['Star Wars: Episode I The Phantom Menace',
'Star Wars: Episode II Attack of the Clones',
'Star Wars: Episode III Revenge of the Sith',
'Star Wars: Episode IV A New Hope',
'Star Wars: Episode V The Empire Strikes Back',
'Star Wars: Episode VI Return of the Jedi'
]
# Define the dictionary for mapping
movies_map = {'Star Wars: Episode I The Phantom Menace' : True,
'Star Wars: Episode II Attack of the Clones' : True,
'Star Wars: Episode III Revenge of the Sith' : True,
'Star Wars: Episode IV A New Hope' : True,
'Star Wars: Episode V The Empire Strikes Back' : True,
'Star Wars: Episode VI Return of the Jedi' : True,
np.nan : False
}
for col in movies_col:
    star_wars[col] = star_wars[col].map(movies_map)
Now check the renamed columns and the converted values:
star_wars.iloc[:, 8:14].head(10)
Star Wars: Episode VI Return of the Jedi | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | |
---|---|---|---|---|---|---|
1 | True | 3 | 2 | 1 | 4 | 5 |
2 | False | NaN | NaN | NaN | NaN | NaN |
3 | False | 1 | 2 | 3 | 4 | 5 |
4 | True | 5 | 6 | 1 | 2 | 4 |
5 | True | 5 | 4 | 6 | 2 | 1 |
6 | True | 1 | 4 | 3 | 6 | 5 |
7 | True | 6 | 5 | 4 | 3 | 1 |
8 | True | 4 | 5 | 6 | 3 | 2 |
9 | True | 5 | 4 | 6 | 2 | 1 |
10 | False | 1 | 2 | 3 | 4 | 5 |
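Because each checkbox column holds either the film title or NaN, the same True/False result can also be obtained without a mapping dictionary. The sketch below shows that alternative with `DataFrame.notna()` on made-up data; it is not the approach used in this project, and the short column names are placeholders:

```python
import numpy as np
import pandas as pd

# Toy checkbox columns, invented for illustration
toy = pd.DataFrame({
    "Episode I": ["Episode I", np.nan, "Episode I"],
    "Episode II": [np.nan, np.nan, "Episode II"],
})

# A title means the box was checked, NaN means it was not
seen = toy.notna()
print(seen)

# Number of films seen by each toy respondent
print(seen.sum(axis=1).tolist())
```

The result is a plain boolean dataframe, so summing along the rows counts the films each respondent has seen.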
Let's look at the values in columns 9 to 14.
star_wars.iloc[:, 9:15].head(6)
Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | |
---|---|---|---|---|---|---|
1 | 3 | 2 | 1 | 4 | 5 | 6 |
2 | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1 | 2 | 3 | 4 | 5 | 6 |
4 | 5 | 6 | 1 | 2 | 4 | 3 |
5 | 5 | 4 | 6 | 2 | 1 | 3 |
6 | 1 | 4 | 3 | 6 | 5 | 2 |
# Create a dictionary for renaming columns 9 to 14
rank_dic = {'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.':
'sw_1_rank',
'Unnamed: 10' : 'sw_2_rank',
'Unnamed: 11' : 'sw_3_rank',
'Unnamed: 12' : 'sw_4_rank',
'Unnamed: 13' : 'sw_5_rank',
'Unnamed: 14' : 'sw_6_rank'
}
# Rename columns
star_wars = star_wars.rename(columns = rank_dic)
# Convert values to float
star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)
# Check the converted values and the renamed columns
star_wars.describe()
# star_wars.iloc[:, 9:15].head(6)
RespondentID | sw_1_rank | sw_2_rank | sw_3_rank | sw_4_rank | sw_5_rank | sw_6_rank | |
---|---|---|---|---|---|---|---|
count | 1.186000e+03 | 835.000000 | 836.000000 | 835.000000 | 836.000000 | 836.000000 | 836.000000 |
mean | 3.290128e+09 | 3.732934 | 4.087321 | 4.341317 | 3.272727 | 2.513158 | 3.047847 |
std | 1.055639e+06 | 1.656122 | 1.365365 | 1.400464 | 1.825901 | 1.578620 | 1.666897 |
min | 3.288373e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
25% | 3.289451e+09 | 3.000000 | 3.000000 | 3.000000 | 2.000000 | 1.000000 | 2.000000 |
50% | 3.290147e+09 | 4.000000 | 4.000000 | 5.000000 | 3.000000 | 2.000000 | 3.000000 |
75% | 3.290814e+09 | 5.000000 | 5.000000 | 6.000000 | 5.000000 | 3.000000 | 4.000000 |
max | 3.292880e+09 | 6.000000 | 6.000000 | 6.000000 | 6.000000 | 6.000000 | 6.000000 |
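As we can see from the table above, the lowest mean value belongs to sw_5_rank, that is, Star Wars: Episode V The Empire Strikes Back. Since rank 1 marks the favorite film, the lowest mean rank means the best-liked movie. Create a plot.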
mean_plt = star_wars[star_wars.columns[9:15]].mean().reset_index()
col_dic = {'index' : 'movies', 0 : 'mean rank'}
mean_plt = mean_plt.rename(columns = col_dic)
ax = mean_plt.plot.bar(figsize = (12, 10), x = 'movies', y = 'mean rank', color = '#0040FF')
ax.set_xlabel('Movies', fontsize = 14, weight = 'bold')
ax.set_xticklabels(ax.xaxis.get_majorticklabels(), fontsize = 12, weight = 'bold')
ax.set_ylabel('Mean rank', fontsize = 14, weight = 'bold')
Text(0, 0.5, 'Mean rank')
As we can see, the most popular Star Wars movie is Episode V The Empire Strikes Back (1980), followed in the survey ranking by Episode VI Return of the Jedi (1983) and Episode IV A New Hope (1977). In my opinion, no computer effects can rescue a boring, flat and stupid screenplay, which is what the modern Star Wars movies have ended up with. For example, the first Terminator movies were an overwhelming success, although the first episode was filmed for pennies. In my opinion, Matrix III ended the era of great American cinema: after it we see awful bullshit stuffed with special effects that tries to parasitize the great movies of the past as sequels or prequels (maybe Joker).
sum_plt = star_wars[star_wars.columns[3:9]].sum().reset_index()
col_dic = {'index' : 'movies', 0 : 'views'}
sum_plt = sum_plt.rename(columns = col_dic)
ax = sum_plt.plot.barh(figsize = (15, 10), x = 'movies', y = 'views', color = '#0040FF')
# In a horizontal bar plot the counts run along the x axis and the movies along the y axis
ax.set_xlabel('Views', fontsize = 16, weight = 'bold')
ax.set_yticklabels(ax.yaxis.get_majorticklabels(), fontsize = 14, weight = 'bold')
ax.set_ylabel('Movies', fontsize = 16, weight = 'bold');
As we can see, the most viewed Star Wars movie is Episode V The Empire Strikes Back (1980), followed by Episode VI Return of the Jedi (1983), Episode I The Phantom Menace and Episode IV A New Hope (1977). I wrote about the possible reasons for this result above.
Extract the data for men and women into two dataframes.
# Extract data for each sex
men_rank = star_wars[star_wars['Gender'] == 'Male'][star_wars.columns[9:15]].mean().reset_index()
women_rank = star_wars[star_wars['Gender'] == 'Female'][star_wars.columns[9:15]].mean().reset_index()
# Merge into one dataset with an inner join on 'index'
sex_rank = men_rank.merge(women_rank, how='inner', on='index')
# Create dict for renaming columns
sex_dic = {'index' : 'movies', '0_x' : 'men', '0_y': 'women'}
# Rename columns for men and women
sex_rank = sex_rank.rename(columns = sex_dic)
# Check and view dataframe
sex_rank
movies | men | women | |
---|---|---|---|
0 | sw_1_rank | 4.037825 | 3.429293 |
1 | sw_2_rank | 4.224586 | 3.954660 |
2 | sw_3_rank | 4.274882 | 4.418136 |
3 | sw_4_rank | 2.997636 | 3.544081 |
4 | sw_5_rank | 2.458629 | 2.569270 |
5 | sw_6_rank | 3.002364 | 3.078086 |
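The filter-and-merge steps above can also be written as a single `groupby`. This is only a sketch on made-up data, assuming a 'Gender' column and rank columns named like the ones in this project:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "sw_1_rank": [4.0, 3.0, 6.0, 1.0],
    "sw_5_rank": [1.0, 2.0, 3.0, 2.0],
})

# Mean rank per sex, then transpose so films are rows and sexes are columns
by_sex = toy.groupby("Gender")[["sw_1_rank", "sw_5_rank"]].mean().T
print(by_sex)
```

The result already has one column per sex, so no merge or column renaming is needed before plotting.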
And create a plot
# Set the index to the movie names and create a plot of the mean rank
sex_rank.set_index('movies').plot(kind="bar", color = ('#0040FF','#FF6347'), figsize = (12,8))
plt.title("Mean movie rank by sex", fontsize = 16, weight = 'bold')
plt.xlabel("Movies", fontsize = 13, weight = 'bold')
plt.ylabel("Mean Rank", fontsize = 13, weight = 'bold');
As we can see, the most popular Star Wars movie for each sex is Episode V The Empire Strikes Back (1980). Men rank Episode IV A New Hope (1977) and Episode VI Return of the Jedi (1983) next, almost level, while women rank Episode VI second and Episode I The Phantom Menace third. Women give Episodes I and II a better (lower) mean rank than men do, and men give Episodes III to VI a better mean rank than women do. Next, compare the number of views for each sex:
#Extract data for each sex
men_views = star_wars[star_wars['Gender'] == 'Male'][star_wars.columns[3:9]].sum().reset_index()
women_views = star_wars[star_wars['Gender'] == 'Female'][star_wars.columns[3:9]].sum().reset_index()
# Merge into one dataset with an inner join on 'index'
sex_views = men_views.merge(women_views, how='inner', on='index')
# Create dict for renaming columns
sex_dic = {'index' : 'movies', '0_x' : 'men', '0_y': 'women'}
# Rename columns for men and women
sex_views = sex_views.rename(columns = sex_dic)
# Check and view dataframe
sex_views
movies | men | women | |
---|---|---|---|
0 | Star Wars: Episode I The Phantom Menace | 361 | 298 |
1 | Star Wars: Episode II Attack of the Clones | 323 | 237 |
2 | Star Wars: Episode III Revenge of the Sith | 317 | 222 |
3 | Star Wars: Episode IV A New Hope | 342 | 255 |
4 | Star Wars: Episode V The Empire Strikes Back | 392 | 353 |
5 | Star Wars: Episode VI Return of the Jedi | 387 | 338 |
And create a plot
# Set the index to the movie names and create a plot of the number of views
ax = sex_views.set_index('movies').plot(kind="barh", color = ('#0040FF','#FF6347'), figsize = (13,9))
plt.title("Number of views per movie for men and women", fontsize = 16, weight = 'bold')
# In a horizontal bar plot the counts run along the x axis and the movies along the y axis
plt.xlabel("Number of views", fontsize = 13, weight = 'bold')
plt.ylabel("Movies", fontsize = 13, weight = 'bold')
ax.set_yticklabels(ax.yaxis.get_majorticklabels(), fontsize = 15, weight = 'bold');
In general, men appear more interested in Star Wars than women. The number of views by women, expressed as a percentage difference relative to men, ranges from -9.95% (Episode V) to -29.97% (Episode III).
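The percentage differences can be recomputed directly from the sex_views table. A small sketch in plain Python, with the counts copied from that table and the film names shortened for readability (the variable names are illustrative):

```python
# Views per film copied from the sex_views table above: (men, women)
views = {
    "Episode I": (361, 298),
    "Episode II": (323, 237),
    "Episode III": (317, 222),
    "Episode IV": (342, 255),
    "Episode V": (392, 353),
    "Episode VI": (387, 338),
}

# Difference of women's views relative to men's views, in percent
pct_diff = {
    film: round((women - men) / men * 100, 2)
    for film, (men, women) in views.items()
}

for film, diff in pct_diff.items():
    print(f"{film}: {diff}%")
```

Every value is negative, which means that fewer women than men reported seeing each film.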
# Create new column for defining sum of views of movies
star_wars['views_tot'] = star_wars[star_wars.columns[3:9]].sum(axis=1)
# Create new column for views all
# star_wars['views_all'] = star_wars['seen_summary'].apply(
# lambda x: False if x<6 else True)
# # Counting values of 'seen_summary' for the respondents who have seen
# # at least some films
# star_wars[star_wars[
# 'Have you seen any of the 6 films in the Star Wars franchise?']==True
# ]['seen_summary'].value_counts()
star_wars['views_tot'].value_counts()
6    471
0    351
3     99
2     85
4     72
1     56
5     52
Name: views_tot, dtype: int64
And create a plot
ax = star_wars['views_tot'].value_counts().plot(kind="barh", color = '#0040FF', figsize = (12,8))
plt.xlabel("Viewers", fontsize = 13, weight = 'bold')
plt.ylabel("Number of movies seen", fontsize = 13, weight = 'bold')
ax.set_yticklabels(ax.yaxis.get_majorticklabels(), fontsize = 15, weight = 'bold');
As we can see from this bar chart, most respondents have either seen all six movies (471) or none of them (351).
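The counts above also give the share of viewers who have seen the whole series. A short sketch in plain Python, with the numbers copied from the views_tot output (the variable names are illustrative):

```python
# Counts copied from the views_tot output above: films seen -> respondents
views_tot_counts = {6: 471, 0: 351, 3: 99, 2: 85, 4: 72, 1: 56, 5: 52}

# All respondents, and those who have seen at least one film
total = sum(views_tot_counts.values())
seen_any = total - views_tot_counts[0]

# Share of viewers who have seen all six films
share_all_six = views_tot_counts[6] / seen_any
print(total, seen_any, round(share_all_six * 100, 1))
```

The number of respondents who have seen at least one film equals the 835 responses mentioned in the introduction.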
star_wars['Age'].value_counts()
45-60    291
> 60     269
30-44    268
18-29    218
Name: Age, dtype: int64
And create a plot
ax = star_wars['Age'].value_counts().sort_values(ascending=True).plot(kind="barh", color = '#0040FF', figsize = (12,8))
plt.xlabel("Respondents", fontsize = 13, weight = 'bold')
plt.ylabel("Respondent age category", fontsize = 13, weight = 'bold')
ax.set_yticklabels(ax.yaxis.get_majorticklabels(), fontsize = 15, weight = 'bold');
As we can see, most respondents are aged from 30 to over 60, but younger respondents (18-29), whose share is only slightly smaller, have also seen the Star Wars movies, which began in 1977, more than 40 years ago.
A sample of 1186 respondents is not representative enough to establish general statistical trends. The conclusions for each part of the project are written above. In general, Star Wars has left a deep mark on mass culture and remains popular more than 40 years after its premiere.
How write wiki:
In my opinion, what matters most is the original idea of the screenplay and how it reveals the characters and the actions of the movie's heroes in different situations. This is what unites such seemingly different films as the first three Star Wars movies, The Terminator, Duel, Alien and other classics of the vanished Great American Cinema.
Created on Mar 2, 2021
@author: Vadim Maklakov, used some ideas from public Internet resources.
© 3-clause BSD License
Software environment: Debian 10.7, Python 3.8.7
Requires the following preinstalled Python modules: numpy, pandas, matplotlib.