-In this analysis we use several techniques to work toward the answer sought by FiveThirtyEight.
-Huge shoutout to [a.xitas](https://community.dataquest.io/u/a.xitas/summary), since I am using his analysis as a reference point. You can find his amazing work here
#Importing necessary libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
/usr/local/lib/python3.6/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead. import pandas.util.testing as tm
Now we are going to use a library called ppscore, which computes the Predictive Power Score (PPS). PPS plays a similar role to a correlation function, except that it tries to detect non-linear relationships between columns (exponential, periodic, polynomial, etc.) as well as linear ones. I am using PPS because I don't want to drop columns that might affect my analysis, even though that effect might not be linear.
#Now we install ppscore in Colab:
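To make the motivation concrete, here is a minimal sketch using only the standard library. It is not how ppscore works internally; the `pearson` helper and the toy data are made up for illustration. The point is that a target can be fully determined by a feature and still have a linear correlation of roughly zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: both targets depend on x perfectly,
# but only one of them depends on x linearly.
xs = list(range(-10, 11))
ys_linear = [3 * x + 1 for x in xs]
ys_quadratic = [x ** 2 for x in xs]

r_linear = pearson(xs, ys_linear)
r_quadratic = pearson(xs, ys_quadratic)

print(f"linear relationship:    r = {r_linear:.3f}")
print(f"quadratic relationship: r = {r_quadratic:.3f}")
```

A column like `ys_quadratic` would look useless if we filtered on correlation alone, which is the kind of column I want to avoid dropping.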
!pip install ppscore
Requirement already satisfied: ppscore in /usr/local/lib/python3.6/dist-packages (0.0.2)
import ppscore as pps
from google.colab import files
uploaded = files.upload()
#Reading dataset in pandas dataframe:
star_wars = pd.read_csv("StarWars.csv", encoding ="ISO-8859-1")
#Viewing first rows:
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her. | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 | Unnamed: 20 | Unnamed: 21 | Unnamed: 22 | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 | Unnamed: 26 | Unnamed: 27 | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Han Solo | Luke Skywalker | Princess Leia Organa | Anakin Skywalker | Obi Wan Kenobi | Emperor Palpatine | Darth Vader | Lando Calrissian | Boba Fett | C-3P0 | R2 D2 | Jar Jar Binks | Padme Amidala | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | 2 | 1 | 4 | 5 | 6 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | 2 | 3 | 4 | 5 | 6 | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | 6 | 1 | 2 | 4 | 3 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
#Let's analyze columns:
star_wars.columns
Index(['RespondentID', 'Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?', 'Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Which character shot first?', 'Are you familiar with the Expanded Universe?', 'Do you consider yourself to be a fan of the Expanded Universe?æ', 'Do you consider yourself to be a fan of the Star Trek franchise?', 'Gender', 'Age', 'Household Income', 'Education', 'Location (Census Region)'], dtype='object')
We can see that some columns need some "cleaning", or rather renaming, but first let's remove all NaN values from the "RespondentID" column
#Checking how many NaN values are in RespondentID:
star_wars["RespondentID"].isnull().value_counts()
False 1186 True 1 Name: RespondentID, dtype: int64
#Removing the one missing value:
star_wars= star_wars[star_wars["RespondentID"].notna()].copy()
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her. | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 | Unnamed: 20 | Unnamed: 21 | Unnamed: 22 | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 | Unnamed: 26 | Unnamed: 27 | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | 2 | 1 | 4 | 5 | 6 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | 2 | 3 | 4 | 5 | 6 | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | 6 | 1 | 2 | 4 | 3 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | 4 | 6 | 2 | 1 | 3 | Very favorably | Somewhat favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very unfavorably | Somewhat favorably | Neither favorably nor unfavorably (neutral) | Very favorably | Somewhat favorably | Somewhat favorably | Very unfavorably | Somewhat favorably | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
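As a side note, the same row filter can be written with `dropna(subset=...)`. The sketch below uses a made-up three-row frame (the IDs and the `toy` name are hypothetical) where the first row is a sub-header with no ID, just like in the survey file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the survey: the first row is a sub-header,
# so its RespondentID is missing.
toy = pd.DataFrame({
    "RespondentID": [np.nan, 3292879998.0, 3292879538.0],
    "Gender": ["Response", "Male", "Female"],
})

# Boolean-mask filtering, as used in the cell above.
by_mask = toy[toy["RespondentID"].notna()].copy()

# Equivalent: drop rows where this particular column is missing.
by_dropna = toy.dropna(subset=["RespondentID"]).copy()

print(by_mask)
print(by_mask.equals(by_dropna))
```

Both versions keep the original index labels, which is why the cleaned frame starts at index 1 rather than 0.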
#Converting Yes & No answers in the "fan of the Star Wars film franchise" column to True & False:
col_bool = {"Yes":True,
"No":False}
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?']=star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].map(col_bool)
#Let's count the answers:
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].value_counts()
True 552 False 284 Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: int64
# Also for Have you seen any of the 6 films in the Star Wars franchise?:
col_bool = {"Yes":True,"No":False}
star_wars['Have you seen any of the 6 films in the Star Wars franchise?']=star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].map(col_bool)
#Counting values:
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].value_counts(dropna=False)
True 936 False 250 Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64
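One thing worth keeping in mind with dictionary-based `.map()`: any value that is not a key in the dictionary silently becomes NaN. The sketch below uses made-up answers (the `answers` Series is hypothetical, not taken from the survey) to show a quick way of listing values that were present but did not match.

```python
import numpy as np
import pandas as pd

yes_no = {"Yes": True, "No": False}

# Hypothetical answers, including a missing value and an unexpected spelling.
answers = pd.Series(["Yes", "No", np.nan, "yes"])

mapped = answers.map(yes_no)
print(mapped)

# Values that were present in the input but did not match any dictionary key.
unmatched = answers[mapped.isna() & answers.notna()]
print(unmatched.tolist())
```

Running a check like this after each mapping step is a cheap way to catch typos or stray whitespace in the keys.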
#Renaming columns 3-8:
rename_field ={'Which of the following Star Wars films have you seen? Please select all that apply.':'seen_ep.1',
'Unnamed: 4':'seen_ep.2','Unnamed: 5':'seen_ep.3','Unnamed: 6':'seen_ep.4','Unnamed: 7':'seen_ep.5',
'Unnamed: 8':'seen_ep.6'}
star_wars = star_wars.rename(columns=(rename_field)).copy()
#Converting each seen-movie column to True and False, for example:
#seen_ep.1 = Star Wars: Episode I The Phantom Menace
#If this is the answer then return True, else False:
movie_field={'Star Wars: Episode I The Phantom Menace':True,
np.nan:False,
'Star Wars: Episode II Attack of the Clones':True,
'Star Wars: Episode III Revenge of the Sith':True,
'Star Wars: Episode IV A New Hope':True,
'Star Wars: Episode V The Empire Strikes Back':True,
'Star Wars: Episode VI Return of the Jedi':True}
for c in star_wars.columns[3:9]:
    star_wars[c]=star_wars[c].map(movie_field)
for n in np.arange(6):
    col = 'seen_ep.{}'.format(n+1)
    print(star_wars[col].value_counts())
True 673 False 513 Name: seen_ep.1, dtype: int64 False 615 True 571 Name: seen_ep.2, dtype: int64 False 636 True 550 Name: seen_ep.3, dtype: int64 True 607 False 579 Name: seen_ep.4, dtype: int64 True 758 False 428 Name: seen_ep.5, dtype: int64 True 738 False 448 Name: seen_ep.6, dtype: int64
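Since each of these columns holds either the film title or NaN, an alternative to the mapping dictionary is to test for presence directly. This is only a sketch on a made-up two-column frame (`toy` is hypothetical), not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of two "seen" columns: a title means seen, NaN means not seen.
toy = pd.DataFrame({
    "seen_ep.1": ["Star Wars: Episode I The Phantom Menace", np.nan,
                  "Star Wars: Episode I The Phantom Menace"],
    "seen_ep.2": [np.nan, np.nan, "Star Wars: Episode II Attack of the Clones"],
})

# Any non-missing value counts as "seen".
seen = toy.notna()

# Optional numeric form: True -> 1.0, False -> 0.0.
seen_as_float = seen.astype(float)

# Number of respondents who have seen each film.
print(seen.sum())
```

The trade-off is that `notna()` would also count an unexpected string as "seen", whereas the explicit dictionary turns it into NaN.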
#Let's check the changes made so far:
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_ep.1 | seen_ep.2 | seen_ep.3 | seen_ep.4 | seen_ep.5 | seen_ep.6 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her. | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 | Unnamed: 20 | Unnamed: 21 | Unnamed: 22 | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 | Unnamed: 26 | Unnamed: 27 | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | True | True | True | True | True | True | True | True | 3 | 2 | 1 | 4 | 5 | 6 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | False | NaN | False | False | False | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | True | False | True | True | True | False | False | False | 1 | 2 | 3 | 4 | 5 | 6 | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | True | True | True | True | True | True | True | True | 5 | 6 | 1 | 2 | 4 | 3 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | True | True | True | True | True | True | True | True | 5 | 4 | 6 | 2 | 1 | 3 | Very favorably | Somewhat favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very unfavorably | Somewhat favorably | Neither favorably nor unfavorably (neutral) | Very favorably | Somewhat favorably | Somewhat favorably | Very unfavorably | Somewhat favorably | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
#Now we need to "fix" the columns 9-15:
#First let's convert the values of those columns to float:
star_wars[star_wars.columns[9:15]]=star_wars[star_wars.columns[9:15]].astype(float).copy()
#Also let's convert the previous columns (3-8) to float so that True = 1 and False = 0
#This representation will make our work easier
star_wars[star_wars.columns[3:9]]=star_wars[star_wars.columns[3:9]].astype(float).copy()
#Finally let's rename columns 9-15 for easier reading:
star_wars = star_wars.rename(columns={'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.': 'ranking_Ep.1',
'Unnamed: 10': 'ranking_Ep.2',
'Unnamed: 11': 'ranking_Ep.3',
'Unnamed: 12': 'ranking_Ep.4',
'Unnamed: 13': 'ranking_Ep.5',
'Unnamed: 14': 'ranking_Ep.6'}).copy()
#Now continuing by renaming columns 15-28:
name_mapping = {'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.':'Han Solo',
'Unnamed: 16':'Luke Skywalker','Unnamed: 17':'Princess Leia','Unnamed: 18':'Anakin','Unnamed: 19':'Obi wan Kenobi',
'Unnamed: 20':'Palpatine','Unnamed: 21':'Darth Vader','Unnamed: 22':'Lando','Unnamed: 23':'Boba Fett','Unnamed: 24':'C-3PO',
'Unnamed: 25':'R2 D2','Unnamed: 26':'Jar Jar Binks','Unnamed: 27':'Padme','Unnamed: 28':'Yoda'}
star_wars=star_wars.rename(columns=(name_mapping)).copy()
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_ep.1 | seen_ep.2 | seen_ep.3 | seen_ep.4 | seen_ep.5 | seen_ep.6 | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | Han Solo | Luke Skywalker | Princess Leia | Anakin | Obi wan Kenobi | Palpatine | Darth Vader | Lando | Boba Fett | C-3PO | R2 D2 | Jar Jar Binks | Padme | Yoda | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 1.0 | 4.0 | 5.0 | 6.0 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | False | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | True | False | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Somewhat favorably | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 6.0 | 1.0 | 2.0 | 4.0 | 3.0 | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | Somewhat favorably | Very favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very favorably | Very favorably | Very favorably | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 4.0 | 6.0 | 2.0 | 1.0 | 3.0 | Very favorably | Somewhat favorably | Somewhat favorably | Somewhat unfavorably | Very favorably | Very unfavorably | Somewhat favorably | Neither favorably nor unfavorably (neutral) | Very favorably | Somewhat favorably | Somewhat favorably | Very unfavorably | Somewhat favorably | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
#Value counts of each character:
for c in range(15,29):
    print(star_wars.iloc[:,c].value_counts(dropna=False))
Character | Very favorably | Somewhat favorably | Neither favorably nor unfavorably (neutral) | Somewhat unfavorably | Very unfavorably | Unfamiliar (N/A) | NaN |
---|---|---|---|---|---|---|---|
Han Solo | 610 | 151 | 44 | 8 | 1 | 15 | 357 |
Luke Skywalker | 552 | 219 | 38 | 13 | 3 | 6 | 355 |
Princess Leia | 547 | 210 | 48 | 12 | 6 | 8 | 355 |
Anakin | 245 | 269 | 135 | 83 | 39 | 52 | 363 |
Obi wan Kenobi | 591 | 159 | 43 | 8 | 7 | 17 | 361 |
Palpatine | 110 | 143 | 213 | 68 | 124 | 156 | 372 |
Darth Vader | 310 | 171 | 84 | 102 | 149 | 10 | 360 |
Lando | 142 | 223 | 236 | 63 | 8 | 148 | 366 |
Boba Fett | 138 | 153 | 248 | 96 | 45 | 132 | 374 |
C-3PO | 474 | 229 | 79 | 23 | 7 | 15 | 359 |
R2 D2 | 562 | 185 | 57 | 10 | 6 | 10 | 356 |
Jar Jar Binks | 112 | 130 | 164 | 102 | 204 | 109 | 365 |
Padme | 168 | 183 | 207 | 58 | 34 | 164 | 372 |
Yoda | 605 | 144 | 51 | 8 | 8 | 10 | 360 |
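#Converting each string literal into a value:
#Note: as written, the key 'Unfamiliar (N/A) ' (with its trailing space) does not match the data
#and there is no key for 'Somewhat unfavorably', so both answers become NaN after .map():
conversion_mapping = {'Very favorably':5,'Somewhat favorably':4,'Neither favorably nor unfavorably (neutral)':3,
'Unfamiliar (N/A) ':2,'Very unfavorably':1}
for c in star_wars.columns[15:29]:
    star_wars[c]=star_wars[c].map(conversion_mapping)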
star_wars.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_ep.1 | seen_ep.2 | seen_ep.3 | seen_ep.4 | seen_ep.5 | seen_ep.6 | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | Han Solo | Luke Skywalker | Princess Leia | Anakin | Obi wan Kenobi | Palpatine | Darth Vader | Lando | Boba Fett | C-3PO | R2 D2 | Jar Jar Binks | Padme | Yoda | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 1.0 | 4.0 | 5.0 | 6.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | NaN | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
2 | 3.292880e+09 | False | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
3 | 3.292765e+09 | True | False | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
4 | 3.292763e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 6.0 | 1.0 | 2.0 | 4.0 | 3.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 4.0 | 5.0 | 4.0 | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
5 | 3.292731e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 4.0 | 6.0 | 2.0 | 1.0 | 3.0 | 5.0 | 4.0 | 4.0 | NaN | 5.0 | 1.0 | 4.0 | 3.0 | 5.0 | 4.0 | 4.0 | 1.0 | 4.0 | 4.0 | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
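In the table above, answers that were originally "Unfamiliar (N/A)" or "Somewhat unfavorably" show up as NaN, because neither string matches a key in `conversion_mapping`. Below is a sketch of one possible fix on made-up responses. The scale is an assumption and differs from the notebook's: "Somewhat unfavorably" gets 2 and "Unfamiliar (N/A)" is deliberately left as missing, since it is not really a point on a favorability scale. The names `favorability_scale` and `responses` are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed ordinal scale (not the notebook's original mapping).
# "Unfamiliar (N/A)" has no entry on purpose, so it stays NaN.
favorability_scale = {
    "Very favorably": 5,
    "Somewhat favorably": 4,
    "Neither favorably nor unfavorably (neutral)": 3,
    "Somewhat unfavorably": 2,
    "Very unfavorably": 1,
}

# Hypothetical responses, one of them with stray trailing whitespace.
responses = pd.Series([
    "Very favorably",
    "Somewhat unfavorably",
    "Unfamiliar (N/A)",
    "Very unfavorably ",
    np.nan,
])

# Strip whitespace before mapping so near-identical strings still match.
scores = responses.str.strip().map(favorability_scale)
print(scores.tolist())
```

Changing the mapping this way would also change every PPS value computed from these columns, so the scores shown in this notebook correspond to the original mapping, not to this sketch.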
#Now let's check for correlations between columns:
pps.score(star_wars,"Are you familiar with the Expanded Universe?","Do you consider yourself to be a fan of the Star Trek franchise?")
{'baseline_score': 0.3346761186314861, 'metric': 'weighted F1', 'model': DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort='deprecated', random_state=None, splitter='best'), 'model_score': 0.5558575949587123, 'ppscore': 0.3324418114562114, 'task': 'classification', 'x': 'Are you familiar with the Expanded Universe?', 'y': 'Do you consider yourself to be a fan of the Star Trek franchise?'}
pps.score(star_wars,"ranking_Ep.5","Luke Skywalker")
/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning)
{'baseline_score': 0.5502217471071754, 'metric': 'weighted F1', 'model': DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort='deprecated', random_state=None, splitter='best'), 'model_score': 0.5502217471071754, 'ppscore': 0.0, 'task': 'classification', 'x': 'ranking_Ep.5', 'y': 'Luke Skywalker'}
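The `ppscore` values in the two results above appear to follow from `model_score` and `baseline_score` by a simple normalization: the improvement over the baseline, divided by the largest improvement possible. The helper below is my reading of those numbers, not code taken from the library, and `pps_from_scores` is a made-up name.

```python
def pps_from_scores(model_score, baseline_score):
    """Normalize a model's score against a naive baseline, floored at 0."""
    if model_score <= baseline_score:
        # The model is no better than the baseline: no predictive power.
        return 0.0
    return (model_score - baseline_score) / (1.0 - baseline_score)

# Scores copied from the two pps.score() results above.
first = pps_from_scores(0.5558575949587123, 0.3346761186314861)
second = pps_from_scores(0.5502217471071754, 0.5502217471071754)

print(first)
print(second)
```

So a score of about 0.33 means the decision tree closed roughly a third of the gap between the naive baseline and a perfect weighted F1, while 0.0 means the feature did not help at all.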
#Now let's compute the ppscore matrix for every pair of columns:
pps.matrix(star_wars)
/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning)
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. 
% (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 3 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning) /usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning)
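The UserWarning output above is raised by scikit-learn's cross-validation splitter when a target column has a class with fewer members than the number of folds (n_splits=4). It does not stop the matrix from being computed, but it does flood the cell output. A minimal sketch of one way to silence it, using only the standard-library `warnings` module; the helper name `run_quietly` is hypothetical and is not part of ppscore or scikit-learn:

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs) with UserWarning messages suppressed.

    Hypothetical helper for this notebook, not a ppscore or scikit-learn API.
    """
    with warnings.catch_warnings():
        # Silence only UserWarning; other warning categories still surface.
        warnings.simplefilter("ignore", category=UserWarning)
        return func(*args, **kwargs)


# Assumed usage in this notebook (needs ppscore and the star_wars DataFrame):
# matrix = run_quietly(pps.matrix, star_wars)
```

The filter is restored as soon as the call returns, so warnings from later cells are unaffected. Note that hiding the message does not change the scores: columns with very rare classes are still scored on folds where those classes may be missing.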
| | RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_ep.1 | seen_ep.2 | seen_ep.3 | seen_ep.4 | seen_ep.5 | seen_ep.6 | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | Han Solo | Luke Skywalker | Princess Leia | Anakin | Obi wan Kenobi | Palpatine | Darth Vader | Lando | Boba Fett | C-3PO | R2 D2 | Jar Jar Binks | Padme | Yoda | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RespondentID | 1.000000 | 0.003718 | 0.000000 | 0.000560 | 0.003776 | 0.002294 | 0.000901 | 0.000595 | 0.001531 | 0.000406 | 0.000000 | 0.000000 | 0.007527 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001746 | 0.000206 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.006738 | 0.000000 | 0.000593 | 0.000000 | 0.000000 | 0.000000 | 0.006583 | 0.000000 | 0.000000 | 0.000000 | 0.035792 | 0.005604 | 0.000000 | 0.004386 | 0.000000 |
Have you seen any of the 6 films in the Star Wars franchise? | 0.000000 | 1.000000 | 1.000000 | 0.000012 | 0.000012 | 0.000012 | 0.000012 | 0.544901 | 0.495862 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000015 | 0.000015 | 0.000011 | 0.024383 | 0.000004 |
Do you consider yourself to be a fan of the Star Wars film franchise? | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
seen_ep.1 | 0.215126 | 0.589238 | 0.000005 | 1.000000 | 0.806149 | 0.755999 | 0.714964 | 0.701860 | 0.709126 | 0.000016 | 0.000005 | 0.000020 | 0.000005 | 0.000005 | 0.000005 | 0.000031 | 0.000000 | 0.005932 | 0.000011 | 0.000000 | 0.000045 | 0.000017 | 0.000000 | 0.000013 | 0.000009 | 0.000000 | 0.000031 | 0.000021 | 0.000006 | 0.000005 | 0.000005 | 0.000313 | 0.350395 | 0.000003 | 0.000003 | 0.000000 | 0.002480 | 0.000000 |
seen_ep.2 | 0.277891 | 0.480337 | 0.475349 | 0.821663 | 1.000000 | 0.909808 | 0.757094 | 0.665803 | 0.696487 | 0.148207 | 0.000000 | 0.000004 | 0.021327 | 0.027011 | 0.000004 | 0.065623 | 0.000000 | 0.022883 | 0.000008 | 0.000000 | 0.000020 | 0.000008 | 0.000000 | 0.000000 | 0.000000 | 0.008748 | 0.000014 | 0.000013 | 0.022622 | 0.321130 | 0.000000 | 0.000000 | 0.525123 | 0.373170 | 0.304233 | 0.193810 | 0.130778 | 0.163132 |
seen_ep.3 | 0.258841 | 0.433854 | 0.493468 | 0.768709 | 0.907104 | 1.000000 | 0.756273 | 0.651184 | 0.685600 | 0.226052 | 0.022765 | 0.000008 | 0.052213 | 0.005555 | 0.084395 | 0.155194 | 0.068215 | 0.028611 | 0.000008 | 0.101098 | 0.000000 | 0.246688 | 0.003391 | 0.000012 | 0.000000 | 0.008168 | 0.000029 | 0.000000 | 0.053830 | 0.358733 | 0.000004 | 0.000205 | 0.530828 | 0.407784 | 0.364370 | 0.241581 | 0.174627 | 0.210048 |
seen_ep.4 | 0.300224 | 0.536466 | 0.424191 | 0.740745 | 0.759924 | 0.766130 | 1.000000 | 0.774872 | 0.799515 | 0.316727 | 0.244494 | 0.022113 | 0.459685 | 0.144152 | 0.288230 | 0.063305 | 0.000000 | 0.036181 | 0.000007 | 0.000006 | 0.000023 | 0.000026 | 0.000000 | 0.000010 | 0.000000 | 0.000000 | 0.000019 | 0.000011 | 0.041824 | 0.035348 | 0.000000 | 0.000205 | 0.457591 | 0.333039 | 0.327396 | 0.029926 | 0.061507 | 0.150818 |
seen_ep.5 | 0.073460 | 0.676662 | 0.000012 | 0.660674 | 0.586077 | 0.580796 | 0.718196 | 1.000000 | 0.916357 | 0.000037 | 0.000012 | 0.000037 | 0.000012 | 0.081670 | 0.000012 | 0.000000 | 0.039131 | 0.078616 | 0.000000 | 0.000000 | 0.000091 | 0.000000 | 0.000056 | 0.000024 | 0.000067 | 0.000000 | 0.000080 | 0.000030 | 0.005730 | 0.000012 | 0.000012 | 0.000000 | 0.000003 | 0.000004 | 0.000004 | 0.000000 | 0.018350 | 0.000000 |
seen_ep.6 | 0.122430 | 0.653671 | 0.000010 | 0.679184 | 0.635915 | 0.634000 | 0.756875 | 0.918979 | 1.000000 | 0.000015 | 0.000010 | 0.000029 | 0.000010 | 0.016620 | 0.455987 | 0.000000 | 0.031271 | 0.062735 | 0.000013 | 0.000000 | 0.000000 | 0.000039 | 0.000000 | 0.000021 | 0.000020 | 0.000000 | 0.000000 | 0.000000 | 0.004990 | 0.000010 | 0.000010 | 0.001181 | 0.000003 | 0.000004 | 0.000004 | 0.000014 | 0.002968 | 0.000007 |
ranking_Ep.1 | 0.108774 | 0.000004 | 0.069597 | 0.000004 | 0.012084 | 0.041900 | 0.082145 | 0.100070 | 0.100283 | 1.000000 | 0.437766 | 0.384697 | 0.348861 | 0.255353 | 0.279721 | 0.031572 | 0.008187 | 0.008833 | 0.046032 | 0.013047 | 0.080896 | 0.041544 | 0.019928 | 0.074943 | 0.012134 | 0.019363 | 0.129879 | 0.004815 | 0.015300 | 0.026887 | 0.074229 | 0.060520 | 0.000003 | 0.030771 | 0.113003 | 0.000000 | 0.001289 | 0.003958 |
ranking_Ep.2 | 0.040473 | 0.000000 | 0.099570 | 0.015547 | 0.001326 | 0.011316 | 0.130966 | 0.100289 | 0.096907 | 0.542031 | 1.000000 | 0.405394 | 0.402728 | 0.169029 | 0.181091 | 0.024644 | 0.036170 | 0.024322 | 0.000006 | 0.020610 | 0.000006 | 0.000004 | 0.014904 | 0.000009 | 0.003902 | 0.007095 | 0.004710 | 0.000008 | 0.015831 | 0.000004 | 0.000004 | 0.000198 | 0.000004 | 0.000004 | 0.000004 | 0.000004 | 0.000000 | 0.000000 |
ranking_Ep.3 | 0.146001 | 0.000004 | 0.070800 | 0.074700 | 0.047821 | 0.081229 | 0.102851 | 0.104724 | 0.103940 | 0.501274 | 0.541078 | 1.000000 | 0.417020 | 0.325649 | 0.230130 | 0.048746 | 0.080641 | 0.038506 | 0.052335 | 0.029559 | 0.091040 | 0.118862 | 0.070181 | 0.075923 | 0.034397 | 0.041542 | 0.119433 | 0.054700 | 0.033898 | 0.066139 | 0.062627 | 0.052173 | 0.050498 | 0.053436 | 0.112727 | 0.101287 | 0.086736 | 0.068602 |
ranking_Ep.4 | 0.111114 | 0.000000 | 0.068120 | 0.063494 | 0.066148 | 0.042246 | 0.136082 | 0.096446 | 0.087352 | 0.321918 | 0.336959 | 0.302966 | 1.000000 | 0.331876 | 0.299002 | 0.039828 | 0.046847 | 0.026624 | 0.015878 | 0.024828 | 0.042663 | 0.000003 | 0.035794 | 0.044688 | 0.038697 | 0.032227 | 0.063261 | 0.016217 | 0.022808 | 0.081759 | 0.006104 | 0.055928 | 0.000007 | 0.020547 | 0.081395 | 0.043890 | 0.083294 | 0.044909 |
ranking_Ep.5 | 0.071190 | 0.000004 | 0.000004 | 0.000004 | 0.000004 | 0.000004 | 0.030266 | 0.110525 | 0.091999 | 0.276670 | 0.285890 | 0.291959 | 0.402958 | 1.000000 | 0.446286 | 0.040736 | 0.016542 | 0.001215 | 0.018948 | 0.012231 | 0.104541 | 0.097717 | 0.012638 | 0.096524 | 0.000004 | 0.000443 | 0.048016 | 0.112750 | 0.006304 | 0.011405 | 0.000000 | 0.000038 | 0.000000 | 0.000004 | 0.000004 | 0.023847 | 0.072501 | 0.041043 |
ranking_Ep.6 | 0.137186 | 0.000000 | 0.084318 | 0.093951 | 0.098435 | 0.091809 | 0.089726 | 0.105901 | 0.138424 | 0.292284 | 0.290504 | 0.251871 | 0.377366 | 0.470043 | 1.000000 | 0.124292 | 0.031973 | 0.087936 | 0.078691 | 0.044283 | 0.036479 | 0.019622 | 0.046181 | 0.084048 | 0.107210 | 0.059880 | 0.130345 | 0.034599 | 0.059469 | 0.112319 | 0.086169 | 0.105522 | 0.009876 | 0.097947 | 0.119461 | 0.090742 | 0.116962 | 0.064939 |
Han Solo | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.113857 | 0.214457 | 0.000007 | 0.355055 | 0.000000 | 0.000000 | 0.008373 | 0.000034 | 0.000007 | 0.000000 | 0.000009 | 0.000013 | 0.021101 | 0.000010 | 0.000010 | 0.000111 | 0.000010 | 0.000017 | 0.000017 | 0.000031 | 0.000006 | 0.000017 |
Luke Skywalker | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000011 | 0.000000 | 0.000011 | 0.000000 | 0.000000 | 0.000000 | 0.436490 | 1.000000 | 0.618626 | 0.000010 | 0.392205 | 0.000008 | 0.000012 | 0.000008 | 0.000017 | 0.167885 | 0.113603 | 0.000012 | 0.000011 | 0.300302 | 0.000006 | 0.000006 | 0.000086 | 0.000006 | 0.000000 | 0.000000 | 0.000016 | 0.000000 | 0.000012 |
Princess Leia | 0.012207 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000006 | 0.000000 | 0.000006 | 0.000000 | 0.000000 | 0.000000 | 0.467212 | 0.615463 | 1.000000 | 0.000016 | 0.346111 | 0.000008 | 0.000020 | 0.000022 | 0.000012 | 0.117248 | 0.161790 | 0.000062 | 0.000008 | 0.268719 | 0.000011 | 0.000011 | 0.000393 | 0.000011 | 0.000006 | 0.000006 | 0.000000 | 0.000011 | 0.000006 |
Anakin | 0.146619 | 0.000006 | 0.013817 | 0.001866 | 0.016629 | 0.042747 | 0.008831 | 0.003672 | 0.033807 | 0.157944 | 0.157855 | 0.127960 | 0.159037 | 0.103264 | 0.123400 | 0.247987 | 0.303809 | 0.302336 | 1.000000 | 0.288267 | 0.227583 | 0.273475 | 0.196256 | 0.177706 | 0.253795 | 0.221840 | 0.193218 | 0.304131 | 0.215116 | 0.081678 | 0.023067 | 0.171421 | 0.163530 | 0.105480 | 0.131003 | 0.152383 | 0.130887 | 0.046667 |
Obi wan Kenobi | 0.000000 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000000 | 0.000000 | 0.000038 | 0.000005 | 0.000005 | 0.000005 | 0.379721 | 0.109107 | 0.109927 | 0.000030 | 1.000000 | 0.000013 | 0.000013 | 0.000000 | 0.000000 | 0.032515 | 0.059047 | 0.000026 | 0.000009 | 0.152210 | 0.000005 | 0.000005 | 0.000467 | 0.000005 | 0.000006 | 0.000006 | 0.000000 | 0.000016 | 0.000013 |
Palpatine | 0.121415 | 0.000025 | 0.041432 | 0.000025 | 0.000025 | 0.000025 | 0.000025 | 0.000025 | 0.000025 | 0.187286 | 0.091866 | 0.016173 | 0.038200 | 0.087470 | 0.022138 | 0.000005 | 0.000000 | 0.000000 | 0.162410 | 0.016423 | 1.000000 | 0.256323 | 0.323449 | 0.435209 | 0.002967 | 0.000000 | 0.209471 | 0.290748 | 0.002524 | 0.000009 | 0.056598 | 0.133198 | 0.000009 | 0.000000 | 0.066405 | 0.033801 | 0.000925 | 0.079442 |
Darth Vader | 0.064158 | 0.000000 | 0.020675 | 0.071706 | 0.000000 | 0.075388 | 0.000000 | 0.031840 | 0.034666 | 0.000000 | 0.000000 | 0.000000 | 0.008706 | 0.030936 | 0.012585 | 0.176628 | 0.199735 | 0.158335 | 0.067791 | 0.207834 | 0.192221 | 1.000000 | 0.073284 | 0.118871 | 0.182633 | 0.155130 | 0.000044 | 0.000006 | 0.185780 | 0.000003 | 0.000003 | 0.000118 | 0.000003 | 0.000006 | 0.000006 | 0.000000 | 0.000000 | 0.005253 |
Lando | 0.217045 | 0.000005 | 0.167353 | 0.010245 | 0.040057 | 0.148581 | 0.082106 | 0.000005 | 0.000005 | 0.208534 | 0.149479 | 0.142069 | 0.160587 | 0.104017 | 0.112452 | 0.188891 | 0.175492 | 0.190189 | 0.197048 | 0.191873 | 0.440070 | 0.198832 | 1.000000 | 0.511133 | 0.104187 | 0.165944 | 0.360157 | 0.327952 | 0.154776 | 0.156393 | 0.091647 | 0.035369 | 0.054659 | 0.173812 | 0.126498 | 0.134702 | 0.069367 | 0.135062 |
Boba Fett | 0.035825 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.103207 | 0.000000 | 0.010182 | 0.055157 | 0.000000 | 0.000000 | 0.000007 | 0.000000 | 0.000000 | 0.087113 | 0.004596 | 0.385719 | 0.181874 | 0.447114 | 1.000000 | 0.000000 | 0.006773 | 0.168572 | 0.182732 | 0.000286 | 0.043178 | 0.132053 | 0.174304 | 0.000005 | 0.000005 | 0.055402 | 0.000010 | 0.000000 | 0.011340 |
C-3PO | 0.014903 | 0.000003 | 0.000003 | 0.000003 | 0.000003 | 0.000003 | 0.000003 | 0.000003 | 0.000003 | 0.000004 | 0.007560 | 0.010865 | 0.000003 | 0.000003 | 0.000003 | 0.308789 | 0.341102 | 0.356074 | 0.017351 | 0.342050 | 0.000000 | 0.106996 | 0.019099 | 0.000013 | 1.000000 | 0.678593 | 0.000012 | 0.016628 | 0.328763 | 0.000007 | 0.000007 | 0.000000 | 0.000007 | 0.000026 | 0.000026 | 0.000015 | 0.001997 | 0.000004 |
R2 D2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000023 | 0.000000 | 0.000023 | 0.000000 | 0.000000 | 0.000000 | 0.081970 | 0.122551 | 0.108654 | 0.000010 | 0.351801 | 0.000008 | 0.000022 | 0.000000 | 0.000008 | 0.624773 | 1.000000 | 0.000011 | 0.000008 | 0.318574 | 0.000007 | 0.000007 | 0.000716 | 0.000007 | 0.000004 | 0.000004 | 0.000000 | 0.000005 | 0.000005 |
Jar Jar Binks | 0.115756 | 0.000005 | 0.135262 | 0.093157 | 0.118477 | 0.122325 | 0.145294 | 0.062450 | 0.079933 | 0.214209 | 0.139858 | 0.078663 | 0.176749 | 0.091218 | 0.134041 | 0.132825 | 0.067640 | 0.072467 | 0.217039 | 0.096585 | 0.167663 | 0.109929 | 0.232399 | 0.186091 | 0.097242 | 0.081708 | 1.000000 | 0.327073 | 0.091174 | 0.149543 | 0.000004 | 0.000278 | 0.010630 | 0.123944 | 0.121777 | 0.112854 | 0.035680 | 0.031252 |
Padme | 0.205318 | 0.000009 | 0.127579 | 0.030162 | 0.124716 | 0.136081 | 0.125366 | 0.000009 | 0.000009 | 0.116920 | 0.087964 | 0.144319 | 0.126213 | 0.172084 | 0.126274 | 0.113472 | 0.168683 | 0.176255 | 0.361874 | 0.143335 | 0.378282 | 0.210631 | 0.323272 | 0.307221 | 0.245465 | 0.131687 | 0.368120 | 1.000000 | 0.155885 | 0.126876 | 0.064732 | 0.075612 | 0.135158 | 0.015823 | 0.120404 | 0.048627 | 0.127690 | 0.137640 |
Yoda | 0.000000 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000016 | 0.000005 | 0.000016 | 0.000005 | 0.000005 | 0.000005 | 0.092532 | 0.126360 | 0.100532 | 0.000025 | 0.119021 | 0.000009 | 0.000000 | 0.000000 | 0.000039 | 0.000287 | 0.108560 | 0.000011 | 0.000049 | 1.000000 | 0.000005 | 0.000005 | 0.000000 | 0.000005 | 0.000000 | 0.000000 | 0.000017 | 0.000000 | 0.000000 |
Which character shot first? | 0.189310 | 0.000004 | 0.315977 | 0.200072 | 0.277469 | 0.285491 | 0.293019 | 0.127520 | 0.167011 | 0.259432 | 0.239220 | 0.199127 | 0.294164 | 0.180013 | 0.227758 | 0.178505 | 0.130605 | 0.121917 | 0.146260 | 0.168514 | 0.159399 | 0.199033 | 0.018028 | 0.186266 | 0.092990 | 0.085787 | 0.184322 | 0.095538 | 0.163068 | 1.000000 | 0.231382 | 0.000040 | 0.246991 | 0.238159 | 0.222034 | 0.129709 | 0.157096 | 0.183679 |
Are you familiar with the Expanded Universe? | 0.000000 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000005 | 0.000000 | 0.005955 | 0.000000 | 0.000028 | 0.010305 | 0.000008 | 0.000011 | 0.000018 | 0.032209 | 0.036106 | 0.000000 | 0.000009 | 0.114640 | 0.000000 | 0.000005 | 1.000000 | 1.000000 | 0.000005 | 0.000000 | 0.000000 | 0.000008 | 0.005986 | 0.000009 |
Do you consider yourself to be a fan of the Expanded Universe?æ | 0.191464 | 0.000037 | 0.041794 | 0.000037 | 0.000037 | 0.000037 | 0.000037 | 0.000000 | 0.018070 | 0.319011 | 0.170714 | 0.139307 | 0.226028 | 0.239330 | 0.192661 | 0.001795 | 0.094202 | 0.040680 | 0.229006 | 0.058093 | 0.329860 | 0.357188 | 0.156387 | 0.359108 | 0.160843 | 0.147379 | 0.050984 | 0.050874 | 0.114949 | 0.343039 | 0.000037 | 1.000000 | 0.059470 | 0.000078 | 0.217902 | 0.280582 | 0.236144 | 0.108468 |
Do you consider yourself to be a fan of the Star Trek franchise? | 0.074925 | 0.078513 | 0.590126 | 0.376243 | 0.461515 | 0.450092 | 0.418900 | 0.326740 | 0.363245 | 0.370747 | 0.315981 | 0.194726 | 0.269797 | 0.227483 | 0.354587 | 0.301168 | 0.296905 | 0.317594 | 0.260077 | 0.285005 | 0.053305 | 0.273062 | 0.008171 | 0.029150 | 0.184505 | 0.279002 | 0.074326 | 0.267939 | 0.314140 | 0.406675 | 0.332442 | 0.000000 | 1.000000 | 0.000002 | 0.039808 | 0.000009 | 0.000002 | 0.059147 |
Gender | 0.295154 | 0.234161 | 0.307896 | 0.339606 | 0.384379 | 0.398336 | 0.383505 | 0.282697 | 0.303873 | 0.351287 | 0.306238 | 0.239230 | 0.318992 | 0.201279 | 0.251134 | 0.067378 | 0.000000 | 0.158504 | 0.299079 | 0.119293 | 0.224003 | 0.331127 | 0.061757 | 0.116745 | 0.365916 | 0.333960 | 0.299181 | 0.213276 | 0.244978 | 0.364820 | 0.307312 | 0.000000 | 0.323475 | 1.000000 | 0.033247 | 0.253729 | 0.142279 | 0.146937 |
Age | 0.244373 | 0.088869 | 0.027108 | 0.095071 | 0.099885 | 0.090272 | 0.115237 | 0.099857 | 0.093068 | 0.248341 | 0.177901 | 0.140197 | 0.193319 | 0.096664 | 0.174134 | 0.073397 | 0.078908 | 0.062392 | 0.046474 | 0.088466 | 0.075520 | 0.103242 | 0.041723 | 0.152095 | 0.118274 | 0.023363 | 0.143585 | 0.102233 | 0.029845 | 0.172446 | 0.034997 | 0.120439 | 0.092145 | 0.013942 | 1.000000 | 0.191423 | 0.139093 | 0.126124 |
Household Income | 0.092610 | 0.000000 | 0.000020 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.055270 | 0.003571 | 0.016499 | 0.019654 | 0.016045 | 0.000000 | 0.040313 | 0.038097 | 0.042423 | 0.004204 | 0.017860 | 0.000012 | 0.008104 | 0.000007 | 0.008471 | 0.068380 | 0.059927 | 0.005024 | 0.022664 | 0.048646 | 0.000020 | 0.000020 | 0.000198 | 0.000000 | 0.000000 | 0.065052 | 1.000000 | 0.051943 | 0.042527 |
Education | 0.149888 | 0.089784 | 0.046104 | 0.023044 | 0.079007 | 0.119427 | 0.076200 | 0.109049 | 0.070884 | 0.152310 | 0.127963 | 0.163097 | 0.171672 | 0.177346 | 0.186895 | 0.034900 | 0.046141 | 0.037312 | 0.150879 | 0.039678 | 0.110926 | 0.146191 | 0.074412 | 0.037574 | 0.076892 | 0.068705 | 0.129330 | 0.128365 | 0.040495 | 0.117855 | 0.041648 | 0.153216 | 0.044534 | 0.062978 | 0.140602 | 0.173687 | 1.000000 | 0.152003 |
Location (Census Region) | 0.079902 | 0.030197 | 0.013958 | 0.047941 | 0.033201 | 0.016828 | 0.029078 | 0.022922 | 0.010261 | 0.025448 | 0.047961 | 0.058102 | 0.066035 | 0.056956 | 0.056997 | 0.029291 | 0.038348 | 0.040771 | 0.037362 | 0.009562 | 0.074176 | 0.073508 | 0.027929 | 0.051208 | 0.038702 | 0.020007 | 0.036083 | 0.029830 | 0.034925 | 0.070024 | 0.026361 | 0.030673 | 0.030788 | 0.022067 | 0.057378 | 0.043561 | 0.037934 | 1.000000 |
#Important relationships can be observed from pps.matrix, let's visualize it:
sns.heatmap(pps.matrix(star_wars),cmap="summer")
#The graph is at the very end of the output
/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py:667: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=4. % (min_groups, self.n_splits)), UserWarning)
<matplotlib.axes._subplots.AxesSubplot at 0x7f878c42c470>
For example:
-Whoever saw the first movie is more likely to have seen all of them, because there is a strong relationship between column 3 and all the other columns up to the 9th.
-The column "Do you consider yourself to be a fan of the Star Wars film franchise?" has a score of 0 with "Household Income", so we can drop "Household Income".
-The same goes for "Location (Census Region)": the highest score it has with any other column is about 7%, so we can drop this one too.
-The columns "Are you familiar with the Expanded Universe?" and "Do you consider yourself to be a fan of the Expanded Universe?" have a perfect score of 1 in one direction.
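The screening described above can also be done programmatically. This is a minimal sketch on a small made-up score matrix, not the real output: it assumes the matrix is a square pandas DataFrame with the same labels on both axes (as in the table above), and the helper name `weak_columns` and the 0.1 threshold are illustrative assumptions, not part of ppscore.

```python
import numpy as np
import pandas as pd

def weak_columns(score_matrix, threshold=0.1):
    """Return labels whose score with every *other* column stays below threshold.

    Both the row and the column of each label are checked, because PPS is
    not symmetric. (Hypothetical helper for illustration only.)
    """
    values = score_matrix.to_numpy(dtype=float, copy=True)
    # Ignore the diagonal: every column predicts itself perfectly.
    np.fill_diagonal(values, np.nan)
    scores = pd.DataFrame(values, index=score_matrix.index,
                          columns=score_matrix.columns)
    # Strongest score per label, looking along both axes.
    strongest = pd.concat([scores.max(axis=0), scores.max(axis=1)],
                          axis=1).max(axis=1)
    return strongest[strongest < threshold].index.tolist()

# Toy matrix with made-up numbers, only to show the mechanics.
labels = ["seen_ep.1", "seen_ep.2", "Location"]
toy = pd.DataFrame(
    [[1.00, 0.60, 0.01],
     [0.55, 1.00, 0.02],
     [0.03, 0.07, 1.00]],
    index=labels, columns=labels)

candidates = weak_columns(toy, threshold=0.1)
print(candidates)
```

A low score only says that the column is a weak predictor in this data, so the threshold is a judgment call rather than a fixed rule.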
#Dropping columns:
star_wars = star_wars.drop(["Household Income","Location (Census Region)"],axis=1)
#Now we calculate the mean of each ranking column, where 1 is the favorite and 6 the least favorite movie.
#So the highest mean values belong to the least favorite movies:
ranking_mean = star_wars[star_wars.columns[9:15]].mean().sort_values(ascending= False)
ranking_mean.head()
ranking_Ep.3    4.341317
ranking_Ep.2    4.087321
ranking_Ep.1    3.732934
ranking_Ep.4    3.272727
ranking_Ep.6    3.047847
dtype: float64
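Because `head()` returns only the first five rows by default, one of the six films is cut off in the output above. The sketch below shows the same mean-ranking idea on a tiny made-up table (the numbers are invented for illustration), sorted ascending so that the best-liked film comes first.

```python
import pandas as pd

# Made-up rankings from three respondents: 1 = favorite, higher = less liked.
toy_ranks = pd.DataFrame({
    "ranking_Ep.1": [3, 5, 6],
    "ranking_Ep.5": [1, 1, 2],
    "ranking_Ep.6": [2, 3, 1],
})

# A lower mean rank means a better-liked film, so sort ascending.
favorites_first = toy_ranks.mean().sort_values()
best_film = favorites_first.index[0]

print(favorites_first)
print("Best liked:", best_film)
```

On the real data, printing `ranking_mean` without `head()` would show all six films.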
#Now let's visualize it:
rank_graph = ranking_mean.plot.barh(edgecolor = 'none',color=[(255/255,188/255,121/255),
(162/255,200/255, 236/255),
(207/255,207/255,207/255),
(200/255,82/255,0/255),
(255/255,194/255,10/255),
(212/255,17/255,89/255)])
#Enhancing Plot Aesthetics:
for key, spine in rank_graph.spines.items():
    spine.set_visible(False)
#Removing ticks (recent matplotlib versions expect booleans here, not 'off'):
rank_graph.tick_params(bottom=False,top=False,left=False,right=False)
#Graph title:
rank_graph.set_title("Average Star Wars Movie Ranking")
#Setting average graph line:
rank_graph.axvline(ranking_mean.mean(),alpha=.8,linestyle='--',color='grey')
#Displaying the graph:
plt.show()
#Analysing Age column:
star_wars['Age'].value_counts().sort_values()
18-29    218
30-44    268
> 60     269
45-60    291
Name: Age, dtype: int64
-The three most liked movies (Episodes 4, 5 and 6, which have the lowest mean rankings) are also the highest rated on IMDB.
-More respondents are aged 30 and above than below 30: almost 80% of the respondents who gave their age are 30 or older.
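The "almost 80%" figure can be checked directly from the age counts printed above. This is a small sketch that re-enters those four counts by hand; on the real DataFrame, `star_wars['Age'].value_counts(normalize=True)` would give the shares without the manual division.

```python
import pandas as pd

# Age counts copied from the value_counts() output above.
age_counts = pd.Series({"18-29": 218, "30-44": 268, "> 60": 269, "45-60": 291})

# Share of each age group among respondents who gave their age.
share = age_counts / age_counts.sum()

# Everyone outside the 18-29 group is 30 or older.
share_30_plus = share.drop("18-29").sum()

print(share.round(3))
print(round(share_30_plus * 100, 1))  # 79.2
```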
#Now let's count how many respondents have seen each movie:
star_wars_seen = star_wars[star_wars.columns[3:9]].sum().copy().sort_values(ascending = False)
#Bar chart of the number of respondents who have seen each episode:
star_wars_seen_graph = star_wars_seen.plot.bar(edgecolor='none',color =[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
#Setting an average line:
star_wars_seen_graph.axhline(star_wars_seen.mean(),color="grey",alpha=.8,linestyle="--")
#Removing ticks (recent matplotlib versions expect booleans here, not 'off'):
star_wars_seen_graph.tick_params(bottom=False,top=False,right=False,left=False)
#Removing spines:
for key, spine in star_wars_seen_graph.spines.items():
    spine.set_visible(False)
#Graph title:
star_wars_seen_graph.set_title("Most Seen Episode Movie")
#Displaying graph:
plt.show()
#We are going to divide our dataset into two datasets, Male and Female:
female = star_wars[star_wars['Gender']=="Female"].copy()
male = star_wars[star_wars['Gender']=="Male"].copy()
#Displaying female dataset:
female.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_ep.1 | seen_ep.2 | seen_ep.3 | seen_ep.4 | seen_ep.5 | seen_ep.6 | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | Han Solo | Luke Skywalker | Princess Leia | Anakin | Obi wan Kenobi | Palpatine | Darth Vader | Lando | Boba Fett | C-3PO | R2 D2 | Jar Jar Binks | Padme | Yoda | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Education | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
112 | 3.291440e+09 | True | True | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 4.0 | 5.0 | 6.0 | 2.0 | 3.0 | 5.0 | 4.0 | 4.0 | NaN | 5.0 | 3.0 | 1.0 | NaN | 3.0 | 5.0 | 5.0 | 1.0 | NaN | 4.0 | Greedo | Yes | No | Yes | Female | > 60 | Bachelor degree |
113 | 3.291439e+09 | True | False | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 5.0 | 4.0 | 4.0 | NaN | 5.0 | NaN | NaN | NaN | NaN | 5.0 | 5.0 | NaN | NaN | 4.0 | I don't understand this question | No | NaN | No | Female | > 60 | Graduate degree |
115 | 3.291436e+09 | True | False | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 3.0 | 4.0 | 5.0 | 6.0 | 1.0 | 2.0 | 5.0 | 5.0 | 5.0 | 4.0 | 4.0 | 3.0 | 5.0 | 3.0 | 3.0 | 5.0 | 5.0 | 3.0 | 3.0 | 4.0 | I don't understand this question | No | NaN | No | Female | 30-44 | Graduate degree |
117 | 3.291434e+09 | True | False | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 3.0 | 4.0 | 5.0 | 6.0 | 2.0 | 1.0 | 5.0 | 5.0 | 5.0 | 4.0 | 4.0 | NaN | 5.0 | NaN | 3.0 | 5.0 | 5.0 | 5.0 | NaN | NaN | I don't understand this question | No | NaN | No | Female | 30-44 | Bachelor degree |
118 | 3.291432e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 6.0 | 4.0 | 1.0 | 2.0 | 5.0 | 5.0 | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 5.0 | 4.0 | 5.0 | 5.0 | Han | No | NaN | Yes | Female | 30-44 | Graduate degree |
#Displaying male dataset:
male.head()
RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_ep.1 | seen_ep.2 | seen_ep.3 | seen_ep.4 | seen_ep.5 | seen_ep.6 | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | Han Solo | Luke Skywalker | Princess Leia | Anakin | Obi wan Kenobi | Palpatine | Darth Vader | Lando | Boba Fett | C-3PO | R2 D2 | Jar Jar Binks | Padme | Yoda | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe?æ | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Education | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3.292880e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 1.0 | 4.0 | 5.0 | 6.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | NaN | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | I don't understand this question | Yes | No | No | Male | 18-29 | High school degree |
2 | 3.292880e+09 | False | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | Bachelor degree |
3 | 3.292765e+09 | True | False | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | I don't understand this question | No | NaN | No | Male | 18-29 | High school degree |
4 | 3.292763e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 6.0 | 1.0 | 2.0 | 4.0 | 3.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 4.0 | 5.0 | 4.0 | NaN | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | I don't understand this question | No | NaN | Yes | Male | 18-29 | Some college or Associate degree |
5 | 3.292731e+09 | True | True | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 4.0 | 6.0 | 2.0 | 1.0 | 3.0 | 5.0 | 4.0 | 4.0 | NaN | 5.0 | 1.0 | 4.0 | 3.0 | 5.0 | 4.0 | 4.0 | 1.0 | 4.0 | 4.0 | Greedo | Yes | No | No | Male | 18-29 | Some college or Associate degree |
#Now we check the favourite movies and most seen movies for both female and male datasets:
#For female dataset:
ranking_mean_female = female[female.columns[9:15]].mean().sort_values(ascending=False)
#Plotting a bar graph:
ranking_mean_female_graph = ranking_mean_female.plot.barh(edgecolor='none',color= [(255/255,188/255,121/255),
(162/255,200/255, 236/255),
(207/255,207/255,207/255),
(200/255,82/255,0/255),
(255/255,194/255,10/255),
(212/255,17/255,89/255)])
#Removing spines:
for key,spine in ranking_mean_female_graph.spines.items():
    spine.set_visible(False)
#Removing ticks (booleans, since the 'off' strings are not accepted by current matplotlib):
ranking_mean_female_graph.tick_params(top=False,bottom=False,left=False,right=False)
#Set a title:
ranking_mean_female_graph.set_title("Average Star Wars Movies Ranking (Females)")
#Setting an average line:
ranking_mean_female_graph.axvline(ranking_mean_female.mean(),color='grey',alpha=.8,linestyle='--')
#Displaying the graph:
plt.show()
#Now same for male dataset:
rank_mean_male = male[male.columns[9:15]].mean().sort_values(ascending=False).copy()
#Graphing:
rank_male_graph = rank_mean_male.plot.barh(edgecolor='none',color= [(255/255,188/255,121/255),
(162/255,200/255, 236/255),
(207/255,207/255,207/255),
(200/255,82/255,0/255),
(255/255,193/255,7/255),
(216/255,27/255,96/255)])
#Removing Spines:
for key,spine in rank_male_graph.spines.items():
    spine.set_visible(False)
#Removing Ticks:
rank_male_graph.tick_params(top=False,bottom=False,left=False,right=False)
#Setting a title:
rank_male_graph.set_title("Average Star Wars Movies Ranking (Males)")
#Setting an average line:
rank_male_graph.axvline(rank_mean_male.mean(),alpha=.8,linestyle='--',color='grey')
#Plotting graph:
plt.show()
#Now we continue with number of views per movie for both genders:
female_seen_total = female[female.columns[3:9]].sum().copy().sort_values(ascending=False)
#Graphing:
female_seen_graph = female_seen_total.plot.bar(edgecolor='none',color=[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
#Remove spines:
for key, spine in female_seen_graph.spines.items():
    spine.set_visible(False)
#Remove ticks:
female_seen_graph.tick_params(top=False,bottom=False,left=False,right=False)
#Setting a title:
female_seen_graph.set_title('Star Wars most seen Episodes (Female)')
#Setting an average line:
female_seen_graph.axhline(female_seen_total.mean(),alpha=.8,linestyle='--',color='grey')
#Plotting graph:
plt.show()
#Same for Male dataset:
male_seen_total= male[male.columns[3:9]].sum().copy().sort_values(ascending=False)
#Graphing:
male_seen_graph = male_seen_total.plot.bar(edgecolor='none',color=[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
#Remove spines:
for key, spine in male_seen_graph.spines.items():
    spine.set_visible(False)
#Remove ticks:
male_seen_graph.tick_params(top=False,bottom=False,left=False,right=False)
#Setting a title:
male_seen_graph.set_title('Star Wars most seen Episodes (Male)')
#Setting an average line:
male_seen_graph.axhline(male_seen_total.mean(),alpha=.8,linestyle='--',color='grey')
#Plotting graph:
plt.show()
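The four plotting cells above repeat the same styling steps (hide spines, hide ticks, set a title, draw a mean line). A minimal sketch of how that could be factored out is shown here; `plot_clean_bar` is a hypothetical helper name that does not appear in the original notebook, and the series passed to it is toy data, not the survey.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_clean_bar(series, title, horizontal=False):
    """Hypothetical helper: bar chart without spines/ticks plus a dashed mean line."""
    fig, ax = plt.subplots()
    series.plot(kind="barh" if horizontal else "bar", ax=ax, edgecolor="none")
    # Hide the frame around the plot.
    for spine in ax.spines.values():
        spine.set_visible(False)
    # Hide tick marks on all four sides (booleans, not 'off' strings).
    ax.tick_params(top=False, bottom=False, left=False, right=False)
    ax.set_title(title)
    # The mean line is vertical for horizontal bars and horizontal otherwise.
    mean_line = ax.axvline if horizontal else ax.axhline
    mean_line(series.mean(), alpha=.8, linestyle="--", color="grey")
    return ax


# Toy counts, invented for illustration only.
toy_seen = pd.Series({"seen_ep.5": 3, "seen_ep.6": 2, "seen_ep.1": 1})
ax = plot_clean_bar(toy_seen, "Most seen episodes (toy data)")
```

With something like this in place, each chart in the notebook would reduce to a single call, for example `plot_clean_bar(female_seen_total, 'Star Wars most seen Episodes (Female)')`.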
-A difference can be seen in the rankings by gender: male respondents ranked Ep5, Ep6 and Ep4 as their favourites. For female respondents the order is the same, except that Ep1 takes third place. That might mean that female respondents appreciate older sci-fi more than male respondents do.
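The gender comparison can also be read from a single table instead of two separate charts. The sketch below assumes the same `Gender` and `ranking_Ep.N` column names used in this notebook, but it runs on a small invented DataFrame (`toy`), so the numbers are not survey results. Because a rank of 1 means "favourite", a lower mean is the better result.

```python
import pandas as pd

# Toy stand-in for star_wars; every value here is invented for illustration.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "ranking_Ep.1": [3.0, 2.0, 3.0, 3.0, 3.0, 3.0],
    "ranking_Ep.4": [2.0, 3.0, 2.0, 2.0, 1.0, 2.0],
    "ranking_Ep.5": [1.0, 1.0, 1.0, 1.0, 2.0, 1.0],
})

ranking_cols = [c for c in toy.columns if c.startswith("ranking_")]

# Mean rank per episode and gender: rows are episodes, columns are genders.
by_gender = toy.groupby("Gender")[ranking_cols].mean().T

# Sort ascending so the most preferred episode (lowest mean rank) comes first.
female_order = by_gender["Female"].sort_values().index.tolist()
male_order = by_gender["Male"].sort_values().index.tolist()

print(by_gender)
print("Female order:", female_order)
print("Male order:  ", male_order)
```

On the real data the same `groupby` call would go on `star_wars` itself, which avoids having to keep the `female` and `male` subsets in sync.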
#Checking how many levels of education there are:
star_wars['Education'].value_counts(dropna=False)
Some college or Associate degree    328
Bachelor degree                     321
Graduate degree                     275
NaN                                 150
High school degree                  105
Less than high school degree          7
Name: Education, dtype: int64
#Setting a pivot table:
education_pivot = star_wars.pivot_table(index="Education",values=['ranking_Ep.1',
'ranking_Ep.2',
'ranking_Ep.3',
'ranking_Ep.4',
'ranking_Ep.5',
'ranking_Ep.6'],
aggfunc = np.mean,dropna=True)
education_pivot=education_pivot.reset_index().copy()
education_pivot.head()
Education | ranking_Ep.1 | ranking_Ep.2 | ranking_Ep.3 | ranking_Ep.4 | ranking_Ep.5 | ranking_Ep.6 | |
---|---|---|---|---|---|---|---|
0 | Bachelor degree | 3.828244 | 4.290076 | 4.521073 | 3.114504 | 2.309160 | 2.931298 |
1 | Graduate degree | 3.822222 | 4.225664 | 4.500000 | 3.199115 | 2.323009 | 2.920354 |
2 | High school degree | 3.802817 | 3.746479 | 4.126761 | 3.211268 | 2.873239 | 3.239437 |
3 | Less than high school degree | 5.000000 | 5.333333 | 3.666667 | 2.666667 | 1.000000 | 3.333333 |
4 | Some college or Associate degree | 3.551181 | 3.885827 | 4.102362 | 3.503937 | 2.783465 | 3.173228 |
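The pivot shows only means, and the `value_counts` output earlier lists just 7 respondents with "Less than high school degree", so that row rests on very few answers. A minimal sketch of putting the group size next to the mean is given below; it uses an invented `toy` DataFrame rather than the survey, and only one ranking column for brevity.

```python
import pandas as pd

# Toy data, invented for illustration; not taken from the survey.
toy = pd.DataFrame({
    "Education": ["Bachelor degree"] * 3 + ["Less than high school degree"],
    "ranking_Ep.5": [2.0, 1.0, 3.0, 1.0],
})

# "count" ignores NaN, so it reports how many respondents actually ranked the film.
summary = toy.groupby("Education")["ranking_Ep.5"].agg(["mean", "count"])
print(summary)
```

A mean of 1.0 built from a single answer, as in the toy output, should be read with much more caution than one built from hundreds.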
#Plotting the pivot:
education_graph=education_pivot[['ranking_Ep.1',
'ranking_Ep.2',
'ranking_Ep.3',
'ranking_Ep.4',
'ranking_Ep.5',
'ranking_Ep.6']].plot.pie(subplots=True,
figsize=(18, 3),
legend=False,
labels=[
'B',
'G',
'HS',
'<HS',
'CAD'
],
colors=[(100/255,143/255,255/255),
(120/255,95/255,240/255),
(220/255,38/255,127/255),
(254/255,97/255,0/255),
(255/255,176/255,0/255)])
#Let's check for NaN values in the character columns:
star_wars[star_wars.columns[15:29]].isna().sum()
Han Solo          380
Luke Skywalker    374
Princess Leia     375
Anakin            498
Obi wan Kenobi    386
Palpatine         596
Darth Vader       472
Lando             577
Boba Fett         602
C-3PO             397
R2 D2             376
Jar Jar Binks     576
Padme             594
Yoda              378
dtype: int64
#Removing NaN values:
star_wars_character = star_wars[star_wars.columns[15:29]].dropna(axis=0).copy()
#Now let's find the average rating given by fans to each character:
average = star_wars_character.mean().sort_values(ascending=False).copy()
average
Han Solo          4.687500
Yoda              4.652344
Luke Skywalker    4.640625
Obi wan Kenobi    4.632812
Princess Leia     4.601562
R2 D2             4.578125
C-3PO             4.488281
Anakin            4.109375
Darth Vader       4.101562
Lando             3.835938
Padme             3.734375
Boba Fett         3.714844
Palpatine         3.621094
Jar Jar Binks     3.058594
dtype: float64
average_graph = average.plot.barh(edgecolor = 'none',color=[(12/255,123/255,220/255),
(93/255,58/255,155/255),
(254/255,254/255,98/255),
(211/255,95/255,183/255),
(212/255,17/255,89/255),
(64/255,176/255,166/255)])
#Remove spines:
for key, spine in average_graph.spines.items():
    spine.set_visible(False)
#Remove ticks:
average_graph.tick_params(top=False,bottom=False,left=False,right=False)
#Setting a title:
average_graph.set_title('Average Character Rating')
#Setting an average line:
average_graph.axvline(average.mean(),alpha=.8,linestyle='--',color='grey')
#Plotting graph:
plt.show()
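Note that `dropna(axis=0)` keeps only respondents who rated every one of the 14 characters, so the averages above describe that subset. An alternative, sketched here on invented toy data, is to let `mean()` skip NaN per column so each character is averaged over everyone who rated it. Which of the two is preferable is a judgment call; this is not a claim about which one the original analysis intended.

```python
import numpy as np
import pandas as pd

# Toy ratings, invented for illustration; NaN means "did not rate".
toy = pd.DataFrame({
    "Han Solo": [5.0, 4.0, np.nan, 5.0],
    "Jar Jar Binks": [1.0, np.nan, 2.0, 3.0],
})

# Complete-case average: only rows without any NaN are used (rows 0 and 3 here).
complete_case = toy.dropna(axis=0).mean()

# Per-column average: pandas skips NaN by default, so no row is discarded.
per_column = toy.mean()

# How many respondents contributed to each per-column average.
rated_by = toy.notna().sum()

print(pd.DataFrame({"complete_case": complete_case,
                    "per_column": per_column,
                    "rated_by": rated_by}))
```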
-*Newer fans like 'The Empire Strikes Back' much more than older fans do*