Jeopardy! is an American television game show created by Merv Griffin. The show features a quiz competition in which contestants are presented with general knowledge clues in the form of answers, and must phrase their responses in the form of questions. For example, if a contestant were to select "Presidents for $200", the resulting clue could be "This 'Father of Our Country' didn't really chop down a cherry tree", to which the correct response is "Who is/was George Washington?" (Contestants are free to phrase the response in the form of any question; the traditional phrasing of "who is/are" for people or "what is/are" for things or words is almost always used.)
The column titled 'Question' in the data file relates to the general knowledge Clues in the form of answers posed by the TV Show Host. So technically, they are not questions. Therefore, I will change the title heading from 'Question' to 'Clue'.
The column title 'Answer' is the response of the contestant and their response must be in the form of a question. Even though that's the case and the answers shown in the available file are not in question form, I WILL NOT change that column title to 'Question', but will leave it as 'Answer'.
The dataset is named jeopardy.csv, and contains 20000 rows from the beginning of a full dataset of Jeopardy questions, which you can download here.
The goal of this project is to work with a dataset of Jeopardy questions to figure out if there are significant patterns in the questions that could help me win if I had an interest in being a future contestant.
The provided file is an unordered list of questions where each question has:
'category' : the question category, e.g. "HISTORY"
'value' : \$ value of the question as string, e.g. "$200"
Note: This is "None" for Final Jeopardy! and Tiebreaker questions
'question' : text of question
'answer' : text of answer
'round' : one of "Jeopardy!","Double Jeopardy!","Final Jeopardy!" or "Tiebreaker"
Note: Tiebreaker questions do happen but they're very rare (like once every 20 years)
'show_number' : string of show number, e.g '4680'
'air_date' : the show air date in format YYYY-MM-DD
Two of the column titles can be confusing. The column titled 'Question'
# import key libraries to execute this project.
import pandas as pd
import numpy as np
import random
import string
import matplotlib.pyplot as plt
import seaborn as sns
from numpy.random import seed, randint
from IPython.display import HTML
from IPython.display import display, Markdown
import warnings
warnings.filterwarnings('ignore')
# read *.csv file provided.
jeopardy_df = pd.read_csv('jeopardy.csv')
# change column title from 'Question' to 'Clue'
jeopardy_df = jeopardy_df.rename(columns={' Question':'Clue'})
display(Markdown('<h3><span style="color:blue">First Five Rows</h3>'))
display(jeopardy_df.head())
print('\n')
display(Markdown('<h3><span style="color:blue">Last Five Rows</h3>'))
display(jeopardy_df.tail())
display(Markdown('<h3><span style="color:blue">Column Titles</h3>'))
list(jeopardy_df.columns)
Show Number | Air Date | Round | Category | Value | Clue | Answer | |
---|---|---|---|---|---|---|---|
0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was ... | Copernicus |
1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisl... | Jim Thorpe |
2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record av... | Arizona |
3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", th... | McDonald's |
4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Co... | John Adams |
Show Number | Air Date | Round | Category | Value | Clue | Answer | |
---|---|---|---|---|---|---|---|
19994 | 3582 | 2000-03-14 | Jeopardy! | U.S. GEOGRAPHY | $200 | Of 8, 12 or 18, the number of U.S. states that... | 18 |
19995 | 3582 | 2000-03-14 | Jeopardy! | POP MUSIC PAIRINGS | $200 | ...& the New Power Generation | Prince |
19996 | 3582 | 2000-03-14 | Jeopardy! | HISTORIC PEOPLE | $200 | In 1589 he was appointed professor of mathemat... | Galileo |
19997 | 3582 | 2000-03-14 | Jeopardy! | 1998 QUOTATIONS | $200 | Before the grand jury she said, "I'm really so... | Monica Lewinsky |
19998 | 3582 | 2000-03-14 | Jeopardy! | LLAMA-RAMA | $200 | Llamas are the heftiest South American members... | Camels |
['Show Number', ' Air Date', ' Round', ' Category', ' Value', 'Clue', ' Answer']
NOTE: I see that if I need to reference certain columns, some titles have a space before the first letter.
# set column width wider to see more of question displayed.
pd.set_option('display.max_colwidth', 85)
# define function to remove punctuations from strings.
def remove_punctuations(text):
for punctuation in string.punctuation:
text = text.replace(punctuation, '')
return text
# remove all punctuation from Clue column, apply to the DF series
jeopardy_df['Clean_Clue'] = jeopardy_df['Clue'].apply(remove_punctuations)
display(Markdown('<h2>Punctuation Removed</h2>'))
display(jeopardy_df.iloc[:5,[5,7]])
# make all characters lower case in 'Clean Clue' column.
display(Markdown('<h2>Make All Lower Case</h2>'))
jeopardy_df['Clean_Clue'] = jeopardy_df['Clean_Clue'].str.lower()
display(jeopardy_df.iloc[:5,[5,7]])
# reset column width to default.
pd.reset_option('display.max_colwidth')
Clue | Clean_Clue | |
---|---|---|
0 | For the last 8 years of his life, Galileo was under house arrest for espousing th... | For the last 8 years of his life Galileo was under house arrest for espousing thi... |
1 | No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with... | No 2 1912 Olympian football star at Carlisle Indian School 6 MLB seasons with the... |
2 | The city of Yuma in this state has a record average of 4,055 hours of sunshine ea... | The city of Yuma in this state has a record average of 4055 hours of sunshine eac... |
3 | In 1963, live on "The Art Linkletter Show", this company served its billionth burger | In 1963 live on The Art Linkletter Show this company served its billionth burger |
4 | Signer of the Dec. of Indep., framer of the Constitution of Mass., second Preside... | Signer of the Dec of Indep framer of the Constitution of Mass second President of... |
Clue | Clean_Clue | |
---|---|---|
0 | For the last 8 years of his life, Galileo was under house arrest for espousing th... | for the last 8 years of his life galileo was under house arrest for espousing thi... |
1 | No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with... | no 2 1912 olympian football star at carlisle indian school 6 mlb seasons with the... |
2 | The city of Yuma in this state has a record average of 4,055 hours of sunshine ea... | the city of yuma in this state has a record average of 4055 hours of sunshine eac... |
3 | In 1963, live on "The Art Linkletter Show", this company served its billionth burger | in 1963 live on the art linkletter show this company served its billionth burger |
4 | Signer of the Dec. of Indep., framer of the Constitution of Mass., second Preside... | signer of the dec of indep framer of the constitution of mass second president of... |
Let's look at the full range of number of words per clue statement among the 20,000 clues. I will use a box and whisker plot to see if there are any outliers that perhaps should be omitted from analysis.
# create a column that contains the number of words in each clue.
jeopardy_df['number_of_cwords'] = jeopardy_df.Clean_Clue.apply(lambda x: len(x.split()))
y = jeopardy_df['number_of_cwords']
# use Seaborn coding to generate box and whisker plot.
fig, ax = plt.subplots(figsize=(10,8))
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})
sns.boxplot(data=y, width = 0.35, linewidth=2.5, palette='Blues',
showmeans=True, medianprops={'color':'red'}, meanprops={'marker':'o',
'markerfacecolor':'white', 'markeredgecolor':'black',
'markersize':'10'})
plt.title('Boxplot of Words per Clue Statement', fontsize=22, pad=20)
plt.xlabel('Jeopardy Clue Statements', fontsize=20, labelpad = 20)
plt.ylabel('Number of Words per Clue Statement', fontsize=20, labelpad = 20)
ax.set(xticklabels=[])
plt.show()
There are outlier clue statements with more than 26 words and perhaps a few with less than 4. I'm not sure if such outlier clues are worth keeping in the analysis. I will observe the nature of clue statements with many words and decide whether to remove them. I will also observe clues with three or less words and decide whether to exclude or include those.
# extend column width to display as much of 'Clue' statement as possible.
pd.set_option('display.max_colwidth', 110)
# determine maximum number of words in each 'Clue'.
column = jeopardy_df['number_of_cwords']
max_value = column.max()
max_index = column.idxmax()
print('\n')
print('Clue Statement with maximum number of words =', max_value, 'and corresponding index is', max_index)
display(Markdown('<h3>An Example Clue with 65 Words</h3>'))
print(jeopardy_df.loc[436, 'Clean_Clue'])
thirty_word_c = jeopardy_df[jeopardy_df['number_of_cwords']==30]
display(Markdown('<h3>Example Clues with 30 Words</h3>'))
display(thirty_word_c['Clean_Clue'].head(10))
display(Markdown('<h3>Example Clues with Only 1 Word</h3>'))
one_word_c = jeopardy_df[jeopardy_df['number_of_cwords']==1]
display(one_word_c.iloc[1:6, 3:10])
display(Markdown('<h3>Example Clues with Only 2 Words</h3>'))
two_word_c = jeopardy_df[jeopardy_df['number_of_cwords']==2]
display(two_word_c.iloc[1:6, 3:10])
display(Markdown('<h3>Example Clues with Only 3 Words</h3>'))
three_word_c = jeopardy_df[jeopardy_df['number_of_cwords']==3]
display(three_word_c.iloc[1:6, 3:10])
# create new dataframe name with various rows removed.
jeopardy2_df = jeopardy_df[jeopardy_df['number_of_cwords'] < 30]
print('There are', len(jeopardy2_df), 'remaining rows after removing clues with 30 words or more.')
# return column width to default.
pd.reset_option('display.max_colwidth')
Clue Statement with maximum number of words = 65 and corresponding index is 436
a hrefhttpwwwjarchivecommedia20101207j28jpg targetblankkelly of the clue crew reports from the rvmh hall of fame in elkhart ina one of the first times a movie star received a a hrefhttpwwwjarchivecommedia20101207j28ajpg targetblankfancy trailera as a perk was in 1931 paramount gave a a hrefhttpwwwjarchivecommedia20101207j28bjpg targetblankchevy house cara to a hrefhttpwwwjarchivecommedia20101207j28cjpg targetblank thisa sexy star as she left the stage to make movies like she done him wrong
1080 a hrefhttpwwwjarchivecommedia20020510j19wmvsofia of the clue crew jogs through central parka im running a... 1254 a hrefhttpwwwjarchivecommedia20060213j02wmvjon of the clue crew points out rockos blue collar pedigreea s... 2447 a hrefhttpwwwjarchivecommedia20070112j17jpg targetblankkelly of the clue crew reports from girls and boys ... 2474 a hrefhttpwwwjarchivecommedia20070112dj26jpg targetblankjames lipton reads the cluea the 19th centurymore ... 3011 a hrefhttpwwwjarchivecommedia20061012dj26jpg targetblankcheryl of the clue crew reads from near a fountain... 3282 a hrefhttpwwwjarchivecommedia20061019dj30wmvjon of the clue crew shows an animation on the monitora this ... 3727 sarah of the clue crew in the laboratory a must for any gardener a soil testing kit is used to take a rea... 3923 a hrefhttpwwwjarchivecommedia20091123dj21jpg targetblankalex reports from the western wall in jerusalema ... 4179 jimmy of the clue crew having a pass thrown to him by charlie batch of the pittsburgh steelers the name o... 5456 this 1946 musical that featured a hrefhttpwwwjarchivecommedia19941117j27mp3the followinga was irving berli... Name: Clean_Clue, dtype: object
Category | Value | Clue | Answer | Clean_Clue | number_of_cwords | |
---|---|---|---|---|---|---|
154 | IT'S OURS! | $400 | Montserrat | Great Britain (United Kingdom) | montserrat | 1 |
674 | ACTRESSES' FIRST FILMS | $400 | "Oklahoma!" | Shirley Jones | oklahoma | 1 |
680 | ACTRESSES' FIRST FILMS | $500 | "Halloween" | Jamie Lee Curtis | halloween | 1 |
1037 | AIRPORT CODES | $200 | ATL | Atlanta | atl | 1 |
1043 | AIRPORT CODES | $400 | BRU | Brussels | bru | 1 |
Category | Value | Clue | Answer | Clean_Clue | number_of_cwords | |
---|---|---|---|---|---|---|
85 | RHYMES WITH SMART | $1000 | Composer Wolfgang | Mozart | composer wolfgang | 2 |
148 | IT'S OURS! | $200 | Saint-Pierre & Miquelon | France | saintpierre miquelon | 2 |
160 | IT'S OURS! | $600 | Cook Islands | New Zealand | cook islands | 2 |
166 | IT'S OURS! | $800 | Madeira Islands | Portugal | madeira islands | 2 |
328 | ALBUMS THAT ROCK | $400 | "X&Y", "Parachutes" | Coldplay | xy parachutes | 2 |
Category | Value | Clue | Answer | Clean_Clue | number_of_cwords | |
---|---|---|---|---|---|---|
83 | BE FRUITFUL & MULTIPLY | $1000 | 2 x 1,035 | 2,070 | 2 x 1035 | 3 |
171 | IT'S OURS! | $1000 | Northern Mariana Islands | USA | northern mariana islands | 3 |
266 | NOT A CURRENT NATIONAL CAPITAL | $400 | Ljubljana, Bratislava, Barcelona | Barcelona | ljubljana bratislava barcelona | 3 |
272 | NOT A CURRENT NATIONAL CAPITAL | $800 | Istanbul, Ottawa, Amman | Istanbul | istanbul ottawa amman | 3 |
278 | NOT A CURRENT NATIONAL CAPITAL | $1200 | Sofia, Sarajevo, Saigon | Saigon | sofia sarajevo saigon | 3 |
There are 19627 remaining rows after removing clues with 30 words or more.
Based on the condition of clue statements with 30 words or more (messy, meaningless non-words, hyperlinks, etc.), I will remove all rows where clue statements have 30 words or more. That only reduces the total number of clue statements available to analyze by 1.8%. I really don't think removing these clues will negatively impact the analysis results.
The example clue statements shown above with only 1, 2 or 3 words look to be legitimate. It's the category subject (e.g. 'IT'S OURS' or 'NOT A CURRENT NATIONAL CAPITAL', ...) that makes it possible to have clue statements with so few words. Therefore I will keep all clue statements intact with less than 4 words.
# create a dataframe with two columns to generate two box and whisker plots.
df = pd.DataFrame(columns = ['Before', 'After'])
df['Before'] = jeopardy_df['number_of_cwords']
df['After'] = jeopardy2_df['number_of_cwords']
# use Seaborn coding to generate box and whisker plot.
fig, ax = plt.subplots(figsize=(10,8))
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})
sns.boxplot(data=df, width = 0.35, linewidth=2.5, palette='Blues',
showmeans=True, medianprops={'color':'red'}, meanprops={'marker':'o',
'markerfacecolor':'white', 'markeredgecolor':'black',
'markersize':'10'})
plt.title('Boxplot of Words per Clue Statement', fontsize=22, pad=20)
plt.xlabel('Jeopardy Clue Statements', fontsize=20, labelpad = 20)
plt.ylabel('Number of Words per Clue Statement', fontsize=20, labelpad = 20)
plt.show()
# define function to remove punctuations from strings.
def remove_punctuations(text):
for punctuation in string.punctuation:
text = text.replace(punctuation, '')
return text
# remove all punctuation from Answer column, apply to the DF series
jeopardy2_df.loc[:,'Clean_Answer'] = jeopardy2_df.loc[:,' Answer'].apply(remove_punctuations)
display(Markdown('<h2>Punctuation Removed</h2>'))
display(jeopardy2_df.iloc[:5,[6,9]])
# make all characters lower case in 'Clean Answer' column.
display(Markdown('<h2>Make All Lower Case</h2>'))
jeopardy2_df.loc[:,'Clean_Answer'] = jeopardy2_df.loc[:,'Clean_Answer'].str.lower()
display(jeopardy2_df.iloc[:5,[6,9]])
jeopardy2_df.loc[:,'number_of_awords'] = jeopardy2_df.loc[:,'Clean_Answer'].apply(lambda x: len(x.split()))
column = jeopardy2_df['number_of_awords']
max_value = column.max()
max_index = column.idxmax()
print('\n')
print('Answer Statement with maximum number of words =', max_value, 'and corresponding index is', max_index, '\n')
display(Markdown('<h3>Clue Statement to Longest Answer</h3>'))
print(jeopardy2_df.loc[18007, 'Clean_Clue'], '\n')
display(Markdown('<h3>Longest Answer</h3>'))
print(jeopardy2_df.loc[18007, 'Clean_Answer'], '\n')
Answer | Clean_Answer | |
---|---|---|
0 | Copernicus | Copernicus |
1 | Jim Thorpe | Jim Thorpe |
2 | Arizona | Arizona |
3 | McDonald's | McDonalds |
4 | John Adams | John Adams |
Answer | Clean_Answer | |
---|---|---|
0 | Copernicus | copernicus |
1 | Jim Thorpe | jim thorpe |
2 | Arizona | arizona |
3 | McDonald's | mcdonalds |
4 | John Adams | john adams |
Answer Statement with maximum number of words = 21 and corresponding index is 18007
2 of the 6 current major league teams that have never won a league pennant
2 of the montreal expos the seattle mariners the california angels the texas rangers the toronto blue jays the houston astros
The longest 'Answer' statement looks legitimate. I will not remove and Answer statements.
print(jeopardy2_df[' Value'].value_counts(dropna=False))
def remove_punctuations(val):
for punctuation in string.punctuation:
val = val.replace(punctuation, '')
return val
# remove punctuations from 'Value' column.
jeopardy2_df['Clean_Value'] = jeopardy2_df[' Value'].apply(remove_punctuations)
display(Markdown('<h2>Punctuation Removed</h2>'))
display(jeopardy2_df.iloc[:5,[4,9]])
type = jeopardy2_df.dtypes['Clean_Value']
print(type)
jeopardy2_df['Clean_Value'] = jeopardy2_df['Clean_Value'].replace(np.nan, 0)
jeopardy2_df['Clean_Value'] = pd.to_numeric(jeopardy2_df['Clean_Value'], errors='coerce').fillna(0)
jeopardy2_df['Clean_Value'] = jeopardy2_df['Clean_Value'].astype(int)
display(Markdown('<h2>Clean Value in Finished State</h2>'))
display(jeopardy2_df.iloc[:5,[4,11]])
jeopardy2_df[' Air Date'] = pd.to_datetime(jeopardy2_df[' Air Date'])
print(jeopardy2_df.info())
$400 3837 $800 2915 $200 2757 $600 1856 $1000 1760 ... $2,300 1 $7,400 1 $4,500 1 $6,100 1 $1,020 1 Name: Value, Length: 76, dtype: int64
Value | Clean_Answer | |
---|---|---|
0 | $200 | copernicus |
1 | $200 | jim thorpe |
2 | $200 | arizona |
3 | $200 | mcdonalds |
4 | $200 | john adams |
object
Value | Clean_Value | |
---|---|---|
0 | $200 | 200 |
1 | $200 | 200 |
2 | $200 | 200 |
3 | $200 | 200 |
4 | $200 | 200 |
<class 'pandas.core.frame.DataFrame'> Int64Index: 19627 entries, 0 to 19998 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Show Number 19627 non-null int64 1 Air Date 19627 non-null datetime64[ns] 2 Round 19627 non-null object 3 Category 19627 non-null object 4 Value 19627 non-null object 5 Clue 19627 non-null object 6 Answer 19627 non-null object 7 Clean_Clue 19627 non-null object 8 number_of_cwords 19627 non-null int64 9 Clean_Answer 19627 non-null object 10 number_of_awords 19627 non-null int64 11 Clean_Value 19627 non-null int32 dtypes: datetime64[ns](1), int32(1), int64(3), object(7) memory usage: 2.5+ MB None
def count_matches(row):
split_answer = row["Clean_Answer"].split()
split_question = row["Clean_Clue"].split()
if "the" in split_answer:
split_answer.remove("the")
if len(split_answer) == 0:
return 0
match_count = 0
for item in split_answer:
if item in split_question:
match_count += 1
return match_count / len(split_answer)
jeopardy2_df["answer_in_clue"] = jeopardy2_df.apply(count_matches, axis=1)
print(jeopardy2_df["answer_in_clue"].mean())
0.057932673803668025
# sort dataframe by ascending 'Air Date'.
jeopardy2_df.sort_values(by=[' Air Date'])
display(jeopardy2_df.iloc[:5,1:7])
display(jeopardy2_df.iloc[-5:,1:7])
# create an empty list and an empty set.
question_overlap = []
terms_used = set()
for i, row in jeopardy2_df.iterrows():
split_question = row["Clean_Clue"].split(" ")
split_question = [q for q in split_question if len(q) > 5]
match_count = 0
for word in split_question:
if word in terms_used:
match_count += 1
for word in split_question:
terms_used.add(word)
if len(split_question) > 0:
match_count /= len(split_question)
question_overlap.append(match_count)
jeopardy2_df["question_overlap"] = question_overlap
jeopardy2_df["question_overlap"].mean()
Air Date | Round | Category | Value | Clue | Answer | |
---|---|---|---|---|---|---|
0 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was ... | Copernicus |
1 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisl... | Jim Thorpe |
2 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record av... | Arizona |
3 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", th... | McDonald's |
4 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Co... | John Adams |
Air Date | Round | Category | Value | Clue | Answer | |
---|---|---|---|---|---|---|
19994 | 2000-03-14 | Jeopardy! | U.S. GEOGRAPHY | $200 | Of 8, 12 or 18, the number of U.S. states that... | 18 |
19995 | 2000-03-14 | Jeopardy! | POP MUSIC PAIRINGS | $200 | ...& the New Power Generation | Prince |
19996 | 2000-03-14 | Jeopardy! | HISTORIC PEOPLE | $200 | In 1589 he was appointed professor of mathemat... | Galileo |
19997 | 2000-03-14 | Jeopardy! | 1998 QUOTATIONS | $200 | Before the grand jury she said, "I'm really so... | Monica Lewinsky |
19998 | 2000-03-14 | Jeopardy! | LLAMA-RAMA | $200 | Llamas are the heftiest South American members... | Camels |
0.6872597590868267
def determine_value(row):
value = 0
if row["Clean_Value"] > 800:
value = 1
return value
jeopardy2_df["High_Value"] = jeopardy2_df.apply(determine_value, axis=1)
def count_usage(term):
low_count = 0
high_count = 0
for i, row in jeopardy2_df.iterrows():
if term in row["Clean_Clue"].split(" "):
if row["High_Value"] == 1:
high_count += 1
else:
low_count += 1
return high_count, low_count
from random import choice
terms_used_list = list(terms_used)
comparison_terms = [choice(terms_used_list) for _ in range(10)]
observed_expected = []
for term in comparison_terms:
observed_expected.append(count_usage(term))
observed_expected
[(0, 1), (0, 1), (0, 1), (1, 0), (1, 2), (1, 3), (0, 1), (0, 1), (0, 1), (1, 0)]
from scipy.stats import chisquare
import numpy as np
high_value_count = jeopardy2_df[jeopardy2_df["High_Value"] == 1].shape[0]
low_value_count = jeopardy2_df[jeopardy2_df["High_Value"] == 0].shape[0]
chi_squared = []
for obs in observed_expected:
total = sum(obs)
total_prop = total / jeopardy2_df.shape[0]
high_value_exp = total_prop * high_value_count
low_value_exp = total_prop * low_value_count
observed = np.array([obs[0], obs[1]])
expected = np.array([high_value_exp, low_value_exp])
chi_squared.append(chisquare(observed, expected))
chi_squared
[Power_divergenceResult(statistic=0.3947555429221148, pvalue=0.5298102186623133), Power_divergenceResult(statistic=0.3947555429221148, pvalue=0.5298102186623133), Power_divergenceResult(statistic=0.3947555429221148, pvalue=0.5298102186623133), Power_divergenceResult(statistic=2.533213321332134, pvalue=0.11147300243088427), Power_divergenceResult(statistic=0.037411831006864155, pvalue=0.8466289712116682), Power_divergenceResult(statistic=0.021503301907791657, pvalue=0.8834161463867766), Power_divergenceResult(statistic=0.3947555429221148, pvalue=0.5298102186623133), Power_divergenceResult(statistic=0.3947555429221148, pvalue=0.5298102186623133), Power_divergenceResult(statistic=0.3947555429221148, pvalue=0.5298102186623133), Power_divergenceResult(statistic=2.533213321332134, pvalue=0.11147300243088427)]
None of the calculated p-values are less than 0.05. Therefore there is no signficant dfifference between observed and expected values.
I can't really say there are any significant patterns observed in the questions that could help me win as a potential future contestant. I'm sure there is more analysis I could do, however I will leave that and move on with other coding training.