Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and borrowing money, accumulating debts, and eventually engaging in desperate behaviors such as theft.
A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.
For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:
The institute also wants us to consider historical data from the national 6/49 lottery game in Canada. The data set contains 3,665 drawings, dating from 1982 to 2018.
The scenario we're following throughout this project is fictional; the main purpose is to practice applying probability and combinatorics concepts (permutations and combinations) in a setting that simulates a real-world scenario.
Throughout the project, we'll repeatedly need to calculate probabilities and combinations, so we'll start by writing two functions that we'll use often: factorial, which computes n!, and combinations, which computes the number of ways to choose k items out of n.
def factorial(n):
    # Base case: 0! = 1
    if n == 0:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * factorial(n - 1)
def combinations(n, k):
    # Ordered arrangements of k items out of n: n! / (n-k)!
    permutations = factorial(n) // factorial(n - k)
    # Divide by k! because order does not matter in a combination
    return permutations // factorial(k)
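The combinations function implements C(n, k) = n! / (k! (n - k)!). As a quick sanity check, the sketch below recomputes it with an iterative factorial (which avoids Python's recursion limit for large n) and compares it with math.comb from the standard library (Python 3.8+). The helper names factorial_iter and n_choose_k are illustrative only, not part of the project.

```python
import math

def factorial_iter(n):
    # Iterative n!, no recursion involved
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def n_choose_k(n, k):
    # C(n, k) = n! / (k! * (n - k)!), using exact integer division
    return factorial_iter(n) // (factorial_iter(k) * factorial_iter(n - k))

# Number of distinct 6/49 tickets
print(n_choose_k(49, 6))                      # 13983816
print(n_choose_k(49, 6) == math.comb(49, 6))  # True
```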
We need to write a function that takes six unique numbers and prints the probability of winning the big prize.
The engineering team told us that we need to be aware of the following details when we write the function:
We are going to write an interactive function to make sure that the user inputs six different numbers from 1 to 49.
We will use a 'try/except' block in combination with a 'while' loop to let users try as many times as needed until their input satisfies the conditions, and to make sure that invalid input, such as a non-integer, does not make the function fail. The function prints feedback messages based on what the user enters, and its output serves as the input to the function 'one_ticket_probability'.
def check_validity():
    print("Please enter your 6 ticket numbers: ")
    print("*********************************")  # output delimiter
    numbers = []
    while len(numbers) < 6:
        raw = input('Enter ticket number {}: '.format(len(numbers) + 1))
        print("*********************************")
        try:
            number = int(raw)
        except ValueError:
            # Non-integer input such as "abc" or "3.5"
            print("The input is not valid.")
            print("*********************************")
            continue
        if number not in range(1, 50):
            print("The number must be in the range from 1 to 49.")
            print("*********************************")
        elif number in numbers:
            print("The number exists already.")
            print("*********************************")
        else:
            numbers.append(number)
    return numbers
check_validity()
Please enter your 6 ticket numbers:
*********************************
Enter ticket number 1: 1
*********************************
Enter ticket number 2: 2
*********************************
Enter ticket number 3: 3
*********************************
Enter ticket number 4: 4
*********************************
Enter ticket number 5: 5
*********************************
Enter ticket number 6: 6
*********************************
[1, 2, 3, 4, 5, 6]
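Because check_validity reads from input(), it is awkward to test. A minimal sketch, assuming we are free to factor the rules out into a hypothetical helper validate_ticket_number, keeps the same three checks (integer, range 1 to 49, no repeats) but takes the raw string as an argument, so a sequence of simulated keystrokes can be fed through it:

```python
def validate_ticket_number(raw, existing):
    """Return (number, None) if raw is acceptable, else (None, error_message)."""
    try:
        number = int(raw)
    except ValueError:
        return None, "The input is not valid."
    if not 1 <= number <= 49:
        return None, "The number must be in the range from 1 to 49."
    if number in existing:
        return None, "The number exists already."
    return number, None

# Simulate a user typing these values one after another
typed = ["7", "abc", "50", "7", "12", "3.5", "19", "23", "31", "44"]
numbers = []
for raw in typed:
    if len(numbers) == 6:
        break
    number, error = validate_ticket_number(raw, numbers)
    if error:
        print(error)
    else:
        numbers.append(number)
print(numbers)  # [7, 12, 19, 23, 31, 44]
```

Keeping the rules in a pure function like this lets the interactive loop stay thin, while the validation logic can be checked without a keyboard.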
Below, we write the one_ticket_probability function, which calls check_validity to get the user's ticket and prints the probability of winning as a percentage.
def one_ticket_probability():
    # Ask the user for a valid ticket of six distinct numbers from 1 to 49
    ticket = check_validity()
    # Exactly one combination out of C(49, 6) wins the big prize
    possible_outcomes = combinations(49, 6)
    successful_outcomes = 1
    chances = successful_outcomes * 100 / possible_outcomes  # in percent
    print("Your chances to win the big prize is {:.8f}%. In other words, you have a 1 in 13,983,816 chances to win.".format(chances))
    print("*********************************")
    return ticket, chances
one_ticket_probability()
Please enter your 6 ticket numbers:
*********************************
Enter ticket number 1: 10
*********************************
Enter ticket number 2: 11
*********************************
Enter ticket number 3: 12
*********************************
Enter ticket number 4: 15
*********************************
Enter ticket number 5: 45
*********************************
Enter ticket number 6: 49
*********************************
Your chances to win the big prize is 0.00000715%. In other words, you have a 1 in 13,983,816 chances to win.
*********************************
([10, 11, 12, 15, 45, 49], 7.151123842018516e-06)
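The probability printed by one_ticket_probability is one successful outcome over all C(49, 6) possible outcomes, so it does not depend on which six numbers the user picks. A short sketch using Python's fractions module and math.comb (in place of the project's own combinations function) keeps the value exact before converting it to a percentage:

```python
from fractions import Fraction
import math

possible_outcomes = math.comb(49, 6)    # 13,983,816 distinct tickets
p_win = Fraction(1, possible_outcomes)  # only one combination wins the big prize

print(p_win)                                  # 1/13983816
print("{:.8f}%".format(float(p_win) * 100))   # 0.00000715%
print("1 in {:,}".format(possible_outcomes))  # 1 in 13,983,816
```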
import pandas as pd
import numpy as np
data = pd.read_csv('649.csv')
print('The data set contains {} rows and {} columns.'.format(data.shape[0],data.shape[1]))
The data set contains 3665 rows and 11 columns.
data.head(5)
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
data.tail(5)
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB
We're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have won by now.
The engineering team told us that we need to be aware of the following details:
The engineering team wants us to write a function that prints:
First, we need to extract the six winning numbers of every drawing from the historical data set. For that purpose, we are going to write a function named extract_numbers that takes a row of the lottery DataFrame as input and returns a set containing its six winning numbers. Then we will use extract_numbers in combination with the DataFrame.apply() method to extract the winning numbers of all drawings.
def extract_numbers(row):
    # Positions 4 to 9 hold 'NUMBER DRAWN 1' through 'NUMBER DRAWN 6';
    # position 10, 'BONUS NUMBER', is deliberately excluded
    return set(row.iloc[4:10])
data['winning_six'] = data.apply(extract_numbers,axis=1)
data.winning_six.head()
0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2    {1, 6, 39, 23, 24, 27}
3    {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
Name: winning_six, dtype: object
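extract_numbers relies on the drawn numbers sitting at positions 4 to 9 of each row. A variant that selects the columns by name keeps working if columns are added or reordered. The sketch below assumes the column names shown in data.head() and uses only the first two drawings, copied by hand so it runs without 649.csv; it builds the sets with a list comprehension rather than DataFrame.apply():

```python
import pandas as pd

# First two drawings from data.head(), copied by hand for a self-contained example
sample = pd.DataFrame({
    "DRAW DATE": ["6/12/1982", "6/19/1982"],
    "NUMBER DRAWN 1": [3, 8],
    "NUMBER DRAWN 2": [11, 33],
    "NUMBER DRAWN 3": [12, 36],
    "NUMBER DRAWN 4": [14, 37],
    "NUMBER DRAWN 5": [41, 39],
    "NUMBER DRAWN 6": [43, 41],
    "BONUS NUMBER": [13, 9],
})

# Select the six drawn-number columns by name; the bonus number is left out
drawn_cols = ["NUMBER DRAWN {}".format(i) for i in range(1, 7)]

# One set of plain Python ints per drawing, aligned with the original index
winning_six = pd.Series(
    [set(row) for row in sample[drawn_cols].to_numpy().tolist()],
    index=sample.index,
    name="winning_six",
)
print(winning_six)
```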
Below, we write the check_historical_occurrence function, which calls one_ticket_probability (whose output already contains the user's numbers), compares the ticket against the historical winning numbers, and prints how many times the combination has occurred along with the probability of winning in the next drawing.
def check_historical_occurrence(winning_six=data.winning_six):
    # one_ticket_probability() returns (ticket_numbers, chances_in_percent)
    ticket = one_ticket_probability()
    numbers = set(ticket[0])
    # Element-wise comparison of every historical winning set with the user's set
    matches = winning_six == numbers
    n_occurrences = matches.sum()
    print("The combination {} has occurred {} time(s) previously.".format(numbers, n_occurrences))
    print("*********************************")
    if n_occurrences == 0:
        print("That combination has never occurred. This doesn't mean it's more likely to occur now.\nYour chances to win the big prize in the next drawing are still {:.8f} %.".format(ticket[1]))
    else:
        print('Your chances to win the big prize in the next drawing with that combination are still {:.8f} %.'.format(ticket[1]))
    print("*********************************")
check_historical_occurrence()
Please enter your 6 ticket numbers:
*********************************
Enter ticket number 1: 34
*********************************
Enter ticket number 2: 5
*********************************
Enter ticket number 3: 14
*********************************
Enter ticket number 4: 47
*********************************
Enter ticket number 5: 21
*********************************
Enter ticket number 6: 31
*********************************
Your chances to win the big prize is 0.00000715%. In other words, you have a 1 in 13,983,816 chances to win.
*********************************
The combination {34, 5, 14, 47, 21, 31} has occurred 1 time(s) previously.
*********************************
Your chances to win the big prize in the next drawing with that combination are still 0.00000715 %.
*********************************
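The historical comparison works because both sides are sets, so the order in which the numbers were entered or drawn is irrelevant. A minimal sketch of the same idea on plain Python lists, with a hypothetical count_occurrences helper and a made-up three-drawing history (the first two drawings come from data.head(); the third is an invented repeat of the first, for illustration only):

```python
def count_occurrences(ticket, history):
    # Order does not matter in 6/49, so compare as sets
    ticket = frozenset(ticket)
    return sum(1 for draw in history if frozenset(draw) == ticket)

# Hypothetical history: the third drawing is an invented repeat of the first
history = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {3, 11, 12, 14, 41, 43},
]
print(count_occurrences([43, 41, 14, 12, 11, 3], history))  # 2
print(count_occurrences([1, 2, 3, 4, 5, 6], history))       # 0
```

Whatever the count, drawings are independent, so the chance of winning the next drawing with that ticket stays 1 in 13,983,816.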
Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning, so we're going to write a function that allows users to calculate their chances of winning for any number of different tickets.
We've talked with the engineering team and they gave us the following information:
We are going to write a function named multi_ticket_probability that prints the probability of winning the big prize depending on the number of different tickets played.
def multi_ticket_probability():
    possible_outcomes = combinations(49, 6)
    while True:
        n = input('How many different tickets are you going to play: ')
        print('*********************************')
        try:
            n_tickets = int(n)
        except ValueError:
            print('Invalid number of tickets')
            print('*********************************')
            continue
        # At most C(49, 6) = 13,983,816 distinct tickets exist
        if 1 <= n_tickets <= possible_outcomes:
            chances = n_tickets * 100 / possible_outcomes
            print("Your chances to win the big prize by playing {} ticket(s) are {:.10f} %.".format(n_tickets, chances))
            print('*********************************')
            break
        else:
            print('Please enter a valid and reasonable number of tickets.')
            print('*********************************')
multi_ticket_probability()
How many different tickets are you going to play: 910
*********************************
Your chances to win the big prize by playing 910 ticket(s) are 0.0065075227 %.
*********************************
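Because each distinct ticket covers a different combination and only one combination is drawn, the chances grow linearly with the number of distinct tickets: n / C(49, 6). A small non-interactive sketch, assuming a hypothetical helper multi_ticket_chances that mirrors the calculation in multi_ticket_probability, shows how slowly the percentage climbs:

```python
import math

POSSIBLE_OUTCOMES = math.comb(49, 6)  # 13,983,816 distinct 6/49 tickets

def multi_ticket_chances(n_tickets):
    # Distinct tickets can never win at the same time, so their probabilities add up
    if not 1 <= n_tickets <= POSSIBLE_OUTCOMES:
        raise ValueError("n_tickets must be between 1 and {:,}".format(POSSIBLE_OUTCOMES))
    return n_tickets * 100 / POSSIBLE_OUTCOMES  # in percent

for n in [1, 10, 100, 10_000, 1_000_000, 6_991_908, POSSIBLE_OUTCOMES]:
    print("{:>10,} ticket(s) -> {:.10f} %".format(n, multi_ticket_chances(n)))
```

Even buying half of all combinations (6,991,908 distinct tickets) only brings the chance to 50%.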
In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.
These are the engineering details we'll need to be aware of:
def probability_less_6():
    print("Calculate the probability of having two, three, four or five winning numbers.")
    print('*********************************')
    possible_outcomes = combinations(49, 6)
    while True:
        n = input('Enter a number between 2 and 5: ')
        print('*********************************')
        try:
            k = int(n)
        except ValueError:
            print('Invalid input')
            print('*********************************')
            continue
        if k in [2, 3, 4, 5]:
            # Choose k of the 6 ticket numbers among the winners,
            # and the remaining 6 - k among the 43 non-winning numbers
            successful_outcomes = combinations(6, k) * combinations(49 - 6, 6 - k)
            proba_less_6 = successful_outcomes * 100 / possible_outcomes
            print('Your chance to have exactly {} winning numbers is {:.10f} %'.format(k, proba_less_6))
            print('*********************************')
            break
        else:
            print('This number is out of range.')
            print('*********************************')
probability_less_6()
Calculate the probability of having two, three, four or five winning numbers.
*********************************
Enter a number between 2 and 5: 910
*********************************
This number is out of range.
*********************************
Enter a number between 2 and 5: 5
*********************************
Your chance to have exactly 5 winning numbers is 0.0018449900 %
*********************************
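probability_less_6 counts winning tickets as C(6, k) × C(43, 6 - k): k of the ticket's six numbers must be among the drawn ones, and the other 6 - k among the 43 numbers that were not drawn (this is the hypergeometric distribution). A sketch with exact fractions, using math.comb instead of the project's combinations function, prints the whole distribution for k = 0 to 6 and checks that it sums to 1:

```python
from fractions import Fraction
import math

def exactly_k_probability(k):
    # k of the ticket's 6 numbers must be among the 6 drawn numbers,
    # and the other 6 - k among the 43 numbers that were not drawn
    successful = math.comb(6, k) * math.comb(43, 6 - k)
    return Fraction(successful, math.comb(49, 6))

for k in range(7):
    print(k, "{:.10f} %".format(float(exactly_k_probability(k)) * 100))

# The seven cases (0 to 6 matches) cover every possible ticket, so they sum to 1
print(sum(exactly_k_probability(k) for k in range(7)) == 1)  # True
```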
We will create a function similar to probability_less_6 which calculates the probability of having at least two, three, four or five winning numbers. For instance, the probability of having at least four winning numbers is the sum of these three probabilities: having exactly four, exactly five, and exactly six winning numbers.
def probability_at_least():
    print("Calculate the probability of at least two, three, four or five winning numbers.")
    print('*********************************')
    possible_outcomes = combinations(49, 6)
    while True:
        n = input('Enter a number between 2 and 5: ')
        print('*********************************')
        try:
            k = int(n)
        except ValueError:
            print('Invalid number')
            print('*********************************')
            continue
        if k in [2, 3, 4, 5]:
            # "At least k" = exactly k + exactly k+1 + ... + exactly 6
            chances_at_least = 0
            for i in range(k, 7):
                successful_outcomes = combinations(6, i) * combinations(43, 6 - i)
                chances_at_least += successful_outcomes * 100 / possible_outcomes
            print('Your chances to have at least {} winning numbers are {:.10f} %'.format(k, chances_at_least))
            print('*********************************')
            break
        else:
            print('This number is out of range.')
            print('*********************************')
probability_at_least()
Calculate the probability of at least two, three, four or five winning numbers.
*********************************
Enter a number between 2 and 5: 44
*********************************
This number is out of range.
*********************************
Enter a number between 2 and 5: 0
*********************************
This number is out of range.
*********************************
Enter a number between 2 and 5: 5
*********************************
Your chances to have at least 5 winning numbers are 0.0018521411 %
*********************************
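The probability of having at least k winning numbers is the sum of the probabilities of exactly k, k + 1, ..., 6 winning numbers, which equals 1 minus the probability of having fewer than k. A sketch using exact fractions (math.comb stands in for the project's combinations function; the helper names are illustrative) computes it both ways so the two can be compared:

```python
from fractions import Fraction
import math

def exactly(k):
    # Probability of exactly k winning numbers on one 6/49 ticket
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

def at_least(k):
    # Tail sum: exactly k + exactly (k + 1) + ... + exactly 6
    return sum(exactly(i) for i in range(k, 7))

def at_least_via_complement(k):
    # Same quantity: 1 minus the probability of fewer than k winning numbers
    return 1 - sum(exactly(i) for i in range(k))

for k in range(2, 6):
    assert at_least(k) == at_least_via_complement(k)
    print("at least {}: {:.10f} %".format(k, float(at_least(k)) * 100))
```

For k = 5 this gives 259/13,983,816: the 258 tickets with exactly five matches plus the single jackpot ticket.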
For the first version of the app, we coded four main functions using interactive inputs: