Lotto 6/49 is a national lottery game in Canada. In this weekly lottery, as the name implies, 6 numbers are drawn from a set of 49. If you have a ticket with exactly these 6 numbers, you win a big prize. If your ticket matches only some of them (5, 4, 3 or 2), you still win a smaller prize.
As in any gambling game, addiction is a risk, while the chances of actually winning are not exactly high.
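Now, I am pretending to work for an institute that wants to help (potential) gambling addicts. One thing the institute wants to do is create an app that shows the chance of winning the big prize with a single ticket, the chance of winning it with several different tickets, and the chance of winning a smaller prize with fewer than 6 correct numbers.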
In addition, for any ticket (combination of 6 numbers) the app should be able to show how many times it would have won the big prize during 3,665 drawings in the past (dating from 1982 to 2018). For this, the institute has historic data available.
I have been tasked with creating the 'heart' of the app: functions for each of these scenarios that take the required inputs and return the desired outputs.
I'll start with defining 2 helper functions that we'll need for the calculations.
# Define a function that takes input n and returns factorial(n)
def factorial(n):
    outcome = 1
    # range() already advances the counter, so no manual increment is needed
    for counter in range(1, n + 1):
        outcome = outcome * counter
    return outcome
# Define a function that takes inputs n, k and returns the number of combinations when taking k objects from a group of n objects
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The result is always a whole number, so integer division keeps it exact
    return numerator // denominator
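As a sanity check on the two helpers, their results can be compared with `math.factorial` and `math.comb` from the standard library (`math.comb` requires Python 3.8 or later). The sketch below re-declares compact versions of the helpers so that it runs on its own; the use of integer division is an assumption of this sketch.

```python
import math

# Compact re-declarations of the two helpers, so this cell is self-contained
def factorial(n):
    outcome = 1
    for counter in range(1, n + 1):
        outcome *= counter
    return outcome

def combinations(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare against the standard library for a few inputs used in this project
for n, k in [(49, 6), (6, 2), (43, 4)]:
    assert factorial(n) == math.factorial(n)
    assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```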
Function input: a list of 6 numbers representing one lottery ticket.
Function output: a printed text including the chance to win the big prize with this ticket by having all 6 numbers correct.
# Define a function that calculates the chance of winning the big prize
def one_ticket_probability(list_of_6):
    nr_of_combs = combinations(49, 6)
    chance_to_win = 1 / nr_of_combs
    output = "The chance to win the big prize with combination {} is {:.10%}.".format(list_of_6, chance_to_win)
    output = output + "\nOr, said differently, 1 out of {:,} times.".format(int(nr_of_combs))
    print(output)
# Test it with some list of 6 numbers
series = [1, 2, 3, 7, 12, 16]
# The function prints its result and returns nothing, so there is nothing to assign
one_ticket_probability(series)
The chance to win the big prize with combination [1, 2, 3, 7, 12, 16] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.
# Test it with another list of 6 numbers
series = [41, 42, 3, 47, 18, 16]
one_ticket_probability(series)
The chance to win the big prize with combination [41, 42, 3, 47, 18, 16] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.
The app user will be able to see that for his set of 6 numbers the chance of winning the big prize is extremely slim.
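The same figure can be derived without the combinations formula, by multiplying the chances that each successive drawn number is on the ticket: 6 out of 49 for the first, 5 out of 48 for the second, and so on. The sketch below is an independent cross-check using exact fractions, not part of the app's functions.

```python
from fractions import Fraction

# Chance that every one of the 6 drawn numbers is on the ticket:
# 6/49 * 5/48 * 4/47 * 3/46 * 2/45 * 1/44
chance = Fraction(1)
for drawn in range(6):
    chance *= Fraction(6 - drawn, 49 - drawn)

print(chance)  # 1/13983816
```

Because the product does not depend on which 6 numbers are on the ticket, every combination has exactly the same chance.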
As mentioned in the introduction, the institute can use a dataset with historic data that is available on Kaggle, so that an app user can see for any ticket (that is, any combination of 6 numbers) how many times it would have won the big prize.
I have downloaded the dataset and saved it in the same folder as this notebook. I will start with exploring the dataset a bit, then continue with writing the function.
Function input: a list of 6 numbers representing one lottery ticket.
Function output: a printed text showing how many times this combination won in the past (1982-2018), and also including the chance to win the big prize with this ticket.
# Import pandas
import pandas as pd
# Import the dataset, store as dataframe
history = pd.read_csv('649.csv')
# Print number of rows and columns
history.shape
(3665, 11)
# Show the first 3 rows
history.head(3)
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
# Show the last 3 rows
history.tail(3)
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |
# Show relevant information about all columns
history.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.1+ KB
What we can see is that we have a set of clean-looking data of 3,665 historic drawings between June 1982 and June 2018. Let's proceed, first with a helper function that collects all winning combinations from the past, then with the actual function.
For this part we will make use of Python 'sets': comparing two sets ignores the order in which the numbers are listed, which is exactly what is needed to compare a ticket with a drawing, and the comparison is fast.
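A small illustration of why sets fit this task, using the numbers of the first historic drawing. The variable names `ticket` and `drawing` are only for this sketch. The last line shows that a set intersection would also give the number of matching numbers, although the functions in this project do not use that.

```python
ticket = [11, 12, 3, 43, 41, 14]   # numbers in the order a player wrote them
drawing = [3, 11, 12, 14, 41, 43]  # same numbers, sorted as in the dataset

print(ticket == drawing)            # False: lists compare element by element
print(set(ticket) == set(drawing))  # True: sets ignore the order

# Intersection with another combination gives the count of matching numbers
print(len(set(ticket) & {3, 11, 12, 20, 30, 40}))  # 3
```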
# Specify the relevant columns in a list
columns_numbers = history.columns[history.columns.str.startswith('NUMBER DRAWN')].tolist()
# Initiate a list that will contain the sets of winning numbers
historic_sets=[]
# Define a function that takes one dataframe row as input, then appends its 6 numbers as a set to the historic_sets list
def append_numbers(a_row):
    numbers_as_series = a_row[columns_numbers].astype(int)
    numbers_as_set = set(numbers_as_series)
    historic_sets.append(numbers_as_set)
# Apply to the history dataframe
history.apply(append_numbers, axis = 1)
0       None
1       None
2       None
3       None
4       None
        ...
3660    None
3661    None
3662    None
3663    None
3664    None
Length: 3665, dtype: object
# Print the first 3 historic sets to validate
historic_sets[0:3]
[{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}, {1, 6, 23, 24, 27, 39}]
That's the same as the first 3 rows from the dataframe. Let's proceed with the requested function.
# Define a function taking (1) a list with 6 numbers (2) all historic sets of 6 numbers,
# then prints how many times the list occurred in history,
# and also prints the chance of winning with these 6 numbers
def check_historical_occurrence(list_of_6, historic_sets_of_6):
    set_of_6 = set(list_of_6)
    occurrences_total = len(historic_sets_of_6)
    occurrences = 0
    # Compare the ticket with every historic drawing, not just the first few
    for a_set in historic_sets_of_6:
        if a_set == set_of_6:
            occurrences = occurrences + 1
    output = "During {} drawings (1982-2018), combination {} won the big prize {} time(s).".format(occurrences_total, list_of_6, occurrences)
    print(output)
    # Also print the chance of winning with this combination
    one_ticket_probability(list_of_6)
# Test with the first historic set
test_1 = [3, 11, 12, 14, 41, 43]
check_historical_occurrence(test_1, historic_sets)
During 3665 drawings (1982-2018), combination [3, 11, 12, 14, 41, 43] won the big prize 1 time(s).
The chance to win the big prize with combination [3, 11, 12, 14, 41, 43] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.
# Test with the first historic set, putting the numbers in a different sequence (should not make a difference)
test_2 = [11, 12, 3, 43, 41, 14]
check_historical_occurrence(test_2, historic_sets)
During 3665 drawings (1982-2018), combination [11, 12, 3, 43, 41, 14] won the big prize 1 time(s).
The chance to win the big prize with combination [11, 12, 3, 43, 41, 14] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.
# Test with another randomly chosen set
test_3 = [42, 31, 8, 43, 17, 4]
check_historical_occurrence(test_3, historic_sets)
During 3665 drawings (1982-2018), combination [42, 31, 8, 43, 17, 4] won the big prize 0 time(s).
The chance to win the big prize with combination [42, 31, 8, 43, 17, 4] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.
The app user will most likely see that his combination of 6 numbers never won the big prize over the course of 36 years (unless he deliberately chose 6 numbers that won in the past).
Either way, he will see that for his set of 6 numbers the chance of winning the big prize is extremely slim.
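If the app has to check many tickets, looping over all historic drawings for every request could be replaced by a one-off count. The sketch below is one possible alternative, not the approach used in this notebook: it counts drawings in a `collections.Counter` keyed by `frozenset` (a hashable set), so each lookup is a single dictionary access. The names `sample_history`, `win_counts` and `count_wins` are hypothetical, and the sample data contains only the first three drawings shown earlier.

```python
from collections import Counter

# First three historic drawings from the dataset, used here as sample data
sample_history = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

# Count once how often each combination was drawn; frozenset is hashable
win_counts = Counter(frozenset(a_set) for a_set in sample_history)

def count_wins(list_of_6, counts):
    # A Counter returns 0 for combinations that never occurred
    return counts[frozenset(list_of_6)]

print(count_wins([11, 12, 3, 43, 41, 14], win_counts))  # 1
print(count_wins([42, 31, 8, 43, 17, 4], win_counts))   # 0
```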
Function input: a number of lottery tickets.
Function output: a printed text including the chance to win the big prize with one of these tickets.
The assumption is that all tickets are different from each other.
# Define a function that takes an amount of (different) tickets as input, and calculates the chance of winning the big prize
def multi_ticket_probabibility(amount_tickets):
    nr_of_combs = combinations(49, 6)
    chance_to_win = amount_tickets / nr_of_combs
    output = "The chance to win the big prize with {} ticket(s) is {:.10%}.".format(amount_tickets, chance_to_win)
    output = output + "\nOr, said differently, 1 out of {:,} times.".format(int(nr_of_combs / amount_tickets))
    print(output)
# Show the result for several amounts of tickets
ticket_amounts = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for amount in ticket_amounts:
    multi_ticket_probabibility(amount)
    print('\n')
The chance to win the big prize with 1 ticket(s) is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.

The chance to win the big prize with 10 ticket(s) is 0.0000715112%.
Or, said differently, 1 out of 1,398,381 times.

The chance to win the big prize with 100 ticket(s) is 0.0007151124%.
Or, said differently, 1 out of 139,838 times.

The chance to win the big prize with 10000 ticket(s) is 0.0715112384%.
Or, said differently, 1 out of 1,398 times.

The chance to win the big prize with 1000000 ticket(s) is 7.1511238420%.
Or, said differently, 1 out of 13 times.

The chance to win the big prize with 6991908 ticket(s) is 50.0000000000%.
Or, said differently, 1 out of 2 times.

The chance to win the big prize with 13983816 ticket(s) is 100.0000000000%.
Or, said differently, 1 out of 1 times.
The app user will see that even if he buys many tickets, the chance of winning the big prize remains very slim. Even with 10,000 tickets (which cost serious money), the chance to win the big prize is still below 0.1%. And one needs to buy about 7 million tickets to reach a 50% chance of winning the big prize!
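The question can also be turned around: how many different tickets are needed to reach a given chance? The sketch below assumes, as above, that all tickets are different, and uses exact fractions to avoid rounding surprises. The function name `tickets_needed` is hypothetical and not part of the app's functions.

```python
import math
from fractions import Fraction

TOTAL_COMBINATIONS = math.comb(49, 6)  # 13,983,816 possible drawings

def tickets_needed(target_chance):
    # Smallest number of different tickets whose combined chance
    # of winning the big prize is at least target_chance
    return math.ceil(Fraction(target_chance) * TOTAL_COMBINATIONS)

print(tickets_needed(Fraction(1, 1000)))  # 13984   (0.1% chance)
print(tickets_needed(Fraction(1, 100)))   # 139839  (1% chance)
print(tickets_needed(Fraction(1, 2)))     # 6991908 (50% chance)
```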
Next, we give output for the chance of winning a smaller prize by having only some of the numbers correct.
Function input: an amount of numbers (1-6).
Function output: chance of having that amount of numbers correct on any lottery ticket.
It will calculate the chance of having exactly that amount correct (as opposed to 'at least' that amount).
# Define a function that takes an amount (2, 3, 4, 5) as an input, and calculates the chance of winning a prize by having that amount of numbers correct
def probabibility_less_6(amount):
    # Calculate how many 'amount-number' (2-number, 3-number,...) combinations can be formed from any set of 6 numbers (that is: any ticket)
    amount_number_combs = combinations(6, amount)
    # Calculate how many successful 6-number combinations (=lottery outcomes) exist for each of those
    # The remaining 'big prize' numbers must come from a set of 43 numbers
    # For 5: 1 remaining
    # For 4: 2 remaining etc
    remaining = combinations(43, 6 - amount)
    successful_outcomes = amount_number_combs * remaining
    # Calculate the total number of possible outcomes
    nr_of_combs = combinations(49, 6)
    chance_to_win = successful_outcomes / nr_of_combs
    # For validation/debugging (now commented out)
    # print(amount_number_combs, remaining, successful_outcomes, chance_to_win)
    # Print output
    output = "The chance to win a prize for having {} numbers correct is {:.10%}.".format(amount, chance_to_win)
    output = output + "\nOr, said differently, 1 out of {:.1f} times.".format(nr_of_combs / successful_outcomes)
    print(output)
# Show the result for several amounts of numbers
amount_set = [2, 3, 4, 5, 6]
for amount in amount_set:
    probabibility_less_6(amount)
    print('\n')
The chance to win a prize for having 2 numbers correct is 13.2378029002%.
Or, said differently, 1 out of 7.6 times.

The chance to win a prize for having 3 numbers correct is 1.7650403867%.
Or, said differently, 1 out of 56.7 times.

The chance to win a prize for having 4 numbers correct is 0.0968619724%.
Or, said differently, 1 out of 1032.4 times.

The chance to win a prize for having 5 numbers correct is 0.0018449900%.
Or, said differently, 1 out of 54200.8 times.

The chance to win a prize for having 6 numbers correct is 0.0000071511%.
Or, said differently, 1 out of 13983816.0 times.
The app user will see that his chance to win even the smallest prize, which requires 2 correct numbers (that prize being something like 'another play for free' or a tiny amount of money), is still less than 1 out of 7. If he checks for any amount higher than 2 correct numbers, he will see that his chance to win is very small.
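Two checks on this calculation can be sketched with `math.comb` (Python 3.8 or later). First, the numbers of outcomes with exactly 0, 1, ..., 6 correct numbers should add up to all 13,983,816 possible drawings. Second, the 'at least' variant mentioned earlier follows by summing the 'exactly' cases. The function names `ways_exactly` and `chance_at_least` are hypothetical and not part of the app's functions.

```python
import math

TOTAL_COMBINATIONS = math.comb(49, 6)

def ways_exactly(amount):
    # Pick `amount` of the 6 ticket numbers and the remaining
    # drawn numbers from the 43 numbers that are not on the ticket
    return math.comb(6, amount) * math.comb(43, 6 - amount)

# Every drawing matches exactly 0, 1, ..., or 6 ticket numbers,
# so these counts must cover all possible drawings
assert sum(ways_exactly(k) for k in range(7)) == TOTAL_COMBINATIONS

def chance_at_least(amount):
    # Sum the 'exactly' cases from `amount` up to 6 correct numbers
    return sum(ways_exactly(k) for k in range(amount, 7)) / TOTAL_COMBINATIONS

for amount in (2, 3, 4, 5, 6):
    print("At least {} correct: {:.10%}".format(amount, chance_at_least(amount)))
```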
We have created 3 functions that, for a given input, print a text showing the chance of winning (something) in the 6/49 Lottery. By testing these functions, we have seen that these chances are very, very slim.
To further extend the app, as next steps we could bring these small numbers more to life. E.g: