It is a pretty common story: we have some spare change, we don't know what to do with it, and so we decide to try our luck at the lottery. After all, we've seen the big winners on TV, people who became millionaires from nothing just by selecting the right numbers on their tickets. It's supposed to be a one-time thing. We lose on the first try, but we are determined to hit the jackpot, so we try again, and again, and again. Before we know it, it has become a habit and then an addiction: we can't stop ourselves from playing, and each day we pour more and more money into it. The big question is, would we still play if we knew our chances of winning?
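In this project we are going to simulate a real-world scenario to answer questions such as: What is the probability of winning the big prize with a single ticket? Has a given combination ever won before? How much do our chances improve if we play several tickets? And what is the probability of having two to five winning numbers on a ticket?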
The dataset we will be working with is a Kaggle dataset on the popular Canadian Lotto 6/49. Lotto 6/49 is one of three national lottery games in Canada. Launched on June 12, 1982, Lotto 6/49 was the first nationwide Canadian lottery game to allow players to choose their own numbers. Previous national games, such as the Olympic Lottery, Loto Canada and Superloto, used pre-printed numbers on tickets. Lotto 6/49 led to the gradual phase-out of that type of lottery game in Canada.
Winning numbers are drawn by the Interprovincial Lottery Corporation every Wednesday and Saturday, executed with a Smartplay Halogen II ball machine.
Playing the lottery isn't worth it. The chances of holding a winning ticket, or even of having at least 2 winning numbers on a ticket, are very small.
We are going to create three functions: factorial(), combination() and one_ticket_probability().
def factorial(n):
    # Calculates the factorial of a non-negative integer n.
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result
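def combination(n, k):
    # Returns the number of ways to choose k items from a set of n items.
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # Integer division keeps the result exact (the denominator always divides the numerator).
    return numerator // denominator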
#testing our combination function
combination(5, 3)
10
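As a sanity check, the same formula can be compared against math.comb from Python's standard library (available from Python 3.8). The snippet below is only a minimal sketch: combination_check is a hypothetical helper that re-implements the formula with integer division so the example runs on its own.

```python
import math

def combination_check(n, k):
    # Same formula as combination(): n! / (k! * (n - k)!),
    # using integer division so the result stays an exact int.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Compare the hand-written formula with the standard-library version.
print(combination_check(5, 3), math.comb(5, 3))
print(combination_check(49, 6), math.comb(49, 6))
```

Both columns should agree for every valid pair of n and k.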
def one_ticket_probability(ticket):
    # Calculates the probability of winning the big prize with any single ticket.
    c = combination(49, 6)
    outcome = 1
    probability = outcome / c
    probability_percentage = probability * 100  # turns the probability into a percentage
    print(
        '''You have a 1 in {:,} or {:.7f}% chance of winning with {}.
        '''.format(c, probability_percentage, ticket)
    )
#Testing the function
ticket1 = [11, 2, 4, 5, 6, 10]
ticket2 = [4, 11, 12, 1, 13, 3]
one_ticket_probability(ticket1)
print('\n')
one_ticket_probability(ticket2)
You have a 1 in 13,983,816 or 0.0000072% chance of winning with [11, 2, 4, 5, 6, 10].

You have a 1 in 13,983,816 or 0.0000072% chance of winning with [4, 11, 12, 1, 13, 3].
To add some context to what this function does: there are 13,983,816 possible outcomes when you pick 6 numbers out of 49. Since only one combination of numbers wins the big prize, the number of winning outcomes is just one. To get the probability of winning the lottery, we divide the number of winning outcomes by the number of possible outcomes. That's why we divided 1 (the number of winning outcomes) by 13,983,816 (the number of possible outcomes).
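Written out as a formula, the calculation for a single ticket looks like this:

```latex
\[
\binom{49}{6} = \frac{49!}{6!\,(49-6)!} = 13{,}983{,}816,
\qquad
P(\text{big prize}) = \frac{1}{\binom{49}{6}} = \frac{1}{13{,}983{,}816} \approx 0.0000072\%.
\]
```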
import pandas as pd
lotto_649 = pd.read_csv('649.csv')
lotto_649.shape #shows the number of rows and columns of the dataset
(3665, 11)
lotto_649.head() # displays the first 5 rows
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
lotto_649.tail() #displays the last 5 rows
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |
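We are going to create a function that takes in a list of 6 numbers from 1 to 49, compares it against the historical winning combinations, and prints how many times that combination has occurred in the past along with the probability of winning the next draw with it.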
def extract_numbers(row):
    # Converts the six winning numbers of a draw (one row) to a set.
    row = row[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3',
               'NUMBER DRAWN 4', 'NUMBER DRAWN 5', 'NUMBER DRAWN 6']]
    row = set(row.values)
    return row
winning_numbers = lotto_649.apply(extract_numbers, axis=1)
winning_numbers.head()
0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
def check_historical_occurence(user_no, winning_no):
    # Compares the user's input (a Python list) with winning_no (a pandas Series of sets)
    # and counts how many times that combination has occurred in past draws.
    user_no = set(user_no)
    result = user_no == winning_no  # element-wise comparison against every past draw
    historical_occurence = result.sum()
    if historical_occurence == 0:
        print("""
        The combination {} has never occured before in the past.
        However your chances to win the next draw with {} is 1 in 13,983,816 or 0.0000072%.
        """.format(user_no, user_no)
        )
    else:
        print(
            """The combination {} has occured {} times in the past.
            However this doesn't guarantee that you will win the next draw.
            You have a 1 in 13,983,816 or 0.0000072% chance of winning with {}.
            """.format(user_no, historical_occurence, user_no)
        )
#testing the check_historical_occurence function
test_ticket1 = [3, 41, 11, 12, 43, 14]
test_ticket2 = [11, 2, 4, 5, 6, 10]
test1 = check_historical_occurence(test_ticket1, winning_numbers)
print('\n')
test2 = check_historical_occurence(test_ticket2, winning_numbers)
The combination {3, 41, 11, 12, 43, 14} has occured 1 times in the past.
However this doesn't guarantee that you will win the next draw.
You have a 1 in 13,983,816 or 0.0000072% chance of winning with {3, 41, 11, 12, 43, 14}.

The combination {2, 4, 5, 6, 10, 11} has never occured before in the past.
However your chances to win the next draw with {2, 4, 5, 6, 10, 11} is 1 in 13,983,816 or 0.0000072%.
To successfully compare a ticket with the historical winning tickets, we had to write two functions. Our first function, extract_numbers(), extracts the set of winning numbers for each draw in the lotto_649 DataFrame. This is important because it allows us to compare the winning tickets with the ticket that a user is going to input. Our second function, check_historical_occurence(), takes in the user's input and a pandas Series containing all sets of winning numbers. It turns the user's input into a set and compares it to the sets in our Series. It then prints the number of times there was a match (or that there was none) and the probability of winning the big prize with that set of numbers.
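The same set-comparison idea can be sketched without pandas. The example below is only an illustration: sample_draws is a hypothetical list holding the first five draws shown in the head() output earlier, and count_occurrences is an assumed helper name rather than part of the original notebook.

```python
from collections import Counter

# Hypothetical sample: the first five draws of the dataset, as shown by head().
sample_draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
    (3, 9, 10, 13, 20, 43),
    (5, 14, 21, 31, 34, 47),
]

# frozenset ignores the order of the numbers and is hashable,
# so each combination can be used as a Counter key.
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_occurrences(ticket):
    # Returns how many times this combination appears in sample_draws.
    return draw_counts[frozenset(ticket)]

print(count_occurrences([3, 41, 11, 12, 43, 14]))  # 1
print(count_occurrences([11, 2, 4, 5, 6, 10]))     # 0
```

Counting every combination once up front means each later lookup is a single dictionary access, which may be convenient when many tickets need checking.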
We are writing a function to calculate the probability of winning with any number of tickets between 1 and 13,983,816.
def multi_ticket_probability(n_tickets):
    # Gives the probability of winning with n_tickets different tickets.
    possible_outcomes = combination(49, 6)
    winning_outcomes = n_tickets
    probability = winning_outcomes / possible_outcomes
    probability_percentage = probability * 100  # turns the probability into a percentage
    if n_tickets == 1:
        print(
            '''You have a 1 in {:,} or a {:.7f}% chance of winning
            if you play with {:,} ticket.
            '''.format(possible_outcomes, probability_percentage, n_tickets)
        )
    else:
        # Simplifies the odds to a "1 in N" form, rounding N down.
        new_possible_outcomes = possible_outcomes // winning_outcomes
        print(
            '''You have a 1 in {:,} or a {:.7f}% chance of winning
            if you play with {:,} tickets.
            '''.format(new_possible_outcomes, probability_percentage, n_tickets)
        )
# testing the multi_ticket_probability function
n_tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for i in n_tickets:
    multi_ticket_probability(i)
    print('\n')  # prints a new line after each iteration
You have a 1 in 13,983,816 or a 0.0000072% chance of winning
if you play with 1 ticket.

You have a 1 in 1,398,381 or a 0.0000715% chance of winning
if you play with 10 tickets.

You have a 1 in 139,838 or a 0.0007151% chance of winning
if you play with 100 tickets.

You have a 1 in 1,398 or a 0.0715112% chance of winning
if you play with 10,000 tickets.

You have a 1 in 13 or a 7.1511238% chance of winning
if you play with 1,000,000 tickets.

You have a 1 in 2 or a 50.0000000% chance of winning
if you play with 6,991,908 tickets.

You have a 1 in 1 or a 100.0000000% chance of winning
if you play with 13,983,816 tickets.
People try to increase their chances of winning by playing multiple tickets. While it is true that playing multiple tickets increases your chances of winning, you need to play a ridiculously high number of tickets to get any significant chance of winning. To get just a 7% chance of winning, you need to play 1,000,000 tickets.
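The function above takes the number of tickets played and tells you the probability of winning with that many tickets. To achieve this, we calculated the total number of possible outcomes with combination(49, 6), treated the number of tickets as the number of successful outcomes (assuming each ticket carries a different combination), and divided the two. For more than one ticket, the function also simplifies the odds to a "1 in N" form by dividing the possible outcomes by the number of tickets and rounding down.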
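One way to make the multi-ticket calculation reusable is to return the probability instead of printing it. The sketch below assumes every ticket carries a different combination (duplicates would not add to the chance of winning); multi_ticket_chance is a hypothetical name, and math.comb requires Python 3.8 or later.

```python
import math
from fractions import Fraction

def multi_ticket_chance(n_tickets):
    # Exact probability of holding the winning combination when
    # playing n_tickets distinct tickets in a 6/49 draw.
    total = math.comb(49, 6)  # 13,983,816 possible combinations
    if not 1 <= n_tickets <= total:
        raise ValueError("n_tickets must be between 1 and {:,}".format(total))
    return Fraction(n_tickets, total)

for n in [1, 1000000, 6991908]:
    p = multi_ticket_chance(n)
    print("{:,} tickets: {:.7f}%".format(n, float(p) * 100))
```

Using Fraction keeps the result exact, so half of all combinations gives exactly 1/2 rather than a rounded float.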
The next function takes in an integer n between 2 and 5 and computes the probability of having exactly n winning numbers on a ticket.
def probability_less_6(n):
    # Finds the probability of having exactly n (2 to 5) winning numbers on a ticket.
    possible_outcomes = combination(49, 6)
    successful_outcomes = {}
    # For each i from 2 to 5, counts the tickets with exactly i winning numbers.
    for i in range(2, 6):
        # ways to choose i of the 6 winning numbers
        c = combination(6, i)
        # ways to choose the other 6 - i numbers from the 43 non-winning numbers
        remainder = combination(43, 6 - i)
        outcome = c * remainder
        successful_outcomes[i] = outcome
    # calculating the probability of having n winning numbers on a ticket
    if n in successful_outcomes:
        successful_outcome = successful_outcomes[n]
        probability = successful_outcome / possible_outcomes
        probability_percentage = probability * 100  # multiplying by 100 converts to a percentage
        new_possible_outcome = int(possible_outcomes / successful_outcome)
        print('''
        You have a 1 in {:,} chance,
        or a {:.4f}% chance of having {} winning numbers on a ticket.
        '''.format(new_possible_outcome, probability_percentage, n)
        )
# testing the probability_less_6 function
n_winning = [2, 3, 4, 5]
for i in n_winning:
    probability_less_6(i)
    print('\n')  # prints a newline after each iteration
You have a 1 in 7 chance,
or a 13.2378% chance of having 2 winning numbers on a ticket.

You have a 1 in 56 chance,
or a 1.7650% chance of having 3 winning numbers on a ticket.

You have a 1 in 1,032 chance,
or a 0.0969% chance of having 4 winning numbers on a ticket.

You have a 1 in 54,200 chance,
or a 0.0018% chance of having 5 winning numbers on a ticket.
Above we created a function that calculates the probability of having exactly 2, 3, 4 or 5 winning numbers on a ticket.
Of course everyone wants to win the big prize, and people continue to play even when they know the chance of winning it is really slim. One reason is that they feel having at least 2 winning numbers is some kind of compensation if they miss out on the big prize. But from what we can see above, there's only a meagre 13.24% chance of having exactly 2 winning numbers on a ticket.
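The count behind these figures is a product of two combinations: the ways to pick k of the 6 winning numbers, times the ways to pick the remaining 6 - k numbers from the 43 non-winning ones. Here is a minimal sketch of that calculation, with exactly_k_matches as a hypothetical helper name and math.comb assumed to be available (Python 3.8 or later):

```python
import math

def exactly_k_matches(k):
    # Probability that a 6-number ticket matches exactly k of the
    # 6 winning numbers drawn from 49.
    favourable = math.comb(6, k) * math.comb(43, 6 - k)
    return favourable / math.comb(49, 6)

for k in range(2, 6):
    print("{} matches: {:.4f}%".format(k, exactly_k_matches(k) * 100))
```

Summing exactly_k_matches(k) over k from 0 to 6 gives 1, which is a quick way to confirm that no case has been missed.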
We are going to create a function that calculates the probability of having at least n winning numbers on a ticket, for n between 2 and 5.
def probability_at_least_n(n):
    # Finds the probability of having at least n (2 to 6) winning numbers on a ticket.
    possible_outcomes = combination(49, 6)
    probability_6_percentage = 1 / possible_outcomes * 100
    probabilities = {}
    probabilities[6] = probability_6_percentage  # adds the probability of winning the big prize
    # For each i from 2 to 5, computes the probability of exactly i winning numbers.
    for i in range(2, 6):
        # ways to choose i of the 6 winning numbers
        c = combination(6, i)
        # ways to choose the other 6 - i numbers from the 43 non-winning numbers
        remainder = combination(43, 6 - i)
        outcome = c * remainder
        probability = outcome / possible_outcomes
        probability_percentage = probability * 100  # multiplying by 100 converts to a percentage
        probabilities[i] = probability_percentage
    # calculating the probability of having at least n winning numbers on a ticket
    if n in probabilities:
        total_probability = 0
        for i in range(n, 7):
            total_probability += probabilities[i]
        print('''
        You have a {:.6f}% chance of having at least {} winning numbers
        on a ticket.
        '''.format(total_probability, n)
        )
# testing the probability_at_least_n function
for i in n_winning:
    probability_at_least_n(i)
    print('\n')  # prints a newline after each iteration
You have a 15.101557% chance of having at least 2 winning numbers
on a ticket.

You have a 1.863755% chance of having at least 3 winning numbers
on a ticket.

You have a 0.098714% chance of having at least 4 winning numbers
on a ticket.

You have a 0.001852% chance of having at least 5 winning numbers
on a ticket.
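We created a function that calculates the probability of having at least n winning numbers, for n between 2 and 6. To achieve this, we computed the probability of having exactly 2, 3, 4, 5 and 6 winning numbers, stored each one in a dictionary, and then added up the probabilities from n up to 6.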
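The "at least n" probability is the sum of the "exactly k" probabilities for k from n to 6. The sketch below is one possible compact formulation and not the notebook's own code: at_least_n_matches is a hypothetical name, and math.comb is assumed to be available (Python 3.8 or later).

```python
import math

def at_least_n_matches(n):
    # Probability of matching n or more of the 6 winning numbers,
    # found by summing the "exactly k" cases for k = n, ..., 6.
    total = math.comb(49, 6)
    favourable = sum(
        math.comb(6, k) * math.comb(43, 6 - k) for k in range(n, 7)
    )
    return favourable / total

for n in range(2, 6):
    print("at least {}: {:.6f}%".format(n, at_least_n_matches(n) * 100))
```

Adding up the integer counts first and dividing once at the end avoids accumulating floating-point rounding across the terms.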
We asked a very important question at the beginning of this project: would we still play if we knew our chances of winning?
We've so far been able to find the numbers needed to answer it.