Many people start playing the lottery for fun, but for some the activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and eventually engage in desperate behaviors like theft.
A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.
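For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like: what is the probability of winning the big prize with a single ticket, what is the probability of winning it with several different tickets, and what is the probability of matching two, three, four, or five winning numbers on a single ticket?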
The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The data set has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).
The scenario we're following throughout this project is fictional — the main purpose is to practice applying the concepts we learned in a setting that simulates a real-world scenario.
We will start by creating some functions that will be important for the rest of the project:
# Import needed packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Define a function to calculate the factorial of a number n
def factorial(n):
    result = 1
    for x in range(n, 0, -1):
        result *= x
    return result
# Define a function to calculate the number of combinations in groups of k from a population n, without replacement
def combinations(n, k):
    # Integer division keeps the result exact; float division can lose precision for large n
    return factorial(n) // (factorial(n - k) * factorial(k))
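As a quick sanity check on the helper above, the standard library's `math.comb` (available from Python 3.8) computes the same quantity with exact integer arithmetic. The sketch below is only an illustration, not part of the project code, and `combinations_exact` is a hypothetical name.

```python
import math

def combinations_exact(n, k):
    """Number of ways to choose k items from n without replacement."""
    # math.comb works on integers only, so there is no float rounding
    return math.comb(n, k)

print(combinations_exact(49, 6))  # 13983816
```

The result matches the "one in 13983816" figure that appears in the outputs of this project.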
For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.
# Define a function that takes a Python list with 6 numbers as input, and returns the probability of winning the big prize
def one_ticket_probability(user_numbers):
    # Every six-number ticket is equally likely to win, so the result does not depend on user_numbers
    total_outcomes = combinations(49, 6)
    probability = 1 / total_outcomes * 100
    return "The probability of winning the big prize with these numbers is " + "{:.6f}".format(probability) + "% - one in " + str(total_outcomes)
# Test the function
test_list = [1,2,3,4,5,6]
one_ticket_probability(test_list)
'The probability of winning the big prize with these numbers is 0.000007% - one in 13983816'
Above, we wrote a function that tells users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data from Canada and determine whether they would ever have won by now.
We will proceed with reading the historical data and a first inspection:
six49 = pd.read_csv("649.csv", parse_dates=["DRAW DATE"])
display("Shape of the dataset:", six49.shape)
display("Types of data:", six49.dtypes)
display("Null values per column:", six49.isnull().sum())
display(six49.head(3))
display(six49.tail(3))
'Shape of the dataset:'
(3665, 11)
'Types of data:'
PRODUCT                     int64
DRAW NUMBER                 int64
SEQUENCE NUMBER             int64
DRAW DATE          datetime64[ns]
NUMBER DRAWN 1              int64
NUMBER DRAWN 2              int64
NUMBER DRAWN 3              int64
NUMBER DRAWN 4              int64
NUMBER DRAWN 5              int64
NUMBER DRAWN 6              int64
BONUS NUMBER                int64
dtype: object
'Null values per column:'
PRODUCT            0
DRAW NUMBER        0
SEQUENCE NUMBER    0
DRAW DATE          0
NUMBER DRAWN 1     0
NUMBER DRAWN 2     0
NUMBER DRAWN 3     0
NUMBER DRAWN 4     0
NUMBER DRAWN 5     0
NUMBER DRAWN 6     0
BONUS NUMBER       0
dtype: int64
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 1982-06-12 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 1982-06-19 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 1982-06-26 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 2018-06-13 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 2018-06-16 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 2018-06-20 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |
Then, we will define two functions:
# Define a function that takes as input a row of the lottery dataframe and returns a set containing all the six winning numbers
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    return {int(n) for n in row.iloc[4:10]}
# Apply the function to every row and collect the resulting sets in a list
winning_numbers = six49.apply(extract_numbers, axis=1).tolist()
# Display results of the function
display(six49.head())
display("First elements of the winning_numbers list:" ,winning_numbers[:5])
| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 1982-06-12 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 1982-06-19 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 1982-06-26 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 1982-07-03 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 1982-07-10 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
'First elements of the winning_numbers list:'
[{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}, {1, 6, 23, 24, 27, 39}, {3, 9, 10, 13, 20, 43}, {5, 14, 21, 31, 34, 47}]
# Define a function that returns the number of times the combination entered by the user occurred in the past
def check_historical_occurrence(user_list, winning_list):
    win_times = winning_list.count(set(user_list))
    return ("The combination " + str(user_list) + " has already won the lottery " + str(win_times) + " times.\n"
            + one_ticket_probability(user_list))
# Test the function with a combination that occurred and another that did not
test_occurred = set([3, 11, 12, 14, 41, 43])
test_not_occurred = set([1, 2, 3, 4, 5, 6])
print(check_historical_occurrence(test_occurred, winning_numbers))
print(check_historical_occurrence(test_not_occurred, winning_numbers))
The combination {3, 41, 11, 12, 43, 14} has already won the lottery 1 times.
The probability of winning the big prize with these numbers is 0.000007% - one in 13983816
The combination {1, 2, 3, 4, 5, 6} has already won the lottery 0 times.
The probability of winning the big prize with these numbers is 0.000007% - one in 13983816
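Counting matches in a list scans every past drawing on each lookup. If the app had to answer many lookups, one option would be to count each drawing once up front. The sketch below assumes a small hand-written sample of drawings in place of the real data set, and `build_draw_counter` and `times_won` are hypothetical names rather than part of the project.

```python
from collections import Counter

def build_draw_counter(draws):
    """Count how often each six-number combination was drawn."""
    # frozenset is hashable, so it can be used as a Counter key
    return Counter(frozenset(draw) for draw in draws)

def times_won(user_numbers, draw_counter):
    """Return how many past drawings match the user's numbers exactly."""
    # A Counter returns 0 for keys it has never seen
    return draw_counter[frozenset(user_numbers)]

# Sample drawings copied from the first three rows of the data set
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]
draw_counter = build_draw_counter(sample_draws)
print(times_won([3, 11, 12, 14, 41, 43], draw_counter))  # 1
print(times_won([1, 2, 3, 4, 5, 6], draw_counter))       # 0
```

For a few thousand drawings the difference is negligible, so this is a design option rather than a needed fix.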
Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning, so we are going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.
# Define a function that takes a number of tickets as input, and returns the probability of winning the big prize
def multi_ticket_probability(number):
    probability = number / combinations(49, 6) * 100
    output_odds = round(combinations(49, 6) / number)
    return "The probability of winning the big prize with " + str(number) + " tickets is " + "{:.6f}".format(probability) + "% - 1 in " + str(output_odds)
# Test the function
display(multi_ticket_probability(1))
display(multi_ticket_probability(10))
display(multi_ticket_probability(100))
display(multi_ticket_probability(10000))
display(multi_ticket_probability(1000000))
display(multi_ticket_probability(6991908))
display(multi_ticket_probability(13983816))
'The probability of winning the big prize with 1 tickets is 0.000007% - 1 in 13983816'
'The probability of winning the big prize with 10 tickets is 0.000072% - 1 in 1398382'
'The probability of winning the big prize with 100 tickets is 0.000715% - 1 in 139838'
'The probability of winning the big prize with 10000 tickets is 0.071511% - 1 in 1398'
'The probability of winning the big prize with 1000000 tickets is 7.151124% - 1 in 14'
'The probability of winning the big prize with 6991908 tickets is 50.000000% - 1 in 2'
'The probability of winning the big prize with 13983816 tickets is 100.000000% - 1 in 1'
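The rounded odds hide some detail: 1,000,000 tickets give odds of about 1 in 13.98, which is displayed as 1 in 14. As a minimal sketch, `fractions.Fraction` can keep the probability exact. It assumes, like the function above, that every ticket carries a different combination; `multi_ticket_fraction` and `TOTAL_OUTCOMES` are hypothetical names.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    """Exact probability of the big prize with n_tickets distinct tickets."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 1 and 13,983,816")
    # Fraction reduces the ratio automatically, e.g. 6991908/13983816 -> 1/2
    return Fraction(n_tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(6991908))  # 1/2
print(float(multi_ticket_fraction(100)) * 100)  # probability as a percentage
```

The range check also rejects inputs such as 0 tickets, which would otherwise cause a division by zero when computing the odds.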
For extra context, most 6/49 lotteries award smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having exactly two, three, four, or five winning numbers.
# Define a function that takes as input a list of numbers and the number of those to check for probability of being winning numbers
def probability_less_6(user_list, number):
    # Combinations with the numbers that the user chooses
    user_combinations = combinations(6, number)
    # Combinations with the remaining numbers (excluding remaining user numbers)
    other_numbers_combinations = combinations(49 - 6, 6 - number)
    # Total combinations that make the user get N winning numbers
    winning_combinations = (user_combinations * other_numbers_combinations)
    # Total possible combinations
    total_combinations = combinations(49, 6)
    # Percentage of getting N winning numbers
    probability = winning_combinations / total_combinations * 100
    # Output in odds
    output_odds = round(total_combinations / winning_combinations)
    return "The combination " + str(user_list) + " has a " + "{:.3f}".format(probability) + "% probability of having " + str(number) + " winning numbers - 1 in " + str(output_odds)
# Test the function
test_list = [1, 2, 3, 4, 5, 6]
print(probability_less_6(test_list, 2))
print(probability_less_6(test_list, 3))
print(probability_less_6(test_list, 4))
print(probability_less_6(test_list, 5))
The combination [1, 2, 3, 4, 5, 6] has a 13.238% probability of having 2 winning numbers - 1 in 8
The combination [1, 2, 3, 4, 5, 6] has a 1.765% probability of having 3 winning numbers - 1 in 57
The combination [1, 2, 3, 4, 5, 6] has a 0.097% probability of having 4 winning numbers - 1 in 1032
The combination [1, 2, 3, 4, 5, 6] has a 0.002% probability of having 5 winning numbers - 1 in 54201
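One way to check the counting argument is to compute the probability for every possible number of matches, from zero to six, and confirm that the values sum to one. The sketch below does this with exact fractions. `match_probability` is a hypothetical helper, not part of the project code.

```python
import math
from fractions import Fraction

def match_probability(k, picked=6, pool=49):
    """Exact probability that exactly k of the picked numbers are drawn."""
    # Ways to choose which k of the player's numbers are drawn
    ways_to_match = math.comb(picked, k)
    # Ways to fill the rest of the drawing from numbers the player did not pick
    ways_to_miss = math.comb(pool - picked, picked - k)
    return Fraction(ways_to_match * ways_to_miss, math.comb(pool, picked))

probabilities = [match_probability(k) for k in range(7)]
print(sum(probabilities))  # 1
for k in (2, 3, 4, 5):
    print(k, f"{float(probabilities[k]) * 100:.3f}%")
```

Because the seven cases are mutually exclusive and cover every outcome, a total of exactly 1 suggests the per-case counts are consistent with each other.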
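We created this project as a way to practice probabilities in Python. We defined several functions related to the probability of winning the lottery: one for the probability of winning the big prize with a single ticket, one that checks a combination against the historical Canadian data, one for the probability of winning with any number of different tickets, and one for the probability of matching two, three, four, or five of the winning numbers.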
The results we obtained clearly show how low the odds are: a single ticket has a 1 in 13,983,816 chance of winning the big prize. Lotteries are not worth our money; we can find better things to do with it.