6/49 Lottery: chances to win¶

Lotto 6/49 is a national lottery game in Canada. As the name implies, 6 numbers are drawn from a set of 49 in each weekly drawing. A ticket with exactly these 6 numbers wins the big prize. A ticket with only some of the numbers correct (5, 4, 3 or 2) still wins a smaller prize.

As with any gambling game, addiction is a risk, while the chances of actually winning are not exactly high.

Now, I am pretending to work for an institute that wants to help (potential) gambling addicts. One thing the institute wants to do is create an app that shows the following chances:

the chance to win the big prize with a single ticket
the chance to win the big prize with multiple tickets
the chance to win smaller prizes with a ticket

In addition, for any ticket (combination of 6 numbers) the app should be able to show how many times it would have won the big prize during 3,665 drawings in the past (dating from 1982 to 2018). For this, the institute has historic data available.

I have been tasked with creating the 'heart' of the app: functions for each of these scenarios that take the required inputs and return the desired outputs.

Define helper functions¶

I'll start with defining 2 helper functions that we'll need for the calculations.

In [1]:

# Define a function that takes input n and returns factorial(n)
def factorial(n):
    outcome = 1
    # range() advances the counter itself, so no manual increment is needed
    for counter in range(1, n + 1):
        outcome = outcome * counter
    return outcome

In [2]:

# Define a function that takes inputs n, k and returns the number of combinations when taking k objects from a group of n objects
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The division is always exact, so integer division keeps the result an int
    return numerator // denominator
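
As a sanity check, the two helpers can be compared with the standard library. This is a minimal sketch, assuming Python 3.8 or later (where `math.comb` is available); the helpers are re-declared here, with integer division, only so that the block runs on its own.

```python
import math

def factorial(n):
    outcome = 1
    for counter in range(1, n + 1):
        outcome *= counter
    return outcome

def combinations(n, k):
    # Integer division: the quotient is always a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))

# The hand-written helpers should agree with the standard library
assert factorial(6) == math.factorial(6)
assert combinations(49, 6) == math.comb(49, 6)

print(combinations(49, 6))  # 13983816
```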

Chance to win the big prize¶

Function input: a list of 6 numbers representing one lottery ticket.
Function output: a printed text including the chance to win the big prize with this ticket by having all 6 numbers correct.

In [3]:

# Define a function that calculates the chance of winning the big prize
def one_ticket_probability(list_of_6):
    nr_of_combs = combinations(49,6)
    chance_to_win = 1/nr_of_combs
    output = "The chance to win the big prize with combination {} is {:.10%}.".format(list_of_6, chance_to_win)
    output = output + "\nOr, said differently, 1 out of {:,} times.".format(int(nr_of_combs))
    print (output)     

In [4]:

# Test it with some list of 6 numbers
series = [1, 2, 3, 7, 12, 16]
# The function prints its result and returns nothing, so there is nothing to assign
one_ticket_probability(series)

The chance to win the big prize with combination [1, 2, 3, 7, 12, 16] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.

In [5]:

# Test it with another list of 6 numbers
series = [41, 42, 3, 47, 18, 16]
one_ticket_probability(series)

The chance to win the big prize with combination [41, 42, 3, 47, 18, 16] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.

The app user will be able to see that the chance of winning the big prize is extremely slim, and that it is the same for every set of 6 numbers.
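
The function above trusts that its input really is a valid ticket. In an app, the input would probably be checked first. The following is only a sketch under that assumption: `validate_ticket` is a hypothetical helper, not part of the notebook, and the rule it enforces (6 distinct whole numbers from 1 to 49) follows from the description of the game.

```python
def validate_ticket(list_of_6):
    """Return the ticket as a set, or raise ValueError if it is not a valid 6/49 ticket."""
    numbers = set(list_of_6)
    # Exactly 6 entries, and no number used twice
    if len(list_of_6) != 6 or len(numbers) != 6:
        raise ValueError("A ticket needs exactly 6 distinct numbers.")
    # Every number must be a whole number in the range 1..49
    if not all(isinstance(n, int) and 1 <= n <= 49 for n in numbers):
        raise ValueError("Every number must be an integer from 1 to 49.")
    return numbers

for candidate in ([1, 2, 3, 7, 12, 16], [1, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 50]):
    try:
        validate_ticket(candidate)
        print(candidate, "is a valid ticket")
    except ValueError as error:
        print(candidate, "is rejected:", error)
```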

Did this win the big prize in the past?¶

As mentioned in the introduction, the institute can use a dataset with historic drawings that is available on Kaggle. With it, an app user can see for any ticket (that is, any combination of 6 numbers) how many times it would have won the big prize.

I have downloaded the dataset and saved it in the same folder as this notebook. I will start with exploring the dataset a bit, then continue with writing the function.

Function input: a list of 6 numbers representing one lottery ticket.
Function output: a printed text showing how many times this combination won in the past (1982-2018), and also including the chance to win the big prize with this ticket.

In [6]:

# Import pandas
import pandas as pd

In [7]:

# Import the dataset, store as dataframe
history = pd.read_csv('649.csv')

In [8]:

# Print number of rows and columns
history.shape

Out[8]:

(3665, 11)

In [9]:

# Show the first 3 rows
history.head(3)

Out[9]:

	PRODUCT	DRAW NUMBER	DRAW DATE	NUMBER DRAWN 1	NUMBER DRAWN 2	NUMBER DRAWN 3	NUMBER DRAWN 4	NUMBER DRAWN 5	NUMBER DRAWN 6	BONUS NUMBER
0	649	1	6/12/1982	3	11	12	14	41	43	13
1	649	2	6/19/1982	8	33	36	37	39	41	9
2	649	3	6/26/1982	1	6	23	24	27	39	34

In [10]:

# Show the last 3 rows
history.tail(3)

Out[10]:

	PRODUCT	DRAW NUMBER	DRAW DATE	NUMBER DRAWN 1	NUMBER DRAWN 2	NUMBER DRAWN 3	NUMBER DRAWN 4	NUMBER DRAWN 5	NUMBER DRAWN 6	BONUS NUMBER
3662	649	3589	6/13/2018	6	22	24	31	32	34	16
3663	649	3590	6/16/2018	2	15	21	31	38	49	8
3664	649	3591	6/20/2018	14	24	31	35	37	48	17

In [11]:

# Show relevant information about all columns
history.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.1+ KB

What we can see is that we have a set of clean-looking data of 3,665 historic drawings between June 1982 and June 2018. Let's proceed, first with a helper function that collects all winning combinations from the past, then with the actual function.

For this part we will make use of Python 'sets'. Two sets are equal when they contain the same numbers, regardless of order, so a ticket can be compared with a drawing directly, without sorting either one.
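
A small illustration of why sets fit this comparison. The numbers are the first historic drawing in a shuffled order; the partially matching ticket is made up for the example.

```python
ticket = [11, 12, 3, 43, 41, 14]
drawn = [3, 11, 12, 14, 41, 43]

# Lists compare element by element, so order matters
print(ticket == drawn)            # False

# Sets ignore order: same members means equal
print(set(ticket) == set(drawn))  # True

# Set intersection counts how many numbers a ticket has correct
partial_ticket = [3, 11, 12, 20, 30, 40]
matches = len(set(partial_ticket) & set(drawn))
print(matches)                    # 3
```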

In [12]:

# Specify the relevant columns in a list
columns_numbers = history.columns[history.columns.str.startswith('NUMBER DRAWN')].tolist()

# Initiate a list that will contain the sets of winning numbers
historic_sets=[]

# Define a function that takes one dataframe row as input, then appends its 6 numbers as a set to the historic-sets list
def append_numbers(a_row):
    numbers_as_series = a_row[columns_numbers].astype(int)
    numbers_as_set = set(numbers_as_series)
    historic_sets.append(numbers_as_set)

# Apply to the history dataframe
history.apply(append_numbers, axis = 1)
    

Out[12]:

0       None
1       None
2       None
3       None
4       None
        ... 
3660    None
3661    None
3662    None
3663    None
3664    None
Length: 3665, dtype: object

In [13]:

# Print the first 3 historic sets to validate
historic_sets[0:3]

Out[13]:

[{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}, {1, 6, 23, 24, 27, 39}]

That's the same as the first 3 rows from the dataframe. Let's proceed with the requested function.

In [14]:

# Define a function taking (1) a list with 6 numbers (2) all historic sets of 6 numbers,
# then prints how many times the list occurred in history,
# and also prints the chance of winning with these 6 numbers
def check_historical_occurrence(list_of_6, historic_sets_of_6):
    set_of_6 = set(list_of_6)
    occurrences_total = len(historic_sets_of_6)
    occurrences = 0
    # Compare against every historic drawing, not only the first few
    for a_set in historic_sets_of_6:
        if a_set == set_of_6:
            occurrences = occurrences + 1

    output = "During {} drawings (1982-2018), combination {} won the big prize {} time(s).".format(occurrences_total, list_of_6, occurrences)
    print(output)
    # Also print the chance of winning with this combination
    one_ticket_probability(list_of_6)
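
An alternative sketch of the same lookup, using only the standard library. Instead of looping over all drawings for every query, it tallies each combination once in a `Counter` keyed by `frozenset` (a hashable set), so a lookup becomes a dictionary access. The in-memory CSV is a hypothetical sample holding just the six 'NUMBER DRAWN' columns of the first three drawings, and `times_won` is an illustrative name, not part of the notebook.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: the first three drawings, number columns only
sample_csv = """NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6
3,11,12,14,41,43
8,33,36,37,39,41
1,6,23,24,27,39
"""

reader = csv.DictReader(io.StringIO(sample_csv))
number_columns = [name for name in reader.fieldnames if name.startswith("NUMBER DRAWN")]

# Tally every drawing once; frozenset ignores order and can be a dict key
draw_counts = Counter(
    frozenset(int(row[column]) for column in number_columns) for row in reader
)

def times_won(list_of_6):
    # A Counter returns 0 for combinations that never occurred
    return draw_counts[frozenset(list_of_6)]

print(times_won([11, 12, 3, 43, 41, 14]))  # 1
print(times_won([42, 31, 8, 43, 17, 4]))   # 0
```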

In [15]:

# Test with the first historic set
test_1 = [3, 11, 12, 14, 41, 43]
check_historical_occurrence(test_1, historic_sets)

During 3665 drawings (1982-2018), combination [3, 11, 12, 14, 41, 43] won the big prize 1 time(s).
The chance to win the big prize with combination [3, 11, 12, 14, 41, 43] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.

In [16]:

# Test with the first historic set, putting the numbers in a different sequence (should not make a difference)
test_2 = [11, 12, 3, 43, 41, 14]
check_historical_occurrence(test_2, historic_sets)

During 3665 drawings (1982-2018), combination [11, 12, 3, 43, 41, 14] won the big prize 1 time(s).
The chance to win the big prize with combination [11, 12, 3, 43, 41, 14] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.

In [17]:

# Test with another randomly chosen set
test_3 = [42, 31, 8, 43, 17, 4]
check_historical_occurrence(test_3, historic_sets)

During 3665 drawings (1982-2018), combination [42, 31, 8, 43, 17, 4] won the big prize 0 time(s).
The chance to win the big prize with combination [42, 31, 8, 43, 17, 4] is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.

The app user will most likely see that his combination of 6 numbers never won the big prize over the course of 36 years (unless he deliberately chose 6 numbers that won in the past).
Either way, he will see that for his set of 6 numbers the chance of winning the big prize is extremely slim.
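
How unlikely a historic win is can be estimated with a little arithmetic. This sketch assumes the drawings are independent and that every combination is equally likely in each drawing; it uses `math.comb` (Python 3.8+) rather than the notebook's helpers so that it runs on its own.

```python
from math import comb

draws = 3665                      # number of historic drawings in the dataset
p_single = 1 / comb(49, 6)        # chance that one fixed combination wins one drawing

# Expected number of wins for one fixed combination over all drawings
expected_wins = draws * p_single

# Chance that the combination won at least once in those drawings
p_at_least_once = 1 - (1 - p_single) ** draws

print("Expected wins: {:.6f}".format(expected_wins))
print("Chance of at least one win: {:.4%}".format(p_at_least_once))
```

Under these assumptions, a given combination would be expected to have won in roughly 1 out of 3,800 such 36-year histories.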

Chance to win the big prize with multiple tickets¶

Function input: a number of lottery tickets.
Function output: a printed text including the chance to win the big prize with one of these tickets.

The assumption is that all tickets are different from each other.

In [18]:

# Define a function that takes an amount of (different) tickets as input, and calculates the chance of winning the big prize
def multi_ticket_probabibility(amount_tickets):
    nr_of_combs = combinations(49,6)
    chance_to_win = amount_tickets/nr_of_combs
    output = "The chance to win the big prize with {} ticket(s) is {:.10%}.".format(amount_tickets, chance_to_win)
    output = output + "\nOr, said differently, 1 out of {:,} times.".format(int(nr_of_combs/amount_tickets))
    print (output)     

In [19]:

# Show the result for several amounts of tickets
ticket_amounts = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for amount in ticket_amounts:
    multi_ticket_probabibility(amount)
    print('\n')

The chance to win the big prize with 1 ticket(s) is 0.0000071511%.
Or, said differently, 1 out of 13,983,816 times.


The chance to win the big prize with 10 ticket(s) is 0.0000715112%.
Or, said differently, 1 out of 1,398,381 times.


The chance to win the big prize with 100 ticket(s) is 0.0007151124%.
Or, said differently, 1 out of 139,838 times.


The chance to win the big prize with 10000 ticket(s) is 0.0715112384%.
Or, said differently, 1 out of 1,398 times.


The chance to win the big prize with 1000000 ticket(s) is 7.1511238420%.
Or, said differently, 1 out of 13 times.


The chance to win the big prize with 6991908 ticket(s) is 50.0000000000%.
Or, said differently, 1 out of 2 times.


The chance to win the big prize with 13983816 ticket(s) is 100.0000000000%.
Or, said differently, 1 out of 1 times.

The app user will see that even with many tickets, the chance of winning the big prize remains very slim. Even with 10,000 tickets (which cost serious money), the chance of winning the big prize is still below 0.1%. And one needs to buy about 7 million tickets to reach a 50% chance of winning the big prize!
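
The linear formula above depends on the assumption that all tickets are different. As a hedged comparison, the sketch below contrasts it with tickets that are each picked at random, independently of the others, so that duplicates can occur. The function names `p_distinct` and `p_random` are made up for this example.

```python
from math import comb, expm1, log1p

TOTAL = comb(49, 6)  # number of possible 6-number combinations

def p_distinct(n_tickets):
    # All tickets differ: each one covers a separate combination
    return n_tickets / TOTAL

def p_random(n_tickets):
    # Independent random picks: 1 - (1 - 1/TOTAL) ** n, computed in a numerically stable way
    return -expm1(n_tickets * log1p(-1 / TOTAL))

for n in [1, 10000, 1000000, 6991908, 13983816]:
    print("{:>10,} tickets: distinct {:.6%}, random {:.6%}".format(
        n, p_distinct(n), p_random(n)))
```

For small numbers of tickets the two results are practically identical. For very large numbers the random picks fall behind, because repeated combinations add nothing.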

Chance to win smaller prizes with a ticket¶

Next, the chance of winning a smaller prize by having only some of the numbers correct.

Function input: how many numbers are correct (2-6).
Function output: a printed text including the chance of having that many numbers correct on any lottery ticket.

The function calculates the chance of having exactly that many numbers correct (as opposed to 'at least' that many).
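
Written as a formula, this is a hypergeometric probability. With $k$ the number of correct numbers on a ticket, $k$ of the 6 drawn numbers must be on the ticket and the remaining $6-k$ ticket numbers must come from the 43 numbers that were not drawn:

$$
P(k) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}
$$

For example, for $k = 3$:

$$
P(3) = \frac{\binom{6}{3}\,\binom{43}{3}}{\binom{49}{6}} = \frac{20 \times 12{,}341}{13{,}983{,}816} = \frac{246{,}820}{13{,}983{,}816} \approx 1.765\%
$$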

In [20]:

# Define a function that takes an amount (2, 3, 4, 5 or 6) as an input, and calculates the chance of having exactly that amount of numbers correct
def probabibility_less_6(amount):
    # Calculate how many 'amount-number' (2-number, 3-number,...) combinations can be formed from any set of 6 numbers (that is: any ticket)
    amount_number_combs = combinations(6, amount)
    # Calculate how many successful 6-number combinations (=lottery outcomes) exist for each of those
    # The remaining 'big prize' numbers must come from a set of 43 numbers
    # For 5: 1 remaining
    # For 4: 2 remaining etc
    remaining = combinations(43, 6-amount)
    successful_outcomes = amount_number_combs * remaining
    # Calculate the total number of possible outcomes
    nr_of_combs = combinations(49,6)
    chance_to_win = successful_outcomes/nr_of_combs
    # For validation/debugging (now commented out)
    # print (amount_number_combs, remaining, successful_outcomes,chance_to_win)
    # Print output
    output = "The chance to win a prize for having {} numbers correct is {:.10%}.".format(amount, chance_to_win)
    output = output + "\nOr, said differently, 1 out of {:.1f} times.".format(nr_of_combs/successful_outcomes)
    print (output) 

In [21]:

# Show the result for several amounts of numbers
amount_set = [2,3,4,5,6]
for amount in amount_set:
    probabibility_less_6(amount)
    print('\n')

The chance to win a prize for having 2 numbers correct is 13.2378029002%.
Or, said differently, 1 out of 7.6 times.


The chance to win a prize for having 3 numbers correct is 1.7650403867%.
Or, said differently, 1 out of 56.7 times.


The chance to win a prize for having 4 numbers correct is 0.0968619724%.
Or, said differently, 1 out of 1032.4 times.


The chance to win a prize for having 5 numbers correct is 0.0018449900%.
Or, said differently, 1 out of 54200.8 times.


The chance to win a prize for having 6 numbers correct is 0.0000071511%.
Or, said differently, 1 out of 13983816.0 times.

The app user will see that his chance of winning even the smallest prize, awarded for 2 correct numbers (something like 'another play for free' or a tiny amount of money), is still less than 1 out of 7. For any amount higher than 2 correct numbers, he will see that the chance of winning is very small.
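
Two checks on these numbers can be sketched with exact fractions: the chances for 0 to 6 correct numbers should add up to exactly 1, and the 'at least' variant is the sum of the 'exactly' chances. The names `p_exactly` and `p_at_least` are illustrative and not part of the notebook; `math.comb` assumes Python 3.8+.

```python
from fractions import Fraction
from math import comb

def p_exactly(k):
    # k correct numbers from the 6 drawn, the other 6-k from the 43 not drawn
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def p_at_least(k):
    # Sum the exact chances for k, k+1, ..., 6 correct numbers
    return sum(p_exactly(j) for j in range(k, 7))

# Every ticket has between 0 and 6 numbers correct, so the chances sum to 1
assert sum(p_exactly(k) for k in range(7)) == 1

for k in range(2, 7):
    print("{} correct: exactly {:.6%}, at least {:.6%}".format(
        k, float(p_exactly(k)), float(p_at_least(k))))
```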

Wrapping up¶

We have created four functions that, for a given input, print a text showing the chance of winning (something) in the 6/49 Lottery, or how often a combination won in the past. And by testing these functions, we have seen that these chances are very, very slim.

To further extend the app, as next steps we could bring these small numbers more to life. For example:

compare chances of winning with other events, e.g. a meteor hitting the earth, being hit by a truck while crossing the road etc;
express how many years it would - on average - take before one would win.
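
The last idea can be sketched in a few lines. If each drawing is an independent attempt with success chance p, the average number of drawings until the first win is 1/p. The sketch assumes one drawing per week (52 per year) and different tickets within a drawing; `expected_years_to_win` is a hypothetical name, and `math.comb` assumes Python 3.8+.

```python
from math import comb

def expected_years_to_win(tickets_per_draw=1, draws_per_year=52):
    # Chance of winning the big prize in a single drawing
    p = tickets_per_draw / comb(49, 6)
    # Mean of a geometric distribution: average number of drawings until the first win
    expected_draws = 1 / p
    return expected_draws / draws_per_year

print("{:,.0f} years".format(expected_years_to_win()))    # 268,920 years
print("{:,.0f} years".format(expected_years_to_win(10)))  # 26,892 years
```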