The 1 in 13,983,816

Index

1 Introduction
2 Reading the Data
3 Functions to Function
4 Oooo...The One Ticket!!
5 Ha! I'll buy a couple of 'em..
6 I'll win those other prizes..
7 App Ideation
8 Conclusion
9 Learnings
10 Acknowledgement

1 Introduction

Gambling is risky. In a single moment you could become a millionaire or a pauper. Many people are aware of its risks and tend to shy away from it. The lottery, while another form of gambling, is comparatively less risky and more associated with fun. Moreover, since many state and federal lotteries have government backing, more people are inclined to trust them.

However, it can become an addiction, especially when the stakes go up each week. Many people are so lured by the possibility of vast riches in a short amount of time that they spend huge amounts of money on lottery tickets. Many of these addicts fail to truly understand the probability of winning, which leads them to spend more than they can afford.

The engineers of a medical institute have tasked us with creating the logical core of a gambling prevention app that would help users better understand their odds of winning.

The goal of this project is to write code that would help the engineers answer questions such as the following:

What is the probability of winning the big prize with a single ticket?
What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
What is the probability of having exactly five (or four, or three, or two) winning numbers on a single ticket?

2 Reading the Data

The test data, provided by the institute, is the historical data from the national 6/49 lottery game in Canada. The data set covers 3,665 drawings, dating from 1982 to 2018.

In [1]:

#Read the data
import pandas as pd
import random
lottery = pd.read_csv('649.csv')
lottery.head(5)

Out[1]:

	PRODUCT	DRAW NUMBER	DRAW DATE	NUMBER DRAWN 1	NUMBER DRAWN 2	NUMBER DRAWN 3	NUMBER DRAWN 4	NUMBER DRAWN 5	NUMBER DRAWN 6	BONUS NUMBER
0	649	1	6/12/1982	3	11	12	14	41	43	13
1	649	2	6/19/1982	8	33	36	37	39	41	9
2	649	3	6/26/1982	1	6	23	24	27	39	34
3	649	4	7/3/1982	3	9	10	13	20	43	34
4	649	5	7/10/1982	5	14	21	31	34	47	45

In [2]:

lottery.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB

The dataset does not appear to have any missing data: every column has 3,665 non-null values. We can now proceed with writing the core logic based on the questions that the medical institute requires us to answer.

3 Functions to Function

Writing the basic functions required to write the core logic

Based on the questions we have been given, three basic functions that we will require while coding are:

To find the factorial of a number n
To find the number of combinations of k selections from n objects
To convert a list to a set
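The first two helpers can be cross-checked against the Python standard library, which (since Python 3.8) provides `math.factorial` and `math.comb`. The sketch below is a hypothetical test harness rather than part of the original notebook. It re-implements the two helpers with the same logic so that it runs on its own, and it assumes integer results are wanted (hence `//`).

```python
import math

def factorial(n):
    # Multiply 1 * 2 * ... * n; factorial(0) is 1 because the loop never runs
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

def combinations(n, k):
    # n! / (k! (n-k)!) with integer division so the result stays exact
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare both helpers against the standard library for n up to 49
for n in range(50):
    assert factorial(n) == math.factorial(n)
    for k in range(n + 1):
        assert combinations(n, k) == math.comb(n, k)

print(combinations(12, 7))   # 792
print(combinations(49, 6))   # 13983816
```

In production code, `math.comb` would normally be preferred; the hand-written versions are kept here because the project's aim is to practise the underlying formula.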

In [3]:

def bold_print(a_string):
    """
    Prints a string in bold using ANSI escape codes, followed by a blank line
    
    Args:
        a_string (string): String to be printed in bold
    """
    print("\033[1m"+a_string+"\033[0m"+'\n')

In [4]:

def factorial(n):
    """
    Calculates the factorial of a given number
    
    Args:
    n (int): Number for which factorial must be calculated
    
    Returns:
    result (int): Factorial of the given number
    """
    result = 1
    for i in range(n):
        result*=i+1
    return result

#Test the function
value=3
bold_print("Testing the factorial function:")
print("{0}! is {1}".format(value,factorial(value)))

Testing the factorial function:

3! is 6

In [5]:

def combinations(n,k):
    """
    Calculates the number of ways to choose k objects from a set of n objects
    
    Args:
    n (int): Number of objects in the set
    k (int): Number of chosen objects from the set
    
    Returns:
    (int): Number of combinations
    """
    if 0 <= k <= n:
        # Integer division keeps the result an exact int instead of a float
        return factorial(n)//(factorial(k)*factorial(n-k))
    else:
        raise ValueError("k must be between 0 and n (inclusive)")

#Test the function
cardinality=12
subset_cardinality=7
bold_print("Testing the combinations function:")
calculate=combinations(cardinality,subset_cardinality)
print("{0}C{1} is {2}".format(cardinality,subset_cardinality,calculate))

Testing the combinations function:

12C7 is 792

In [6]:

def extract_numbers(list1):
    """
    Converts an iterable of values (such as a DataFrame row) to a set
    
    Args:
    list1 (iterable): Values to be converted to a set
    
    Returns:
    (set): The values as a set
    """
    return set(list1)

4 Oooo...The One Ticket!!

Calculating the probability of buying the winning ticket

One of the questions asked was the probability of buying the winning ticket. In the 6/49 lottery, six numbers are drawn from the range 1 to 49, and the winning ticket must contain all six drawn numbers (the order in which the numbers are drawn has no impact on winning). Therefore every ticket has the same probability of winning: 1 divided by the number of ways to choose 6 numbers out of 49.
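The number of equally likely tickets is the binomial coefficient C(49, 6). A minimal sketch, using the standard library's `math.comb` as a stand-in for the notebook's own `combinations` helper:

```python
import math

# Number of distinct 6-number tickets that can be formed from 1..49
total_tickets = math.comb(49, 6)

# Every ticket is equally likely, so a single ticket wins with probability 1/total
p_single = 1 / total_tickets

print(f"{total_tickets:,}")   # 13,983,816
print(p_single)               # about 7.15e-08
```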

It is also possible that a given combination was drawn in a previous lottery. Using the dataset, we should be able to find out whether a given set of numbers was drawn before and what the probability of winning with a ticket is.

In [7]:

#Convert columns 'NUMBER DRAWN 1':'NUMBER DRAWN 6' to set
numbers_drawn = lottery.loc[:,'NUMBER DRAWN 1':'NUMBER DRAWN 6'].apply(extract_numbers,axis=1)
numbers_drawn = pd.Series(numbers_drawn)
bold_print("The first ten winning lottery tickets:")
numbers_drawn.head(10)

The first ten winning lottery tickets:

Out[7]:

0     {3, 41, 11, 12, 43, 14}
1     {33, 36, 37, 39, 8, 41}
2      {1, 6, 39, 23, 24, 27}
3      {3, 9, 10, 43, 13, 20}
4     {34, 5, 14, 47, 21, 31}
5     {8, 41, 20, 21, 25, 31}
6    {33, 36, 42, 18, 25, 28}
7     {7, 40, 16, 17, 48, 31}
8     {37, 5, 38, 10, 23, 27}
9     {4, 37, 46, 15, 48, 30}
dtype: object

Since we have converted the columns to sets, we can check whether any set of numbers has been drawn more than once historically.
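A more compact way to count repeated combinations is `collections.Counter` keyed by `frozenset`: a plain `set` is unhashable and cannot be a dictionary key, while a `frozenset` is hashable and ignores element order. The sketch below uses only the first five draws from the preview table as sample data; with the full dataset, the `numbers_drawn` series would be passed in instead.

```python
from collections import Counter

# First five historical draws from the 649.csv preview (NUMBER DRAWN 1-6 only)
draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
    {3, 9, 10, 13, 20, 43},
    {5, 14, 21, 31, 34, 47},
]

# Equal combinations map to the same frozenset key and share one count
counts = Counter(frozenset(d) for d in draws)

# Highest number of times any single combination was drawn
print(max(counts.values()))   # 1 for this sample: no repeats
```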

In [8]:

#Find whether the same set was drawn more than once
draw={}
for each in numbers_drawn:
    # frozenset is hashable, so it can be used as a dictionary key
    a_draw=frozenset(each)
    if a_draw in draw:
        draw[a_draw]+=1
    else:
        draw[a_draw]=1
        
bold_print("Maximum number of times the same set of numbers has been historically drawn:")
max(draw.values())

Maximum number of times the same set of numbers has been historically drawn:

Out[8]:

In [9]:

def one_ticket_probability(number_list):
    """
    Calculates the total number of possible tickets, i.e. the N in the
    odds of 1 in N of having a winning ticket
    
    Args:
    number_list (list): List of numbers of the lottery ticket
    
    Returns:
    total_combinations (int): Number of possible 6-number combinations
    """
    total_combinations = combinations(49,6)
    return total_combinations

#Test the function
number_list = [12,5,3,6,47,1]
bold_print("Testing the one_ticket_probability function:")
print("The probability of winning with ticket numbers {0} is 1 in {1:,}".format(number_list,int(one_ticket_probability(number_list))))

Testing the one_ticket_probability function:

The probability of winning with ticket numbers [12, 5, 3, 6, 47, 1] is 1 in 13,983,816

In [10]:

def one_ticket_probability(number_list):
    """
    Calculates the probability of having a winning ticket
    
    Args:
    number_list (list): List of numbers of the lottery ticket
    
    Returns:
    probability (float): Probability of winning based on the given numbers
    """
    probability = (1/combinations(49,6))
    return probability

#Test the function
number_list = [12,5,3,6,47,1]
one_ticket_probability(number_list)

Out[10]:

7.151123842018516e-08

In [11]:

def check_historical_occurence(user_selection,winning_list):
    """
    Flags the historical draws that match the given combination of numbers
    
    Args:
    user_selection (set): Numbers of the lottery ticket
    winning_list (series): Historical combinations of numbers that have won the lottery
    
    Returns:
    boolean_check (series): Flags that indicate which historical draws match the user selection
    """
    # Sets compare equal regardless of element order, so == is enough
    boolean_check = [user_selection == each_ticket for each_ticket in winning_list]
    return pd.Series(boolean_check)

#Test the function
user_choice = {33, 36, 37, 39, 8, 41}
bold_print("Testing the check_historical_occurence function:")
win_count = check_historical_occurence(user_choice,numbers_drawn).sum()
print("The numbers {0} have been drawn {1} time/s in the past.".format(user_choice,win_count))

Testing the check_historical_occurence function:

The numbers {33, 36, 37, 39, 8, 41} have been drawn 1 time/s in the past.

Now that we have functions to calculate the probability of a given set of numbers and to count how often the set has been drawn, we can combine both functions and test them together.

In [12]:

def win_chances(user_choice):
    """
    Prints the odds of a set of numbers winning the big prize, how often the
    set has been drawn before, and an event that is more likely to happen
    
    Args:
    user_choice (set): Numbers of a lottery ticket
    """
    death_options=['cold','lightning','fireworks','space accident']
    win_count = check_historical_occurence(user_choice,numbers_drawn).sum()
    frequency = {1:'once',2:'twice',3:'thrice'}
    # one_ticket_probability returns a probability, so invert it to get N in "1 in N"
    odds = round(1/one_ticket_probability(user_choice))
    print('The probability of winning the big prize with the set {userset} is 1 in {prob:,}'.format(userset=user_choice,prob=odds))
    if win_count > 0:
        print('This combination of numbers has been drawn {won_count} before.'.format(won_count=frequency.get(win_count,'{} times'.format(win_count))))
    else:
        print()
    print('With those odds you are more likely to be killed by {}.'.format(random.choice(death_options)))

In [13]:

#Test the win_chances function
bold_print("Testing the win_chances function:")
user_choice = {33, 6, 7, 39, 8, 41}
win_chances(user_choice)

Testing the win_chances function:

The probability of winning the big prize with the set {33, 6, 39, 8, 7, 41} is 1 in 13,983,816

With those odds you are more likely to be killed by lightning.

In [14]:

#Test the win_chances function for another event
bold_print("Testing the win_chances function:")
user_choice = {11, 6, 7, 50, 8, 41}
win_chances(user_choice)

Testing the win_chances function:

The probability of winning the big prize with the set {6, 7, 8, 41, 11, 50} is 1 in 13,983,816

With those odds you are more likely to be killed by cold.

In [15]:

#Test the win_chances function for when a historically winning lotto has been entered
bold_print("Testing the win_chances function:")
user_choice = {3, 41, 11, 12, 43, 14}
win_chances(user_choice)

Testing the win_chances function:

The probability of winning the big prize with the set {3, 41, 11, 12, 43, 14} is 1 in 13,983,816
This combination of numbers has been drawn once before.
With those odds you are more likely to be killed by space accident.

In addition to telling the user the probability of winning, an additional line names an event that is more likely to happen, such as dying from fireworks or in a space accident.

These probabilities were calculated and taken from here, and while they do not apply to everyone, they should playfully encourage addicts not to indulge their addiction (buying lottery tickets).

5 Ha! I'll buy a couple of 'em..

Calculating the probability of winning after buying multiple tickets

Certain addicts may not be convinced by the probability for a single ticket. When presented with it, they are likely to argue that they buy multiple tickets, which increases their probability of winning the lottery.

We shall write a function to calculate those probabilities.
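Because each distinct ticket covers exactly one of the C(49, 6) equally likely combinations, n distinct tickets win the big prize with probability n / 13,983,816. A minimal sketch, assuming all tickets are distinct (duplicates add nothing) and capping n at the number of possible combinations; the names `multi_ticket_win_probability` and `tickets_needed` are hypothetical helpers, not part of the original notebook:

```python
import math

TOTAL = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_win_probability(n):
    # Probability that one of n distinct tickets matches the draw
    if n < 1:
        raise ValueError("n must be at least 1")
    return min(n, TOTAL) / TOTAL

def tickets_needed(target):
    # Smallest number of distinct tickets whose win probability reaches target
    return math.ceil(target * TOTAL)

for n in (1, 10, 100, 10_000, 1_000_000):
    print(f"{n:>9,} tickets: {multi_ticket_win_probability(n):.6%}")

print(tickets_needed(0.5))  # 6991908 tickets for a 50% chance
```

The probability grows only linearly with the number of tickets, so an even chance of winning requires buying 6,991,908 distinct tickets.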

In [16]:

def multi_ticket_probability(n):
    """
    Prints the probability of winning the big prize with n distinct tickets
    
    Args:
    n (int): Number of tickets being purchased
    """
    # Assumes each ticket costs $3 and the big prize is $1,000,000
    cost = 3*n
    prob_dict={1:"With {num_tickets:,} tickets, the probability of you winning is {num_tickets:,} in 13,983,816. Not a great chance!".format(num_tickets=n),
               2:"After spending ${:,}, your chance of winning $1,000,000 is above 7%".format(cost),
               3:"You are about to spend an incredible ${:,} for at least a 50% chance of winning $1,000,000. Please reconsider!".format(cost),
               4:"Congrats! You will win $1,000,000 after spending ${:,}. This app isn't for you!".format(cost)}
    if n < 1000000:
        print(prob_dict[1])
    # 6,991,908 tickets is exactly half of the 13,983,816 possible combinations
    elif n < 6991908:
        print(prob_dict[2])
    elif n < 13983816:
        print(prob_dict[3])
    else:
        print(prob_dict[4])

In [17]:

#Test the multi_ticket_probability function
bold_print("Testing the multi_ticket_probability function:")
multi_ticket_probability(100000)

Testing the multi_ticket_probability function:

With 100,000 tickets, the probability of you winning is 100,000 in 13,983,816. Not a great chance!

In [18]:

#Test the multi_ticket_probability function
bold_print("Testing the multi_ticket_probability function:")
multi_ticket_probability(13983818)

Testing the multi_ticket_probability function:

Congrats! You will win $1,000,000 after spending $41,951,454. This app isn't for you!

These probabilities are likely to convince addicts to think twice about buying lottery tickets. However, some addicts are still unlikely to be convinced because of the other prizes offered by the lottery, which we discuss in more detail in the next section.

6 I'll win those other prizes..

Calculating the probability of winning the other prizes

The 6/49 lottery also has other prizes. A participant wins the big prize only if all six numbers match, but the lottery also offers prizes, albeit of lower value, to participants who match 5, 4, 3 or 2 of the drawn numbers.

This is more likely to entice addicts, because they may feel they are not losing as long as there is a chance of winning something.

We shall write a function to calculate the probability of winning a prize when exactly 5, 4, 3 or 2 of the six drawn numbers match the numbers on the participant's ticket.
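This count follows the hypergeometric distribution: choose which k of the 6 drawn numbers the ticket matches, then fill the remaining 6 - k numbers from the 43 numbers that were not drawn. A minimal sketch of the full distribution using `math.comb`; the helper name `exact_match_count` is hypothetical:

```python
import math

TOTAL = math.comb(49, 6)

def exact_match_count(k):
    # Tickets sharing exactly k numbers with the draw:
    # C(6, k) ways to pick the matching numbers, C(43, 6 - k) ways to pick the rest
    return math.comb(6, k) * math.comb(43, 6 - k)

for k in range(6, -1, -1):
    count = exact_match_count(k)
    print(f"{k} matches: {count:>10,} tickets, about 1 in {TOTAL / count:,.0f}")

# Every ticket matches between 0 and 6 drawn numbers, so the counts sum to TOTAL
assert sum(exact_match_count(k) for k in range(7)) == TOTAL
```

Dividing any of these counts by 13,983,816 gives the corresponding probability.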

In [19]:

def probability_less6(n):
    """
    Calculates the number of tickets that match exactly n of the six drawn numbers
    
    Args:
    n (int): The number of user selections that have been drawn
    
    Returns:
    num_of_successful_outcomes (int): The number of tickets (out of 13,983,816)
    matching exactly n drawn numbers; divide by combinations(49,6) for the probability
    """
    # Choose which n of the 6 drawn numbers match, then fill the rest from the 43 undrawn numbers
    num_of_successful_outcomes = combinations(6,n)*combinations(43,6-n)
    return num_of_successful_outcomes

In [20]:

#Test the probability_less6 function
bold_print("Testing the probability_less6 function:")
print('The probability of winning with exactly {number} numbers is {prob:,} out of 13,983,816 tickets'.format(number=3,prob=int(probability_less6(3)))) 

Testing the probability_less6 function:

The probability of winning with exactly 3 numbers is 246,820 out of 13,983,816 tickets

These probabilities will hopefully have more impact on lottery addicts and convince them to give up the habit of buying lottery tickets. Each extra matching number makes a prize far less likely, so the probabilities shrink sharply as we get closer to the big prize.

The lowest prize (two matching numbers) is the one that keeps the habit going, because it encourages the user to keep playing the lottery.
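The claim that the smallest prize keeps people playing can be quantified with the same counting. A sketch, assuming (as this section does) that matching anywhere from 2 to 6 numbers wins something:

```python
import math

TOTAL = math.comb(49, 6)

# Tickets matching exactly k of the six drawn numbers, for the prize tiers k = 2..6
counts = {k: math.comb(6, k) * math.comb(43, 6 - k) for k in range(2, 7)}

any_prize = sum(counts.values())
print(f"Any prize: {any_prize:,} tickets ({any_prize / TOTAL:.1%})")
print(f"Share from 2-number matches: {counts[2] / any_prize:.1%}")
```

Under that assumption roughly 15% of tickets win something, and about 88% of those wins come from matching just two numbers.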

7 App Ideation

Since we were able to create functions and test their outputs, it would help the engineers to understand how we expect our output to be used in their app. Using wireframes, we can sketch UIs that provide the context necessary to use our logic.

A de-addiction app should intervene when the user is about to take up the harmful habit. With this in mind, the app we designed allows the user to indulge in their behavior, but with some intervention from the app.

The app is focused on users who are genuinely interested in de-addiction and need an intervention when they indulge in their behavior.

The app we build should allow the user to select the numbers they want to play and process the payment. However, the app will intervene after the user selects their numbers and before they proceed to pay for the lottery.

[Figure: New Mockup 1]

The first screen shows the logo and lets the user select the number combination for the lotto.

[Figure: New Mockup 1 copy]

Once the user has provided their numbers, the intervention kicks in. Every time the user disregards the probabilities and moves on, the app shows a different, more likely scenario, such as a space accident or death by cold. If the user still disregards those messages, the app proceeds to process the payment for the lottery.
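The intervention flow described above can be outlined as a loop over increasingly vivid comparisons. This is a hypothetical sketch only: `intervention_flow` and the `confirm` callback are illustrative names standing in for whatever UI prompt the engineers build, and the scenario list mirrors the `death_options` used in `win_chances`.

```python
import math

def intervention_flow(confirm, scenarios=("cold", "lightning", "fireworks", "space accident")):
    """Return True if the user still chooses to pay after every warning.

    confirm(message) is a hypothetical UI callback that returns True when the
    user dismisses the warning and wants to continue.
    """
    odds = math.comb(49, 6)
    # First warning: the raw odds of winning the big prize
    if not confirm(f"Your chance of winning the big prize is 1 in {odds:,}."):
        return False
    # Each further dismissal shows a different, more likely scenario
    for scenario in scenarios:
        if not confirm(f"You are more likely to be killed by {scenario}."):
            return False
    # The user ignored every message: proceed to payment
    return True
```

A user who backs out at any prompt stops the purchase; only a user who dismisses all five messages reaches the payment step.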

Note that not all flows have been considered above. This is a simple scenario to show what some of the screens might look like.

8 Conclusion

In this project we were tasked with providing the core logic to a medical institute whose aim is to prevent lottery addicts from engaging in their addiction. We used the 6/49 lottery data to test our logic and, through the process, learned about the devious nature of the lottery.

As we learned, the chance of winning the big prize with a single ticket is 1 in 13,983,816. The smaller prizes are easier to win, but their value goes down as fewer of the drawn numbers are matched. We also learned that buying more tickets improves the odds only in proportion to the number of tickets: even 100,000 tickets give a probability of just 100,000 in 13,983,816, or about 0.7%.

9 Learnings

Using a frozenset as a dict key
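A short illustration of why In [8] converts each draw to a `frozenset` before using it as a key: a `set` is mutable and therefore unhashable, while a `frozenset` with the same elements is hashable and compares equal regardless of element order.

```python
counts = {}

ticket = {3, 11, 12, 14, 41, 43}
try:
    counts[ticket] = 1          # a mutable set cannot be a dict key
except TypeError as err:
    print("TypeError:", err)   # unhashable type: 'set'

# A frozenset works, and the order of the elements does not matter
counts[frozenset(ticket)] = 1
counts[frozenset([43, 41, 14, 12, 11, 3])] += 1
print(counts)  # one key with a count of 2
```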

10 Acknowledgement

The focus of this project was to understand the concepts of permutations and combinations and apply them in a fictitious scenario for the purpose of learning.
This is a project from Dataquest.
