In this project, we're going to contribute to the development of a mobile app meant to help lottery addicts better estimate their chances of winning and, hopefully, to help them break this dangerous habit.

We'll focus on the 6/49 lottery, where six numbers are drawn from a set of 49 (from 1 to 49) in each drawing, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn. Our goal is to create the logical core of the app and build functions that enable users to answer the following questions:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play several/many different tickets?
- What is the probability of having *exactly* or *at least* 2 (or 3, or 4, or 5) winning numbers on a single ticket?

The historical data used in this project comes from the national 6/49 lottery game in Canada. The dataset counts 3,665 drawings, dating from 1982 to 2018.

After creating and testing the functions for different scenarios of participation in the lottery, we found that to have a relatively high chance of winning the big prize, we have to buy a number of tickets that costs at least as much as the big prize itself. The probability of matching fewer winning numbers and, hence, winning much smaller prizes is also very low.

Let's start by writing two functions that we'll use often, one for calculating factorials and one for calculating combinations:

In [1]:

```
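def factorial(n):
    '''Returns n! as the product of the integers from 1 to n'''
    product = 1
    for i in range(n):
        product *= i + 1
    return product


def combinations(n, k):
    '''Returns the number of combinations for taking k objects from a group of n objects'''
    # Integer division keeps the result an exact int (the division is always exact here)
    return factorial(n) // (factorial(k) * factorial(n - k))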
```
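As a quick sanity check, the two helpers can be compared against `math.factorial` and `math.comb` from Python's standard library (`math.comb` needs Python 3.8 or later). The sketch below redefines the helpers so that it runs on its own; the variable name `total_tickets` is introduced here only for illustration.

```python
import math

def factorial(n):
    product = 1
    for i in range(n):
        product *= i + 1
    return product

def combinations(n, k):
    # Integer division keeps the result an exact int
    return factorial(n) // (factorial(k) * factorial(n - k))

# Number of different tickets in a 6/49 lottery
total_tickets = combinations(49, 6)

# The hand-written helpers should agree with the standard library
assert factorial(10) == math.factorial(10)
assert total_tickets == math.comb(49, 6)

print(f'{total_tickets:,}')  # 13,983,816
```

In a real app, calling `math.comb` directly would probably be the simpler choice; the hand-written version is kept here because the rest of the project relies on it.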

Now, we'll write a function that calculates the probability of winning the big prize for any given ticket.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the numbers they play on a single ticket. The idea is that they enter their six numbers from 1 to 49 in the app and receive the probability in a friendly form that people without any knowledge of probability can understand.

In [2]:

```
def check_numbers(lst):
    '''Checks that a list contains exactly 6 unique numbers from 1 to 49.
    If so, returns a string of the numbers separated by commas;
    otherwise, returns an error message.
    '''
    # Validate the whole list up front, so that empty or overlong lists are rejected too
    if len(lst) != 6 or len(set(lst)) != 6 or any(n < 1 or n > 49 for n in lst):
        return 'You should insert six different numbers in the range from 1 to 49 😈'
    return ', '.join(str(n) for n in lst[:-1]) + ', and ' + str(lst[-1])


def one_ticket_probability(lst):
    '''Takes in a list of 6 unique numbers from 1 to 49 inclusive
    and returns the probability of winning in an easy-to-understand way
    '''
    string = check_numbers(lst)
    if string.startswith('You'):  # check_numbers returned an error message
        return string
    c = int(combinations(49, 6))  # number of possible tickets
    p = 1 / c * 100               # probability as a percentage
    msg = (
        f'Your chances to win the big prize with the numbers {string} are only {p:.6f}%.\n'
        f'That means: \n'
        f'1) 1 chance out of {c:,},\n'
        f'2) 373 times less likely than becoming a billionaire in general. \n'
        f'Probably, you should consider another approach for getting rich 🤑'
    )
    return msg


# Testing the function
tests = [[1, 2, 3, 4, 5, 6],    # correct input
         [1, 2, 3, 4, 5],       # fewer than 6 numbers
         [1, 2, 3, 4, 5, 1],    # repeated numbers
         [1, 2, 3, 4, 5, 100]]  # a number outside the range from 1 to 49
for test in tests:
    print(one_ticket_probability(test), '\n')
    print('___________________________________________________________________________________________\n')
```
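The figures in the message follow from a single counting argument: every set of six distinct numbers is equally likely to be drawn, and one ticket covers exactly one of those sets. Written out as a worked equation:

```latex
P(\text{big prize}) = \frac{1}{\binom{49}{6}}
                    = \frac{6!\,43!}{49!}
                    = \frac{1}{13{,}983{,}816}
                    \approx 0.00000715\%
```

This is why the specific numbers on the ticket don't change the result: any valid ticket has the same probability.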

Another feature of our app is that it should enable users to compare their tickets against the historical lottery data in Canada and determine whether they would have ever won by now. We're going to write a function to implement this idea, but first, let's open the dataset with the historical data of winning numbers and get familiar with its structure:

In [3]:

```
import pandas as pd
lottery = pd.read_csv('649.csv')
print('Number of rows: ', lottery.shape[0], '\nNumber of columns: ', lottery.shape[1])
lottery.head(3)
```

In [4]:

```
lottery.tail(3)
```

In [5]:

```
print('Total number of missing values in `lottery`: ', lottery.isnull().sum().sum())
```

The dataframe contains 11 columns with self-explanatory names, including one column for each of the six drawn numbers plus a bonus number. There are no missing values in the dataframe.

Now, let's write a function for comparing any ticket with the historical data. It's supposed to output the following:

- the number of times the combination selected occurred in the dataset,
- the probability of winning the big prize in the next drawing with that combination.

In [6]:

```
def extract_numbers(row):
    '''Takes a row of the lottery dataframe and returns a set containing all the six
    winning numbers'''
    # Positions 4 to 9 hold the six drawn numbers; .iloc makes the positional slice explicit
    return set(row.iloc[4:10])


# Testing the function
print('First row: ', extract_numbers(lottery.iloc[0]), '\n')

# Extracting all the combinations of winning numbers
win_num = lottery.apply(extract_numbers, axis=1)
print('First 3 winning combinations: \n', win_num.head(3), sep='')
```

In [7]:

```
def check_historical_occurence(lst, hist_data):
    '''
    Given a list of user's numbers and a Series of historical winning number sets, compares
    the list against the Series, outputs information about the number of matches and
    the probability of winning the big prize in the next drawing with that combination
    '''
    string = check_numbers(lst)
    if string.startswith('You'):  # check_numbers returned an error message
        return string
    # Count every drawing whose set of numbers equals the user's set
    user_set = set(lst)
    count = sum(1 for drawn in hist_data if drawn == user_set)
    if count == 0:
        print(f'Your combination of numbers {string} is absent in the dataset\n')
    elif count == 1:
        print(f'Your combination of numbers {string} occurred {count} time in the dataset\n')
    else:
        print(f'Your combination of numbers {string} occurred {count} times in the dataset\n')
    return one_ticket_probability(lst)


# Testing the function
tests = [[3, 41, 11, 12, 43],      # fewer than 6 numbers
         [3, 41, 11, 12, 43, 14],  # combination that occurred
         [3, 4, 11, 12, 43, 14]]   # absent combination
for test in tests:
    print(check_historical_occurence(test, win_num))
    print('_______________________________________________________________________________________________\n')
```
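For readers who don't have `649.csv` at hand, here is a minimal sketch of the same lookup on made-up data. The list `toy_history` and the helper `count_occurrences` are hypothetical and exist only for this illustration. Storing each drawing as a `frozenset` (an assumption of this sketch, not what the function above does) makes the combinations hashable, so a `Counter` can tally how many times each one occurred regardless of the order of the numbers.

```python
from collections import Counter

# Hypothetical toy history: the first and third drawings contain the same numbers
toy_history = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [43, 41, 14, 12, 11, 3],
]

# frozensets ignore the order of the numbers and are hashable, so they can be Counter keys
draw_counts = Counter(frozenset(draw) for draw in toy_history)

def count_occurrences(ticket, counts):
    '''Returns how many drawings contain exactly the numbers on the ticket'''
    return counts[frozenset(ticket)]  # a Counter returns 0 for unseen keys

print(count_occurrences([3, 41, 11, 12, 43, 14], draw_counts))  # 2
print(count_occurrences([3, 4, 11, 12, 43, 14], draw_counts))   # 0
```

Building the counter once and looking tickets up by key avoids scanning the whole history for every ticket, which may matter if the app checks many tickets.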

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might significantly increase their chances of winning. To help them better estimate their chances, we're going to write a function for calculating the probability for any number of different tickets. The idea is that the user can input the number of different tickets they want to play from 1 to 13,983,816 (the maximum number of different tickets, as we saw earlier), and the function will output information about the probability of winning the big prize in that case.

In [8]:

```
def multi_ticket_probability(num):
    '''Given a number of tickets played, returns a message with the probability of winning the big prize'''
    comb = int(combinations(49, 6))
    # Validate the input before doing any calculations
    if num < 1 or num > comb:
        return f'You should input a number between 1 and {comb:,} inclusive 😈'
    p = num / comb * 100
    # Use the singular form only for exactly one ticket (not for 11, 21, ...)
    noun = 'ticket' if num == 1 else 'tickets'
    spend_msg = f'Your chances to spend ${3*num:,} CAD on your tickets are 100%, though 😧\n'
    if p < 1:
        msg = f'Your chances to win the big prize playing {num:,} {noun} are only {p:.6f}%.\n' + spend_msg
    elif num < 1666667:  # fewer tickets than 5,000,000 CAD (the big prize) buys at 3 CAD each
        msg = f'Your chances to win the big prize playing {num:,} {noun} are only {round(p)}%.\n' + spend_msg
    else:
        msg = (
            f'Your chances to win the big prize playing {num:,} {noun} are {round(p)}%.\n'
            f"However, you'll pay for all your tickets at least as much as the big prize itself 🙉\n"
        )
    return msg


# Testing the function
tests = [0,
         1,
         10000,
         139839,    # 1/100 of the maximum number of different tickets
         1398382,   # 1/10 of the maximum number of different tickets
         1666667,   # a number of tickets that costs as much as the big prize
         6991908,   # 1/2 of the maximum number of different tickets
         13983816]  # the maximum number of different tickets
for test in tests:
    print(multi_ticket_probability(test))
    print('_____________________________________________________________________________________\n')
```

We can make several observations here:

- To have at least a 1% chance of winning the big prize of 5,000,000 CAD (Canadian dollars), we have to buy 139,839 different tickets, each of which costs 3 CAD (you can check this post published on 08.04.2021). Hence, just to reach such an insignificant probability, we have to pay 419,517 CAD.
- To have a 10% chance, which is still very low, we have to buy 1,398,382 tickets and pay 4,195,146 CAD for them, i.e., 84% of the value of the big prize itself.
- Starting from 1,666,667 tickets, the chances increase to 12% and more, up to 100% if we buy the maximum number of different tickets (13,983,816). **However**, the amount of money we have to pay for all those tickets grows from the value of the big prize itself (5,000,000 CAD) to about 8 times that value, making the whole venture absolutely senseless.
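The arithmetic behind these observations can be reproduced with a short sketch. The constants repeat the ticket price and prize stated above; the helper name `tickets_for_probability` is hypothetical, and the rounding-up trick assumes the target is a whole-number percentage.

```python
import math

TOTAL_TICKETS = math.comb(49, 6)  # number of different tickets in a 6/49 lottery
TICKET_PRICE = 3                  # CAD per ticket, as stated above
BIG_PRIZE = 5_000_000             # CAD, as stated above

def tickets_for_probability(target_percent):
    '''Smallest number of different tickets that gives at least target_percent chance.
    Assumes target_percent is a whole number; integer arithmetic avoids rounding errors.'''
    return (TOTAL_TICKETS * target_percent + 99) // 100

for target in (1, 10, 100):
    n = tickets_for_probability(target)
    cost = n * TICKET_PRICE
    print(f'{target}%: {n:,} tickets, {cost:,} CAD, {cost / BIG_PRIZE:.2f} x the big prize')

# Number of tickets whose total cost first reaches the big prize, and the chance they give
break_even = math.ceil(BIG_PRIZE / TICKET_PRICE)
print(f'{break_even:,} tickets give a {break_even / TOTAL_TICKETS * 100:.2f}% chance')
```

Because the probability grows linearly with the number of different tickets, the cost of any target probability is simply that share of the roughly 42 million CAD needed to buy every combination.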

In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Hence, users might be interested in knowing the probability of having exactly two, three, four, or five winning numbers. In this section, we're going to write a function that lets users calculate this probability. The concept is that the user inputs their combination of six numbers and the number of winning numbers expected (an integer between 2 and 5), and the app displays information about the probability of having exactly that number of winning numbers. In reality, the specific combination on the ticket is irrelevant to the calculation, and the function only needs the number of winning numbers expected.
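In counting terms, a ticket with exactly $k$ winning numbers has $k$ of its six numbers among the six drawn and the remaining $6-k$ among the 43 numbers that were not drawn. A worked form of this reasoning, using the symbol $k$ for the number of winning numbers (a notation introduced here), is:

```latex
P(\text{exactly } k) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
\qquad k = 2, 3, 4, 5

% Example for k = 5:
P(\text{exactly } 5) = \frac{6 \cdot 43}{13{,}983{,}816}
                     = \frac{258}{13{,}983{,}816}
                     \approx 0.001845\%
```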

In [9]:

```
def probability_less_6(num):
    '''Takes in an integer between 2 and 5 and returns information about the chances of
    having exactly that number of winning numbers
    '''
    if num < 2 or num > 5:
        return 'You should input a number between 2 and 5 inclusive 😈'
    # Ways to choose which of the 6 ticket numbers are among the winning ones
    comb = int(combinations(6, num))
    # Ways to choose the remaining ticket numbers from the 43 numbers that were not drawn
    lottery_outcomes = int(combinations(43, 6 - num))
    success_outcomes = comb * lottery_outcomes
    tot_outcomes = combinations(49, 6)
    p = success_outcomes / tot_outcomes * 100
    if p < 1:
        return f'Your chances to have exactly {num} winning numbers are only {p:.6f}%'
    else:
        return f'Your chances to have exactly {num} winning numbers are only {p:.2f}%'


# Testing the function
for test in [1, 2, 3, 4, 5]:
    print(probability_less_6(test))
    print('__________________________________________________________________\n')
```

We see that even the probability of having exactly 2 winning numbers (and winning a smaller prize) is rather low, at about 13.2%, and the probabilities for 3, 4, and 5 winning numbers are much lower still.

Finally, let's consider the case when the user is interested in knowing the probability of having *at least* two, three, four, or five winning numbers. The function here will be similar to the one above, only that the number of successful outcomes for having at least N winning numbers will be the sum of the numbers of successful outcomes for having exactly N winning numbers, N+1, etc., up to and including 6. Again, the specific combination on the ticket doesn't matter under the hood.

In [10]:

```
def probability_at_least_num(num):
    '''Takes in an integer between 2 and 5 and returns information about the chances of
    having at least that number of winning numbers
    '''
    if num < 2 or num > 5:
        return 'You should input a number between 2 and 5 inclusive 😈'
    success_outcomes = 0
    # Sum the successful outcomes for exactly num, num+1, ..., 6 winning numbers
    for n in range(num, 7):
        comb = int(combinations(6, n))
        lottery_outcomes = int(combinations(43, 6 - n))
        success_outcomes += comb * lottery_outcomes
    tot_outcomes = combinations(49, 6)
    p = success_outcomes / tot_outcomes * 100
    if p < 1:
        return f'Your chances to have at least {num} winning numbers are only {p:.6f}%'
    else:
        return f'Your chances to have at least {num} winning numbers are only {p:.3f}%'


# Testing the function
for test in [1, 2, 3, 4, 5]:
    print(probability_at_least_num(test))
    print('__________________________________________________________________\n')
```

Since the probabilities of having an *exact* number of winning numbers were already quite low, summing up the numbers of successful outcomes in order to have *at least* N winning numbers didn't contribute much to increasing the chances.
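To see the two quantities side by side, the sketch below computes both with `math.comb` from the standard library instead of the project's own helpers. The function names `p_exactly` and `p_at_least` are hypothetical. It also checks that the probabilities of 0 to 6 winning numbers add up to 1, which is a useful test of the counting argument.

```python
import math

TOTAL = math.comb(49, 6)  # number of different tickets

def p_exactly(k):
    '''Probability of exactly k winning numbers on one ticket'''
    return math.comb(6, k) * math.comb(43, 6 - k) / TOTAL

def p_at_least(k):
    '''Probability of k or more winning numbers on one ticket'''
    return sum(p_exactly(j) for j in range(k, 7))

# The seven possible outcomes (0 to 6 winning numbers) must cover every case
assert math.isclose(sum(p_exactly(k) for k in range(7)), 1.0)

for k in range(2, 6):
    print(f'{k}: exactly {p_exactly(k) * 100:.6f}%, at least {p_at_least(k) * 100:.6f}%')
```

For 2 winning numbers, the gap between "exactly" and "at least" is under two percentage points (about 13.24% versus 15.10%), and it shrinks quickly for larger values.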

In this project, we considered different strategies for participating in the 6/49 lottery: playing one vs. several tickets, expecting to win the big prize or a smaller one (with fewer than 6 winning numbers), and using historical data to check whether a combination of numbers has ever won before. We created and tested several functions for calculating the probability of winning in all of these scenarios. They are meant to be used in a mobile app that helps players better estimate their chances of winning and, hopefully, discourages them from playing. Below are our main insights:

- The chances to win the big prize with a single ticket are extremely low.
- To have relatively high chances of winning the big prize, the player has to buy **a huge number of tickets**. To reach a probability of only 12%, they have to spend on tickets a sum equal to the big prize itself, making the whole venture totally unreasonable.
- The probability of matching fewer winning numbers is still low, even for 2 numbers. Given that the other prizes are significantly smaller than the big one, while the price of a ticket remains the same, this is a losing strategy as well.