Homework 6¶

CHE 116: Numerical Methods and Statistics¶

Version 1.0 (2/17/2016)

Misc Distribution Problems (10 Points)

==== Answer symbolically first, indicating what equations your Python program is using, and then compute the answer in Python. If not specified, say which distribution you're assuming.

[1] The time between traffic tickets is exponentially distributed. Based on past experience, you receive a traffic ticket about every 3 years. What's the probability of having a ticket within 12 months? For 2 bonus points, what's the probability of having 2 tickets within 12 months? Use scipy stats.
[2] You see two deer per day. How many days must pass before you have a 99% of having seen a deer? Answer in days, hours, and minutes.
[1] The expected score on a test is 90% with a standard deviation of 15%. You cannot receive more than 100% on this test. What's the probability failing (< 60%)?
[2] Using the above parameters, what's the probability of getting an A (93%-100%)?
[4] Using the definition of expected value, write a for loop that computes the expected value of a binomial distribution with $N = 10$ and $p = 0.3$. Do not use scipy stats. Compre with the fomula $E[x] = pN$ for binomial.

1.1 Answer¶

One ticket: We are being asked:

$$\int_0^{12} \lambda e^{-\lambda t}\, dt$$

where $\lambda = \frac{1}{36}$, based on the prompt. The answer is $0.28$

Two tickets : We can view this as kind of a binomial distribution, where each month we can get a traffic ticket. That would give these parameters:

$$N = 12$$$$ p = \frac{1}{36}$$$$P(x = 2) = \binom{x}{N} p^x \,\left(1 - p\right)^{N - x} = 0.03$$

However, of course it's possible to get two tickets in a month. SO, let's try breaking it down by day:

$$N = 365$$$$ p = \frac{1}{3\times365}$$$$P(x = 2) = \binom{x}{N} p^x \,\left(1 - p\right)^{N - x} = 0.0398$$

Ah, we see that it's approximately the same. To go to the extreme, we can use a Poisson distribution, where $\mu = \frac{1}{3}$. That gives $0.0398$, which is the same. To make sure we're sane, we can check if the two distributions are consistent. This Poisson distribution gives 0.24 (instead of 0.28) for a single ticket. Sort of close.

In [14]:

from scipy import stats as ss

print(ss.expon.cdf(12, scale=36))
print(ss.binom.pmf(2, p=1 / 36, n=12))
print(ss.binom.pmf(2, p=1 / (3 * 365), n=365))
print(ss.poisson.pmf(2, mu=1 / 3))
print(ss.poisson.pmf(1, mu=1 / 3))

0.283468689426
0.0384232741801
0.0397647849595
0.0398072950319
0.238843770191

In [3]:

ss.binom?

1.2 Answer¶

This is an exponential distribution with $\lambda = \frac{2}{24 \times 60}$. We are being aksed

$$P(t > a) = 0.99$$

which is 2 days, 7 hours, and 16 minutes.

In [14]:

result = ss.expon.ppf(0.99, scale = 24 * 60 / 2)
days = int(result / 24 / 60)
hours = int((result / 60 - days * 24))
minutes = (result - days * 24 * 60 - hours * 60)
print(days, hours, minutes)

2 7 15.7225339114

1.3 Answer¶

We are being asked:

$$\int_{-\infty}^{60} \cal{N}(90, 12)$$

which is 2.3%

In [17]:

ss.norm.cdf(60, scale=15, loc=90)

Out[17]:

0.022750131948179195

1.4 Answer¶

We are being asked:

$$\int_{93}^{\infty} \cal{N}(90, 12)$$

The reason it goes to infinity is that many who got 100%, would have gotten higher scores had the test allowed it. So they would be greater than 100% if the test scores truly followed a normal distribution. The answer is 42%.

In [18]:

1 - ss.norm.cdf(93, scale=15, loc=90)

Out[18]:

0.42074029056089701

1.7 Answer¶

The definition of expected value is:

$$E[x] = \sum_0^N x P(x)$$

where $P(x)$ is the binomial distribution for us

In [28]:

from scipy.special import comb

sum = 0
N = 10
p = 0.3

for i in range(0,N+1):
    sum += i * comb(N, i) * p**i * (1 - p)**(N - i)
print(sum, p * N)

3.0 3.0

CLT Theory (4 Points)

==== Indicate if the CLT applies with yes or no. If no, state why.

You measure the density of a solution 50 times and take the average
You sum the number of students who attended the last 20 lectures
Flip a coin 25 times and consider a heads 0 and a tails 1 and take the average
Your final grade, which is the average of 25 homeworks, tests, and a project

2.1 Answer¶

CLT applies. Enough samples and you took an average

2.2 Answer¶

CLT applies. It's proportional to an average (you didn't divide by 20)

2.4 Answer¶

CLT applies

2.5 Answer¶

CLT applies. You have 27 rvs averaged together

Confidence Intervals (12 Points)

==== Report the given confidence interval for error in the mean using the data in the next cell and describe in words what the confidence interval is for each example

80% Double
99% Upper ( a value such that the mean lies above that value 99% of the time)
95% Double
Redo part 3 with a known standard deviation of 2

In [2]:

data_3_1 = [93.14,94.66, 102.1, 79.98, 96.85, 106.79, 101.92, 91.99, 97.22, 99.1, 88.7, 123.66, 99.7, 115.03, 99.28, 114.59, 102.25, 88.4, 111.06, 75.19, 107.32, 81.21, 100.49, 109.04, 105.09, 96.17, 78.13, 98.37, 104.47, 95.41]
data_3_2 = [2.24,3.86, 2.19, 1.5, 2.34, 2.55, 1.8, 3.99, 2.64, 3.8]
data_3_3 = [53.43,50.49, 52.55, 51.73]

3.1 Answer¶

This is a normal distribution because $N > 25$. Our interval will run from 10% to 90% to have an area of 80%

In [10]:

import numpy as np
from scipy import stats as ss

sample_mean = np.mean(data_3_1)
sample_std = np.std(data_3_1, ddof=1)
Zlo = ss.norm.ppf(0.1)
Xlo = Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean
Xhi = -Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean

print(Xlo, 'to', Xhi)

95.9671913504 to 101.18680865

3.2 Answer¶

This is a t distribution because $N < 25$. Our interval will run from 1% to 100% to have an area of 99%

In [11]:

sample_mean = np.mean(data_3_2)
sample_std = np.std(data_3_2, ddof=1)
Tlo = ss.t.ppf(0.01, len(data_3_2) - 1)
Xlo = Tlo * sample_std /np.sqrt(len(data_3_2)) + sample_mean
print(Xlo)

1.91493832637

3.3 Answer¶

This is a t distribution because $N < 25$. Our interval will run from 2.5% to 97.5% to have an area of 95%

In [12]:

sample_mean = np.mean(data_3_3)
sample_std = np.std(data_3_3, ddof=1)
Tlo = ss.t.ppf(0.025, len(data_3_3) - 1)
Xlo = Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)

50.3141851129 to 53.7858148871

3.4 Answer¶

We can now use a normal distribution because we know the true standard deviation

In [13]:

sample_mean = np.mean(data_3_3)
true_std = 2
Zlo = ss.norm.ppf(0.025)
Xlo = Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)

50.0900360155 to 54.0099639845

Sample Statistics (17 Points)

=====

Answer the following questions using the data given in the next cell.

[1] What is the sample correlation between $X$ and $Y$?
[5] Write your own method to compute sample covariance using a for loop. Use the code in the second cell below as a starting point. Do not use any numpy methods except to check your answer. Hint: You will need to use two loops
[5 + 1] What is the median of $Y$? Use Python and Numpy. Be careful if you use the sort method, since it will permanently alter your $Y$ array.

In [23]:

X = [1.6,0.4, -1.05, -0.08, 0.99, -1.89, 0.29, 0.71, -0.47, 1.15]
Y = [3.59,1.49, -2.57, -0.0, 2.0, -3.48, 0.14, 1.38, -1.48, 2.6]

In [31]:

for xi, yi in zip(X, Y): 
    print(xi, yi)

1.6 3.59
0.4 1.49
-1.05 -2.57
-0.08 -0.0
0.99 2.0
-1.89 -3.48
0.29 0.14
0.71 1.38
-0.47 -1.48
1.15 2.6

4.1 Answer¶

$$r_{xy} = -0.16$$

In [24]:

ans = np.corrcoef(X, Y, ddof=1)
print(ans[0,1])

0.985277421286

4.2 Answer¶

In [36]:

#compute the mean first
xmean = 0
ymean = 0
for xi, yi in zip(X, Y): 
    xmean += xi
    ymean += yi
    
xmean /= len(X)
ymean /= len(Y)

#now compute covariance using our previous calculation.
cov = 0
for xi, yi in zip(X, Y): 
    cov += (xi - xmean) * (yi - ymean)
cov /= len(X) - 1

print(cov)

2.4106833333333326

4.3 Answer¶

It is an even number, so there is no center element

In [29]:

from math import *

YSort = Y[:]
YSort.sort()
N = len(YSort)
print((YSort[int(N / 2)] + YSort[int(N / 2 - 1)]) / 2)

0.76

In [ ]: