==== Answer symbolically first, indicating what equations your Python program is using, and then compute the answer in Python. If not specified, say which distribution you're assuming.
[1] The time between traffic tickets is exponentially distributed. Based on past experience, you receive a traffic ticket about every 3 years. What's the probability of having a ticket within 12 months? For 2 bonus points, what's the probability of having 2 tickets within 12 months? Use scipy stats.
[2] You see two deer per day. How many days must pass before you have a 99% of having seen a deer? Answer in days, hours, and minutes.
[1] The expected score on a test is 90% with a standard deviation of 15%. You cannot receive more than 100% on this test. What's the probability failing (< 60%)?
[2] Using the above parameters, what's the probability of getting an A (93%-100%)?
[4] Using the definition of expected value, write a for loop that computes the expected value of a binomial distribution with $N = 10$ and $p = 0.3$. Do not use scipy stats. Compre with the fomula $E[x] = pN$ for binomial.
One ticket: We are being asked:
$$\int_0^{12} \lambda e^{-\lambda t}\, dt$$where $\lambda = \frac{1}{36}$, based on the prompt. The answer is $0.28$
Two tickets : We can view this as kind of a binomial distribution, where each month we can get a traffic ticket. That would give these parameters:
$$N = 12$$$$ p = \frac{1}{36}$$$$P(x = 2) = \binom{x}{N} p^x \,\left(1 - p\right)^{N - x} = 0.03$$However, of course it's possible to get two tickets in a month. SO, let's try breaking it down by day:
$$N = 365$$$$ p = \frac{1}{3\times365}$$$$P(x = 2) = \binom{x}{N} p^x \,\left(1 - p\right)^{N - x} = 0.0398$$Ah, we see that it's approximately the same. To go to the extreme, we can use a Poisson distribution, where $\mu = \frac{1}{3}$. That gives $0.0398$, which is the same. To make sure we're sane, we can check if the two distributions are consistent. This Poisson distribution gives 0.24 (instead of 0.28) for a single ticket. Sort of close.
from scipy import stats as ss
print(ss.expon.cdf(12, scale=36))
print(ss.binom.pmf(2, p=1 / 36, n=12))
print(ss.binom.pmf(2, p=1 / (3 * 365), n=365))
print(ss.poisson.pmf(2, mu=1 / 3))
print(ss.poisson.pmf(1, mu=1 / 3))
0.283468689426 0.0384232741801 0.0397647849595 0.0398072950319 0.238843770191
ss.binom?
This is an exponential distribution with $\lambda = \frac{2}{24 \times 60}$. We are being aksed
$$P(t > a) = 0.99$$which is 2 days, 7 hours, and 16 minutes.
result = ss.expon.ppf(0.99, scale = 24 * 60 / 2)
days = int(result / 24 / 60)
hours = int((result / 60 - days * 24))
minutes = (result - days * 24 * 60 - hours * 60)
print(days, hours, minutes)
2 7 15.7225339114
ss.norm.cdf(60, scale=15, loc=90)
0.022750131948179195
We are being asked:
$$\int_{93}^{\infty} \cal{N}(90, 12)$$The reason it goes to infinity is that many who got 100%, would have gotten higher scores had the test allowed it. So they would be greater than 100% if the test scores truly followed a normal distribution. The answer is 42%.
1 - ss.norm.cdf(93, scale=15, loc=90)
0.42074029056089701
The definition of expected value is:
$$E[x] = \sum_0^N x P(x)$$where $P(x)$ is the binomial distribution for us
from scipy.special import comb
sum = 0
N = 10
p = 0.3
for i in range(0,N+1):
sum += i * comb(N, i) * p**i * (1 - p)**(N - i)
print(sum, p * N)
3.0 3.0
==== Indicate if the CLT applies with yes or no. If no, state why.
==== Report the given confidence interval for error in the mean using the data in the next cell and describe in words what the confidence interval is for each example
data_3_1 = [93.14,94.66, 102.1, 79.98, 96.85, 106.79, 101.92, 91.99, 97.22, 99.1, 88.7, 123.66, 99.7, 115.03, 99.28, 114.59, 102.25, 88.4, 111.06, 75.19, 107.32, 81.21, 100.49, 109.04, 105.09, 96.17, 78.13, 98.37, 104.47, 95.41]
data_3_2 = [2.24,3.86, 2.19, 1.5, 2.34, 2.55, 1.8, 3.99, 2.64, 3.8]
data_3_3 = [53.43,50.49, 52.55, 51.73]
This is a normal distribution because $N > 25$. Our interval will run from 10% to 90% to have an area of 80%
import numpy as np
from scipy import stats as ss
sample_mean = np.mean(data_3_1)
sample_std = np.std(data_3_1, ddof=1)
Zlo = ss.norm.ppf(0.1)
Xlo = Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean
Xhi = -Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean
print(Xlo, 'to', Xhi)
95.9671913504 to 101.18680865
This is a t distribution because $N < 25$. Our interval will run from 1% to 100% to have an area of 99%
sample_mean = np.mean(data_3_2)
sample_std = np.std(data_3_2, ddof=1)
Tlo = ss.t.ppf(0.01, len(data_3_2) - 1)
Xlo = Tlo * sample_std /np.sqrt(len(data_3_2)) + sample_mean
print(Xlo)
1.91493832637
This is a t distribution because $N < 25$. Our interval will run from 2.5% to 97.5% to have an area of 95%
sample_mean = np.mean(data_3_3)
sample_std = np.std(data_3_3, ddof=1)
Tlo = ss.t.ppf(0.025, len(data_3_3) - 1)
Xlo = Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)
50.3141851129 to 53.7858148871
We can now use a normal distribution because we know the true standard deviation
sample_mean = np.mean(data_3_3)
true_std = 2
Zlo = ss.norm.ppf(0.025)
Xlo = Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)
50.0900360155 to 54.0099639845
=====
Answer the following questions using the data given in the next cell.
X = [1.6,0.4, -1.05, -0.08, 0.99, -1.89, 0.29, 0.71, -0.47, 1.15]
Y = [3.59,1.49, -2.57, -0.0, 2.0, -3.48, 0.14, 1.38, -1.48, 2.6]
for xi, yi in zip(X, Y):
print(xi, yi)
1.6 3.59 0.4 1.49 -1.05 -2.57 -0.08 -0.0 0.99 2.0 -1.89 -3.48 0.29 0.14 0.71 1.38 -0.47 -1.48 1.15 2.6
ans = np.corrcoef(X, Y, ddof=1)
print(ans[0,1])
0.985277421286
#compute the mean first
xmean = 0
ymean = 0
for xi, yi in zip(X, Y):
xmean += xi
ymean += yi
xmean /= len(X)
ymean /= len(Y)
#now compute covariance using our previous calculation.
cov = 0
for xi, yi in zip(X, Y):
cov += (xi - xmean) * (yi - ymean)
cov /= len(X) - 1
print(cov)
2.4106833333333326
It is an even number, so there is no center element
from math import *
YSort = Y[:]
YSort.sort()
N = len(YSort)
print((YSort[int(N / 2)] + YSort[int(N / 2 - 1)]) / 2)
0.76