Sometimes we would like to find the $z$ values belonging to a particular probability. For this we use an inverse cumulative distribution function.
Because the cumulative distribution function is strictly increasing and continuous, we can find $z$ by bisection: repeatedly halving the distance between a high and a low candidate $z$ value until it shrinks below a tolerance.
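The same bisection idea works for inverting any increasing function. As a warm-up, here is a minimal sketch (the helper name and the cube-root example are just for illustration):

```python
def bisect_increasing(f, target, low, high, tolerance=1e-9):
    """Find x with f(x) == target, where f is increasing on [low, high]."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if f(mid) < target:
            low = mid   # the solution lies in the upper half
        else:
            high = mid  # the solution lies in the lower half
    return (low + high) / 2

# Cube root of 2: solve x**3 == 2 on [0, 2]
x = bisect_increasing(lambda v: v ** 3, 2, 0, 2)
print(x)  # roughly 1.2599
```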
The code is based on the corresponding example in Data Science from Scratch.
import math as m

import numpy as np


def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2


def to_string(number: float, column_width: int = 20) -> str:
    # return str(number).ljust(column_width)
    return f"{str(number): <{column_width}}"
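Before inverting it, a couple of spot checks on the forward CDF against well-known standard normal values (`calc_normal_cdf` is repeated here so the snippet runs on its own):

```python
import math as m

def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2

print(calc_normal_cdf(0))     # 0.5 by symmetry
print(calc_normal_cdf(1.96))  # about 0.975, the familiar two-sided 95% bound
```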
The inverse normal cumulative distribution function takes a probability and, optionally, the mean and standard deviation of the distribution, as well as a tolerance value. If the distribution is not standard normal, it calculates $z$ for the standard normal one and then rescales the result.

1. Take a very low and a very high $z$ value.
2. While the distance between the high and low $z$ values is greater than our tolerance:
    a. Take their midpoint.
    b. Calculate the probability belonging to that midpoint.
    c. If that probability is below the one whose $z$ value we are trying to find, assign the midpoint to the low $z$ value; otherwise assign it to the high one.
def calc_inverse_normal_cdf(p: float, mu: float = 0, sigma: float = 1,
                            tolerance: float = 1E-5, show_steps=False) -> float:
    # In case it is not a standard normal distribution,
    # calculate the standard normal first and then rescale
    if mu != 0 or sigma != 1:
        return mu + sigma * calc_inverse_normal_cdf(p, tolerance=tolerance,
                                                    show_steps=show_steps)
    low_z = -10
    hi_z = 10
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        mid_p = calc_normal_cdf(mid_z)
        if mid_p < p:
            low_z = mid_z
        else:
            hi_z = mid_z
        if show_steps:
            print("\t".join(map(to_string, [low_z, mid_z, hi_z])))
    return mid_z
We can examine the steps as $z$ closes in on its final value.
calc_inverse_normal_cdf(p=.3, show_steps=True)
-10                  0.0                  0.0
-5.0                 -5.0                 0.0
-2.5                 -2.5                 0.0
-1.25                -1.25                0.0
-0.625               -0.625               0.0
-0.625               -0.3125              -0.3125
-0.625               -0.46875             -0.46875
-0.546875            -0.546875            -0.46875
-0.546875            -0.5078125           -0.5078125
-0.52734375          -0.52734375          -0.5078125
-0.52734375          -0.517578125         -0.517578125
-0.52734375          -0.5224609375        -0.5224609375
-0.52490234375       -0.52490234375       -0.5224609375
-0.52490234375       -0.523681640625      -0.523681640625
-0.52490234375       -0.5242919921875     -0.5242919921875
-0.52459716796875    -0.52459716796875    -0.5242919921875
-0.524444580078125   -0.524444580078125   -0.5242919921875
-0.524444580078125   -0.5243682861328125  -0.5243682861328125
-0.5244064331054688  -0.5244064331054688  -0.5243682861328125
-0.5244064331054688  -0.5243873596191406  -0.5243873596191406
-0.5244064331054688  -0.5243968963623047  -0.5243968963623047
-0.5243968963623047
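A quick sanity check is to push the result back through the forward CDF and confirm we recover the original probability; the error stays well below the bisection tolerance because the standard normal density (the CDF's slope) never exceeds $1/\sqrt{2\pi} \approx 0.4$. The functions are repeated in standard-normal-only form so the snippet runs on its own:

```python
import math as m

def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2

def calc_inverse_normal_cdf(p: float, tolerance: float = 1E-5) -> float:
    # Standard normal only, for brevity
    low_z, hi_z = -10, 10
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        if calc_normal_cdf(mid_z) < p:
            low_z = mid_z
        else:
            hi_z = mid_z
    return mid_z

# Round trip: cdf(inverse_cdf(p)) should come back very close to p
for p in (0.1, 0.3, 0.5, 0.9):
    assert abs(calc_normal_cdf(calc_inverse_normal_cdf(p)) - p) < 1e-4
```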
Finally, we get a number of reference values for some different probabilities. Note that the true quantiles at $p = 0$ and $p = 1$ are $-\infty$ and $+\infty$; there the bisection simply runs into the initial $[-10, 10]$ bracket and the limits of floating-point precision, where the computed CDF rounds to exactly 0 or 1.
for i in np.arange(0, 1.1, .1):
    print(f"{i:.1f}: {calc_inverse_normal_cdf(i) : >20}")
0.0:   -9.999990463256836
0.1:  -1.2815570831298828
0.2:  -0.8416271209716797
0.3:  -0.5243968963623047
0.4:  -0.2533435821533203
0.5:  -9.5367431640625e-06
0.6:   0.2533435821533203
0.7:   0.5243968963623047
0.8:   0.8416271209716797
0.9:   1.2815570831298828
1.0:    8.244009017944336
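Since Python 3.8, the standard library's `statistics.NormalDist` provides an `inv_cdf` method that can serve as an independent check on these reference values (only for the non-degenerate probabilities, since `inv_cdf` raises an error at 0 and 1):

```python
from statistics import NormalDist

# Library quantiles; these agree with the bisection values above
# to within the 1e-5 tolerance
for p in (0.1, 0.3, 0.5, 0.9):
    print(f"{p:.1f}: {NormalDist().inv_cdf(p)}")
```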