Sometimes we would like to find the $z$ values belonging to a particular probability. For this we use an inverse cumulative distribution function.
Because the cumulative distribution function is strictly increasing and continuous, we can find $z$ by bisection: repeatedly halving the distance between a high and a low candidate $z$ value until it shrinks below a tolerance.
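The same bisection idea works for inverting any increasing function. As a warm-up, here is a minimal sketch (the helper name and the cube-root example are just for illustration):

```python
def bisect_increasing(f, target, low, high, tolerance=1e-9):
    """Find x with f(x) == target, where f is increasing on [low, high]."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if f(mid) < target:
            low = mid   # the solution lies in the upper half
        else:
            high = mid  # the solution lies in the lower half
    return (low + high) / 2

# Cube root of 2: solve x**3 == 2 on [0, 2]
x = bisect_increasing(lambda v: v ** 3, 2, 0, 2)
print(x)  # roughly 1.2599
```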
The code is based on the corresponding example in Data Science from Scratch.
import math as m

import numpy as np


def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2


def to_string(number: float, column_width: int = 20) -> str:
    # return str(number).ljust(column_width)
    return f"{str(number): <{column_width}}"
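Before inverting it, a couple of spot checks on the forward CDF against well-known standard normal values (`calc_normal_cdf` is repeated here so the snippet runs on its own):

```python
import math as m

def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2

print(calc_normal_cdf(0))     # 0.5 by symmetry
print(calc_normal_cdf(1.96))  # about 0.975, the familiar two-sided 95% bound
```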
The inverse normal cumulative distribution function takes a probability and, optionally, the mean and standard deviation of the distribution, as well as a tolerance value. If the distribution is not standard normal, it calculates $z$ for the standard normal one and then rescales the result.

1. Take a very low and a very high $z$ value.
2. While the distance between the high and low $z$ values is greater than our tolerance:
    a. Take their midpoint.
    b. Calculate the probability belonging to that midpoint.
    c. If that probability is below the one whose $z$ value we are trying to find, assign the midpoint to the low $z$ value; otherwise assign it to the high one.
def calc_inverse_normal_cdf(p: float, mu: float = 0, sigma: float = 1,
                            tolerance: float = 1E-5, show_steps=False) -> float:
    # In case it is not a standard normal distribution,
    # calculate the standard normal first and then rescale
    if mu != 0 or sigma != 1:
        return mu + sigma * calc_inverse_normal_cdf(p, tolerance=tolerance,
                                                    show_steps=show_steps)
    low_z = -10
    hi_z = 10
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        mid_p = calc_normal_cdf(mid_z)
        if mid_p < p:
            low_z = mid_z
        else:
            hi_z = mid_z
        if show_steps:
            print("\t".join(map(to_string, [low_z, mid_z, hi_z])))
    return mid_z
We can examine the steps as $z$ closes in on its final value.
calc_inverse_normal_cdf(p=.3, show_steps=True)
-10                  0.0                  0.0
-5.0                 -5.0                 0.0
-2.5                 -2.5                 0.0
-1.25                -1.25                0.0
-0.625               -0.625               0.0
-0.625               -0.3125              -0.3125
-0.625               -0.46875             -0.46875
-0.546875            -0.546875            -0.46875
-0.546875            -0.5078125           -0.5078125
-0.52734375          -0.52734375          -0.5078125
-0.52734375          -0.517578125         -0.517578125
-0.52734375          -0.5224609375        -0.5224609375
-0.52490234375       -0.52490234375       -0.5224609375
-0.52490234375       -0.523681640625      -0.523681640625
-0.52490234375       -0.5242919921875     -0.5242919921875
-0.52459716796875    -0.52459716796875    -0.5242919921875
-0.524444580078125   -0.524444580078125   -0.5242919921875
-0.524444580078125   -0.5243682861328125  -0.5243682861328125
-0.5244064331054688  -0.5244064331054688  -0.5243682861328125
-0.5244064331054688  -0.5243873596191406  -0.5243873596191406
-0.5244064331054688  -0.5243968963623047  -0.5243968963623047
-0.5243968963623047
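A quick sanity check is to push the result back through the forward CDF and confirm we recover the original probability; the error stays well below the bisection tolerance because the standard normal density (the CDF's slope) never exceeds $1/\sqrt{2\pi} \approx 0.4$. The functions are repeated in standard-normal-only form so the snippet runs on its own:

```python
import math as m

def calc_normal_cdf(x: float, mu: float = 0, sigma: float = 1) -> float:
    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2

def calc_inverse_normal_cdf(p: float, tolerance: float = 1E-5) -> float:
    # Standard normal only, for brevity
    low_z, hi_z = -10, 10
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        if calc_normal_cdf(mid_z) < p:
            low_z = mid_z
        else:
            hi_z = mid_z
    return mid_z

# Round trip: cdf(inverse_cdf(p)) should come back very close to p
for p in (0.1, 0.3, 0.5, 0.9):
    assert abs(calc_normal_cdf(calc_inverse_normal_cdf(p)) - p) < 1e-4
```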
Finally, we get a number of reference values for some different probabilities. Note that the true quantiles at $p = 0$ and $p = 1$ are $-\infty$ and $+\infty$; there the bisection simply runs into the initial $[-10, 10]$ bracket and the limits of floating-point precision, where the computed CDF rounds to exactly 0 or 1.
for i in np.arange(0, 1.1, .1):
    print(f"{i:.1f}: {calc_inverse_normal_cdf(i) : >20}")
0.0:   -9.999990463256836
0.1:  -1.2815570831298828
0.2:  -0.8416271209716797
0.3:  -0.5243968963623047
0.4:  -0.2533435821533203
0.5:  -9.5367431640625e-06
0.6:   0.2533435821533203
0.7:   0.5243968963623047
0.8:   0.8416271209716797
0.9:   1.2815570831298828
1.0:    8.244009017944336
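Since Python 3.8, the standard library's `statistics.NormalDist` provides an `inv_cdf` method that can serve as an independent check on these reference values (only for the non-degenerate probabilities, since `inv_cdf` raises an error at 0 and 1):

```python
from statistics import NormalDist

# Library quantiles; these agree with the bisection values above
# to within the 1e-5 tolerance
for p in (0.1, 0.3, 0.5, 0.9):
    print(f"{p:.1f}: {NormalDist().inv_cdf(p)}")
```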