import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In the video we discussed that, for the MAPE metric, the best constant prediction is the weighted median with weights
$$w_i = \frac{1/x_i}{\sum_{j=1}^{N} 1/x_j}$$
for each object $x_i$.
This notebook explains how to compute the weighted median. Let's generate some data first, and then find its weighted median.
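A short derivation may help show where these weights come from. It is a sketch that assumes all targets $x_i$ are positive (as in the data generated in this notebook) and writes $a$ for the constant prediction, a symbol not used elsewhere in the notebook.

```latex
\mathrm{MAPE}(a)
  = \frac{1}{N}\sum_{i=1}^{N}\frac{|x_i - a|}{x_i}
  = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{x_i}\,|x_i - a|
```

This is a weighted absolute error with weights proportional to $1/x_i$, and a weighted sum of absolute deviations is minimized by the weighted median. Rescaling all weights by the same positive constant does not change the minimizer, so the weights can be normalized to sum to 1.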
N = 5
x = np.random.randint(low=1, high=100, size=N)
x
array([63, 52, 70, 17, 76])
inv_x = 1.0/x
inv_x
array([ 0.01587302, 0.01923077, 0.01428571, 0.05882353, 0.01315789])
w = inv_x/sum(inv_x)
w
array([ 0.13078104, 0.15844626, 0.11770294, 0.48465916, 0.1084106 ])
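Now sort the normalized weights. We use argsort (and not just sort) since we will need the indices later. Because each weight is proportional to 1/x_i, sorting by weight also orders the data by value (largest first).
idxs = np.argsort(w)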
sorted_w = w[idxs]
sorted_w
array([ 0.1084106 , 0.11770294, 0.13078104, 0.15844626, 0.48465916])
sorted_w_cumsum = np.cumsum(sorted_w)
plt.plot(sorted_w_cumsum); plt.show()
print('sorted_w_cumsum: ', sorted_w_cumsum)
sorted_w_cumsum: [ 0.1084106 0.22611354 0.35689458 0.51534084 1. ]
idx = np.where(sorted_w_cumsum>0.5)[0][0]
idx
3
pos = idxs[idx]
x[pos]
52
print('Data: ', x)
print('Sorted data: ', np.sort(x))
print('Weighted median: %d, Median: %d' %(x[pos], np.median(x)))
Data: [63 52 70 17 76]
Sorted data: [17 52 63 70 76]
Weighted median: 52, Median: 63
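As a sanity check, the result can be compared against a brute-force search. The sketch below is not part of the original procedure: `mape` is a hypothetical helper, and the data are hard-coded from the run shown above. It relies on the fact that MAPE of a constant prediction is piecewise linear in that constant, so a minimum is attained at one of the data points.

```python
import numpy as np

# Data from the run shown above (hard-coded so the check is reproducible).
x = np.array([63, 52, 70, 17, 76])

def mape(x, a):
    """MAPE of the constant prediction `a` for positive targets `x` (hypothetical helper)."""
    return np.mean(np.abs(x - a) / x)

# Evaluate MAPE with each data point used as the constant prediction.
scores = np.array([mape(x, a) for a in x])

# The data point with the lowest MAPE should match the weighted median.
best = x[np.argmin(scores)]
print('Best constant among data points:', best)
```

For this sample the search selects 52, the same value as the weighted median found above.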
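That's it!
If the procedure looks surprising, try repeating the sort, cumulative-sum, and threshold steps with equal weights $w_i = \frac{1}{N}$, sorting by the values $x_i$ themselves since equal weights define no order. That way you will find the simple (unweighted) median of the data.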
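The steps above can be collected into one function. This is a minimal sketch, not the notebook's own code: `weighted_median` is a hypothetical name, and it sorts by the data values rather than by the weights, so that it also works when the weights carry no ordering (for example, equal weights).

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative normalized weight exceeds 0.5 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)                         # sort by value, not by weight
    cum_w = np.cumsum(weights[order]) / weights.sum()  # normalized cumulative weights
    idx = np.where(cum_w > 0.5)[0][0]                  # first position past the 0.5 threshold
    return values[order][idx]

x = np.array([63, 52, 70, 17, 76])
print(weighted_median(x, 1.0 / x))           # MAPE weights
print(weighted_median(x, np.ones(len(x))))   # equal weights
```

On this sample the MAPE weights give 52 and the equal weights give 63, matching the weighted median and np.median(x) printed earlier. The notebook accumulates weights from the largest value downward while this sketch accumulates from the smallest upward; the two agree here, but they could differ when the cumulative weight lands exactly on 0.5.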