Programming Exercise 8: Anomaly Detection and Recommender Systems¶

Author - Rishabh Jain¶

In [1]:

import warnings,os
warnings.simplefilter('ignore')

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline

from scipy.io import loadmat

1 Anomaly detection¶

Problem Statement¶

In this part of the exercise, we will implement an anomaly detection algorithm to detect the anomalous behavior in server computers. The features measure the throughput (mb/s) and latency (ms) of response of each server. While our servers were operating, we collected m=307 examples of how they were behaving, and thus have an unlabeled dataset. We suspect that the vast majority of these examples are "Normal" (Non-anomalous) examples of servers operating normally, but there might also be some examples of servers acting anomalously within the dataset.

We will use a gaussian model to detect the anomalies in our dataset. On this dataset, we will fit a gaussian distribution and then find the values that have very low probability and hence can be considered anomalies.

In [2]:

# Loading dataset
mat=loadmat('ex8data1.mat')
X=mat['X']
print(*mat.keys(),sep='\n')

__header__
__version__
__globals__
X
Xval
yval

In [3]:

fig=sns.scatterplot(X[:,0],X[:,1],marker='x')
fig.set(xlabel='Latency (ms)',ylabel='Throughput (mb/s)',title='Original Dataset');

1.1 Gaussian Distribution¶

To perform anomaly detection, we will first need to fit a model to the data's distribution.
Given a training set, we want to estimate the gaussian distribution for each of the features. For each feature, we need to find the parameters $\mu_{i}$ and $\sigma_{i}^{2}$ that fit the data. The gaussian distribution is given by:

$$\boxed{p(x_{j};\mu_{j},\sigma_{j}^{2})=\frac{1}{\sqrt{2\pi\sigma_{j}^{2}}}e^{-\frac{(X_{j}-\mu_{j})^{2}}{2\sigma_{j}^{2}}}}$$$$p(x)=\Pi_{j=1}^{n}p(x_{j};\mu_{j},\sigma_{j}^{2})$$$$\text{Anomaly, if }p(x)< \varepsilon$$

In [4]:

def mutlivariateGaussian(X,mu,sigma):
    p=(1/(np.sqrt(2*np.pi*np.power(sigma,2))))*np.exp(-np.power(X-mu,2)/(2*np.power(sigma,2)))
    p=np.product(p,axis=1).reshape((-1,1))
    return p

1.2 Estimating parameters for a Gaussian¶

We can estimate the parameters ($\mu_{i},\sigma_{i}^{2}$) of the i-th feature by using the following equations. To estimate the mean, we will use $$\mu_{i}=\frac{1}{m}\sum_{j=1}^{m}x_{i}^{(j)}$$

And for the variance, we will use: $$\sigma_{i}^{2}=\frac{1}{m}\sum_{j=1}^{m}(x_{i}^{(j)}-\mu_{i})^2$$

In [5]:

def estimateGaussian(X):
    mu=np.mean(X,axis=0)
    sigma=np.std(X,axis=0)
    return (mu,sigma)

Let's look at the Gaussian distribution contours of the distribution fit to the dataset.

In [6]:

def plotGaussianContour(X,mu,sigma,title=None):
    # Constructing grid around the min and max range of original data
    temp=np.linspace(np.min(X)-5,np.max(X)+5)
    [x1,x2]=np.meshgrid(temp,temp)
    # Computing the Gaussian Density Probability for the grid
    temp=np.array([x1.reshape(-1),x2.reshape(-1)]).T
    z=mutlivariateGaussian(temp,mu,sigma).reshape(x1.shape)
    # Plotting
    levels=np.power(10,list(map(lambda x:float(x),list(range(-20,0,3)))))
    plt.contour(x1,x2,z,levels=levels);
    fig=sns.scatterplot(X[:,0],X[:,1],marker='x')
    title='Gaussian Distribution Contours' if title==None else title
    fig.set(xlabel='Latency (ms)',ylabel='Throughput (mb/s)',title=title);

In [7]:

# Estimating Gaussian parameters
mu,sigma=estimateGaussian(X)
# Computing Density Probability
p=mutlivariateGaussian(X,mu,sigma)
# Plotting
plotGaussianContour(X,mu,sigma);

1.3 Selecting the threshold, $\varepsilon$¶

Now that we have estimated the gaussian parameters, we can investigate which examples have a very high probability given this distribution and which examples have a very low probability. The low probability examples are more likely to be the anomalies in our dataset. One way to determine which examples are the anomalies is to select the threshold based on a cross validation set. In this part of the exericse, we will implement an algorithm to select the threshold using F1 score on a cross validation set.

$y=1$ corresponds to an anomalous examples and $y=0$ corresponds to a normal examples.

In [8]:

Xval=mat['Xval']
yval=mat['yval']

For many different values of $\varepsilon$, we will compute the resulting $F_1$ score by computing how many examples the current threshold classifies correctly and incorrectly. The $F_1$ score is computed using Precision (prec) and Recall (rec):

$$\boxed{F1=\frac{2.prec.rec}{prec+rec}}$$

We can compute Precision (prec) and Recall (rec) using:

$$\boxed{prec=\frac{tp}{tp+fp}}$$$$\boxed{rec=\frac{tp}{tp+fn}}$$

where,

$tp$ is the number of true positives: the ground truth label says it's an anomaly and our algorithm classified it as an anomaly.
$fp$ is the number of false positives: the ground truth label says it's not an anomaly and our algorithm classified it as an anomaly.
$fn$ is the number of false negatives: the ground truth labels says it's an anomaly and our alogrithm classified it as not being anomalous.

In [9]:

def selectThreshold(yval,pval):
    '''
     Returns the threshold value of epsilon and the corresponding F1 score.
    '''
    bestEpsilon=bestF1=F1=0
    epsilons=np.linspace(np.min(pval),np.max(pval),1000)
    for epsilon in epsilons:
        cvPrediction=pval<epsilon
        tp=np.sum((cvPrediction==1) & (yval==1))
        fp=np.sum((cvPrediction==1) & (yval==0))
        fn=np.sum((cvPrediction==0) & (yval==1))
        prec=rec=0
        # Computing Precision
        if tp+fp>0:
            prec=tp/(tp+fp)
        # Computing Recall
        if tp+fn>0:
            rec=tp/(tp+fn)
        # Computing F1 Score
        if prec+rec>0:
            F1=(2*prec*rec)/(prec+rec)
        if F1>bestF1:
            bestF1=F1
            bestEpsilon=epsilon
    return (bestEpsilon,bestF1)

In [10]:

# Using sample mean and std. to compute density probability for cross validation set
pval=mutlivariateGaussian(Xval,mu,sigma)
# Calculating the threshold based on the best F1 Score
epsilon,f1=selectThreshold(yval,pval)
# Getting indices for anomalous examples
indices=np.where((p<epsilon)==True)[0]
# Plotting anomalies
title=f'Threshold : {epsilon:.5f} & F1 Score : {f1:.2f}'
plotGaussianContour(X,mu,sigma,title=title)
sns.scatterplot(X[indices,0],X[indices,1],marker='o',facecolors='none',edgecolor='r',label='Anomalies');

1.4 High Dimensional Dataset¶

In this dataset, each example is described by 11 features, caputring many more properties of our compute servers.

In [11]:

# Loading dataset
mat=loadmat('ex8data2.mat')
X=mat['X']
print(*mat.keys(),sep='\n')

__header__
__version__
__globals__
X
Xval
yval

In [12]:

pd.DataFrame(X).sample(5)

Out[12]:

	0	1	2	3	4	5	6	7	8	9	10
863	-3.357801	-6.063495	19.827681	-2.843608	-6.472805	13.165775	-1.301804	11.396985	-11.586898	3.164643	17.165776
377	2.541202	1.295366	2.816464	-18.192452	-5.445314	17.679813	-15.325850	9.203726	-3.411643	-4.398754	14.293232
28	20.506733	-1.746643	21.146559	-27.537185	-9.624690	24.728547	-18.732147	8.330862	-0.397158	9.197245	9.742638
369	7.149015	-7.735763	6.911932	-13.206095	-30.845135	-3.004759	2.893837	-1.338351	-10.686856	-12.007350	7.673190
52	10.440863	-20.122496	8.223160	-2.154362	12.276639	-4.383862	-11.543852	3.127246	-0.237430	-1.694638	4.206813

Principal Component Analysis (Optional)¶

In [13]:

# Centering and scaling the data matrix
xPCA=(X-X.mean(axis=0))/X.std(axis=0)
# Computing Covariance
C=np.cov(xPCA.T)
# Computing eigenvalues and eigenvectors
eigenValues,eigenVectors=np.linalg.eig(C)
# Sorting eigenvalues in descending order
idx=eigenValues.argsort()[::-1]
eigenValues=eigenValues[idx]
eigenVectors=eigenVectors[idx]

# Plotting elbow curve
chart=sns.lineplot(list(range(len(eigenValues))),np.cumsum(eigenValues));
chart.set(xlabel='Eigen Values',ylabel='Cumulative sum');

Looking at the graph above, we can conclude that most of the singular values matter for this dataset.

Finding Anomalies¶

In [14]:

# Computing Gaussian Parameters
mu,sigma=estimateGaussian(X)
p=mutlivariateGaussian(X,mu,sigma)

In [15]:

# Using Cross Validation set to select a threshold
Xval=mat['Xval']
yval=mat['yval']
# Computing density probability
pval=mutlivariateGaussian(Xval,mu,sigma)
epsilon,f1=selectThreshold(yval,pval)
print(f'Epsilon : {epsilon}')
print(f'F1 Score : {f1}')

Epsilon : 1.3786074982000233e-18
F1 Score : 0.6153846153846154

Found 117 Anomalies

In [16]:

indices=np.where((p<epsilon)==True)[0]
pd.DataFrame(X[indices])

Out[16]:

	0	1	2	3	4	5	6	7	8	9	10
0	15.107877	-16.430766	19.707360	-19.811888	-1.644537	-6.444184	-6.121214	7.042283	7.232476	17.223479	-2.956852
1	12.411706	3.150959	14.827734	-10.482672	-8.250082	-7.982698	-20.766918	30.689476	4.622547	12.234751	23.167294
2	20.946754	1.048170	8.296324	-2.595042	-14.061970	8.776611	-21.886068	9.769264	-20.071130	13.871906	4.197881
3	5.127033	-18.517137	11.422480	-28.993408	-5.797960	-15.989215	-12.039753	31.296681	8.203208	8.035668	19.519680
4	11.622006	-0.229723	10.005823	-9.700729	-14.765493	26.933578	-9.299882	13.109171	8.976218	8.157170	18.663064
5	-5.689178	-9.707423	22.258180	7.890772	-11.397641	34.445407	-9.836411	-0.179841	-13.261052	15.200689	7.595782
6	16.452993	-15.320026	17.283979	-5.896654	-2.940964	12.148073	-3.422623	15.684343	-20.903409	-0.306900	-8.481797
7	-1.907518	-10.191011	5.041437	-31.539885	0.761368	2.671201	2.149600	11.920996	-10.353145	-5.472842	-7.579770
8	-2.665572	-1.793277	-3.312666	-11.771151	-22.948824	0.657702	4.040898	2.684116	-4.217029	-12.993472	6.657632
9	-2.120839	-22.573352	10.638703	-2.064867	0.942871	-11.136318	1.747898	22.794201	-3.726747	4.053987	-1.185641
10	-12.692722	-11.498905	0.026479	-8.809048	-16.233664	-6.701840	-10.635467	-8.274253	-12.428766	-2.396440	9.134138
11	14.552377	-38.593613	19.161298	-11.379035	-24.679817	29.394102	-10.255991	4.409564	-20.899036	-6.997668	2.424330
12	3.968173	-0.948445	16.390532	4.925583	-30.252147	14.208280	-11.775899	14.966458	-10.949119	0.967746	22.231068
13	21.302835	-22.478786	22.219274	-10.333549	-15.153760	22.316417	12.582181	-0.541537	-9.436337	10.168542	7.909259
14	12.981934	8.319031	19.570986	-17.000717	-23.493794	16.981319	9.996739	20.386708	3.952619	-4.310215	6.554124
15	-14.696577	-17.209237	14.525584	-20.475409	6.159271	12.879206	-13.942417	4.008424	-5.974088	17.808364	13.138762
16	6.104585	3.181527	-3.318357	-23.428609	-14.559891	20.376025	-8.434354	6.850325	-17.518661	-4.584083	5.415755
17	18.829449	1.171611	-0.177123	-3.604261	-3.413298	19.523349	0.903542	18.084633	-11.911412	-7.921118	18.581649
18	8.511318	-16.107225	29.176976	-21.298302	-29.855874	0.995455	-7.266977	8.159963	0.985609	7.729459	-12.966527
19	-5.513919	-6.838657	4.026831	-7.059106	0.967883	15.205973	4.947855	5.362446	-20.477818	11.958153	16.785018
20	19.861932	-7.857506	5.643955	-4.439531	-0.846037	26.410503	-28.741808	13.098772	-14.189061	-2.010233	10.575248
21	-10.373139	-33.577658	14.393455	-3.373885	-12.078383	1.415410	-6.675364	-7.337178	-4.686000	3.700622	0.949668
22	6.766758	-4.407579	10.305895	-4.400896	6.754287	23.584760	14.644711	16.288997	-17.195773	18.158690	2.366239
23	1.081479	-12.690507	31.537406	9.477382	-15.676929	1.106263	-10.324035	4.418272	-19.267674	6.174647	5.671273
24	-15.600509	-20.351230	18.334728	10.487489	-16.037635	1.283141	-8.879621	12.140937	4.844813	2.226241	-4.970682
25	6.075045	-23.857709	11.693187	4.866555	-11.308274	-14.698708	-6.384952	-10.841639	-14.239166	-9.703491	4.654031
26	17.438743	-12.615899	0.789213	1.746310	-13.324761	-4.484684	-8.728469	26.461376	-13.228487	10.545975	8.663423
27	-0.345422	-9.868085	26.251767	-25.748620	-22.168480	0.075066	-4.902324	2.772623	5.538813	-10.652863	12.019222
28	8.782745	-11.039724	19.792848	-7.454397	-11.136051	-5.493029	-26.636752	14.116266	-9.001010	12.182347	21.790293
29	1.815232	6.536839	15.958974	6.231871	5.501488	-4.844246	1.442609	26.645073	-15.078784	-14.621224	11.307183
...	...	...	...	...	...	...	...	...	...	...	...
87	-7.107823	-2.659451	-9.913776	7.865090	-15.062972	-6.669139	-10.886804	-5.428388	-6.145721	-1.694927	36.544551
88	14.410571	-22.477726	3.857208	-2.952726	-24.880941	26.884771	-10.301260	14.843514	-5.629550	15.834725	14.923468
89	2.024200	-17.995892	18.120255	9.302085	-20.657652	-1.342614	-18.167920	-6.761627	1.510516	-0.822743	12.133360
90	24.161473	2.150718	22.656128	-13.834454	4.338249	-5.126474	-6.119224	3.414971	1.066510	9.997343	27.965546
91	-11.113160	-5.647850	23.260981	4.848569	-6.399478	18.210016	13.164300	19.956570	-0.345878	9.654331	10.193919
92	-11.316527	-8.600399	25.029113	-2.779305	-2.414419	18.946212	2.166333	-5.778113	0.044183	4.849727	22.058928
93	10.463708	-21.580878	23.946622	7.302745	-22.052180	0.784292	-7.572949	6.403820	-1.567384	7.096635	21.298699
94	-2.731858	-1.160324	-1.203266	-9.844513	-8.309690	27.409155	8.013834	10.136497	-16.946208	5.646174	10.622073
95	-0.343405	-2.980590	3.574230	10.676147	-3.662173	2.813027	-21.867923	-6.557764	-2.575985	3.411594	16.251755
96	-1.741997	-1.487631	27.069244	-7.585768	2.936218	24.616908	2.101370	41.978198	-0.526588	8.414796	-1.203203
97	-9.546285	-5.598474	16.340206	-0.153478	-14.982139	2.475574	-14.791769	25.655002	2.786267	-5.184959	18.274686
98	12.453219	-4.065051	17.536215	7.543518	8.571374	20.703939	-18.256704	7.374196	-10.945239	0.359840	-12.650669
99	13.419761	7.410299	23.401764	-17.150200	-8.987999	16.752096	-7.503705	20.856191	1.770926	17.369485	14.560195
100	-8.993098	-24.137958	8.507993	-28.898992	-4.012547	-2.715727	-4.027505	6.212865	-15.777059	8.403863	-1.090873
101	7.663754	-21.315959	10.085647	-13.476459	1.869769	31.556190	-4.207739	6.783527	-9.069698	25.711498	4.205376
102	-9.653924	-10.097966	6.088028	-17.224249	-35.351822	2.677276	-13.815382	10.687653	-10.804356	5.571192	-9.528244
103	5.591235	6.707069	25.261202	7.251960	-15.335947	6.264382	-1.493672	24.244968	-4.643058	13.628094	9.853019
104	5.349885	-12.919083	12.132417	-30.097538	-23.687714	15.293885	-7.954287	16.382830	-2.277219	-12.926472	25.837356
105	11.459300	-6.873557	24.064825	0.688340	-14.256252	-14.539024	-26.296003	18.508660	-6.114729	15.668279	0.323248
106	-3.840912	-12.800335	1.341582	-5.303823	-11.195030	-0.078478	0.028865	-8.083519	-21.344629	10.631312	-8.177534
107	12.687826	-7.989794	15.536974	6.641290	1.062005	33.352865	6.122964	17.815628	-17.525844	-2.721404	3.446680
108	0.117894	-24.108314	25.600026	-2.184434	-26.406851	9.524311	-10.713197	-1.100855	-7.452232	-16.072449	-1.506024
109	-14.471792	-12.099130	17.744803	-5.173237	-13.187795	-3.559686	-6.379751	-5.804498	-10.552952	21.174772	4.976100
110	-8.591807	-7.044647	8.214113	4.041871	-13.361453	8.930821	6.097335	-11.421545	-5.723956	12.705342	23.809446
111	-11.788266	-11.483627	12.438678	-3.162811	0.121983	15.905273	-14.988100	15.049161	-19.249455	-8.949271	16.943905
112	-7.514689	-11.633082	12.714199	-9.659930	-20.170026	46.574569	-10.498804	20.522685	-1.223742	4.478910	-0.186369
113	8.610956	-2.087595	13.609762	-5.509457	4.565548	-2.150133	6.652794	6.604923	-0.554557	-21.551409	5.522046
114	4.712890	-5.468014	7.463785	15.156009	-4.815854	-10.092629	3.830391	-0.360002	-11.221351	-3.262486	1.591322
115	-12.120077	-5.845913	28.951978	-10.215184	-18.070297	15.439734	-4.961338	31.429439	-7.406423	5.847000	20.797016
116	3.059507	-5.926298	31.837423	0.245340	10.050776	-1.997428	2.059449	12.571624	4.110159	14.815644	9.035330

117 rows × 11 columns

2 Recommender Systems¶

Problem Statement¶

In this part of the exercise ,we will implement the Collaborative Filtering learnign algorithm and apply it to a dataset of movie ratings. This dataset consists of ratings on a scale of 1 to 5. This dataset has $n_u=943$ users, amd $n_m=1682$ movies. We will first write a cost function that computes the collaborative filtering objective function and gradient. After implmenting that, we will use gradient descent to learn the parameters for collaborative filtering.
The objective of collaborative filtering is to predict movie ratings for that movies that users have not yet rated.

2.1 Movie ratings dataset¶

The matrix $Y$ (num_movies X num_users) stores the ratings $y^{(i,j)}$ from 1 to 5. The matrix $R$ is an binary valued indicator matrix where $R(i,j)=1$ if user j gave a rating to movie i, and $R(i,j)=0$ otherwise.

2.2 Collaborative filtering learning algorithm¶

The collaborative filtering algorithm in the setting of movie recommendation considers a set of $n$ dimensional parameter vectors $x^{(1)},...,x^{(n_m)}$ and $\theta^{(1)},...,\theta^{(n_u)}$, where the model predicts the rating for movie $i$ by user $j$ as $y^{(i,j)}=(\theta^{(j)})^Tx^{(i)}$. Given a dataset that consists of a set of ratings produced by some users on some movies, we wish to learn the parameter vectors that produce the best fit.

2.2.1/2.2.3 Collaborative filtering cost function/Regularized cost function¶

Given $x^{(1)},...,x^{(n_m)}$, estimate $\theta^{(1)},...,\theta^{(n_u)}$ : $$\min_{\theta^{(1)},...,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta_k^{(j)})^2$$

Given $\theta^{(1)},...,\theta^{(n_u)}$, estimate $x^{(1)},...,x^{(n_m)}$ : $$\min_{x^{(1)},...,x^{(n_m)}} \frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x_k^{(i)})^2$$

Minimizing $x^{(1)},...,x^{(n_m)}$ and $\theta^{(1)},...,\theta^{(n_u)}$ simultaneously:

$$\boxed{J(x^{(1)},...,x^{(n_m)},\theta^{(1)},...,\theta^{(n_u)})=\frac{1}{2}\sum_{(i,j):r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x_k^{(i)})^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta_k^{(j)})^2}$$$$\min J(x^{(1)},...,x^{(n_m)},\theta^{(1)},...,\theta^{(n_u)})$$

2.2.2/2.2.4 Collaborative filtering gradient/Regularized Gradient¶

$\frac{\partial{J}}{\partial{x_k^{(i)}}}=\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})\theta_k^{(j)}+\lambda x_k^{(j)}$

$x_k^{(i)}:x_k^{(i)}-\alpha\frac{\partial{J}}{\partial{x_k^{(i)}}}$

$$\boxed{x_k^{(i)}:x_k^{(i)}-\alpha[\sum_{j:r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})\theta_k^{(j)}+\lambda x_k^{(i)}]}$$

$\frac{\partial{J}}{\partial{\theta_k^{(j)}}}=\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})x_k^{(i)}+\lambda \theta_k^{(j)}$

$\theta_k^{(i)}:\theta_k^{(i)}-\alpha\frac{\partial{J}}{\partial{\theta_k^{(j)}}}$

$$\boxed{\theta_k^{(i)}:\theta_k^{(i)}-\alpha[\sum_{i:r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})x_k^{(i)}+\lambda \theta_k^{(j)}]}$$

In [17]:

def costFunction(params,Y,R,numUsers,numMovies,numFeatures,lmbda):
    # Unrolling Parameters
    Y=Y[:numMovies,:numUsers]
    R=R[:numMovies,:numUsers]
    params=params.reshape(-1)
    X=params[:numMovies*numFeatures].reshape((numMovies,numFeatures))
    theta=params[numMovies*numFeatures:].reshape((numUsers,numFeatures))
    # Computing Cost
    H=np.dot(X,theta.T)
    J=(1/2)*np.power(np.multiply(R,H-Y),2).sum()
    Jreg=(lmbda/2)*(np.power(X,2).sum()+np.power(theta,2).sum())
    J=J+Jreg
    # Computing Gradient
    xGrad=np.dot(np.multiply(R,H-Y),theta)+lmbda*X
    thetaGrad=np.dot(np.multiply(R,H-Y).T,X)+lmbda*theta
    # Rolling into one variable
    params=np.row_stack((xGrad.reshape((-1,1)),thetaGrad.reshape((-1,1))))
    return (J,params)

In [18]:

def regularizedGradientDescent(initialParams,Y,R,numUsers,numMovies,numFeatures,lmbda,alpha,iterations):
    jHistory=[]
    params=initialParams
    print('Training Collaborative Filtering...')
    for i in range(iterations):
        # Computing cost
        J,grad=costFunction(params,Y,R,numUsers,numMovies,numFeatures,lmbda)
        jHistory.append(J)
        print(f'ITERATIONS : {i+1}\t\tCOST : {J:.3f}',end='\r')
        # Parameters update rule
        params=params-alpha*grad
    df=pd.DataFrame({'Iterations':range(iterations),'Cost Function':jHistory})
    sns.lineplot(data=df,x='Iterations',y='Cost Function');
    return jHistory,params

Let's check our gradient from 'costFunction' function by computing the gradient using two point slope equation. (Gradient Checking)

In [19]:

def computeNumericalGradient(J,theta):
    '''Computes the numerical gradient using two point slope equation'''
    numGrad=np.zeros(theta.shape)
    perturb=np.zeros(theta.shape)
    epsilon=1e-4
    for i in range(theta.size):
        perturb[i,:]=epsilon
        numGrad[i,:]=(J(theta+perturb)-J(theta-perturb))/(2*epsilon)
        perturb[i,:]=0
    return numGrad

def checkNNGradients(lmbda):
    # Create small problem
    X=np.random.randn(4,3)
    theta=np.random.randn(5,3)   
    Y=np.dot(X,theta.T)
    Y[np.random.rand(*Y.shape)>0.5]=0
    R=np.zeros(Y.shape)
    R[Y!=0]=1
    numUsers=Y.shape[1]
    numMovies=Y.shape[0]
    numFeatures=theta.shape[1]
    # Gradient from cost function derivative
    params=np.row_stack((X.reshape((-1,1)),theta.reshape((-1,1))))
    J,grad=costFunction(params,Y,R,numUsers,numMovies,numFeatures,lmbda)
    # Computing numerical gradient
    def cost(params):
        J,grad=costFunction(params,Y,R,numUsers,numMovies,numFeatures,lmbda)
        return J
    numGrad=computeNumericalGradient(cost,params)
    # Evaluating the norm of the differences between two solution
    diff=np.linalg.norm(numGrad-grad)/np.linalg.norm(numGrad+grad)
    return grad,numGrad,diff        

lmbda=1.5
grad,numGrad,diff=checkNNGradients(lmbda)
print(f'DIFFERENCE : {diff}')
pd.DataFrame(data={'grad (derivative)':grad.reshape(-1),'grad (numerical)':numGrad.reshape(-1)})

DIFFERENCE : 4.8446728621542694e-12

Out[19]:

	grad (derivative)	grad (numerical)
0	2.570436	2.570436
1	-0.173870	-0.173870
2	0.443513	0.443513
3	-1.151438	-1.151438
4	3.078315	3.078315
5	1.406108	1.406108
6	0.251307	0.251307
7	0.860391	0.860391
8	-2.623469	-2.623469
9	-0.681626	-0.681626
10	-1.868502	-1.868502
11	2.908619	2.908619
12	-1.269194	-1.269194
13	-0.828064	-0.828064
14	-0.727420	-0.727420
15	0.420340	0.420340
16	0.405181	0.405181
17	-1.245199	-1.245199
18	0.002056	0.002056
19	0.248965	0.248965
20	-3.372456	-3.372456
21	1.402044	1.402044
22	-1.401868	-1.401868
23	4.446794	4.446794
24	1.564790	1.564790
25	0.518527	0.518527
26	-0.877819	-0.877819

2.3 Learning Movie Recommendations¶

Now that we have finished the collaborative filtering cost function and gradient, we can now start training our learning algorithm to make movie recommendations for ourself. Later, we will also enter our own movie preferences, so that when the algorithm runs, we can get our own movie recommendations.

In [20]:

# Loading data
mat=loadmat('ex8_movies.mat')
Y=mat['Y']
R=mat['R']

numUsers=Y.shape[1]
numMovies=Y.shape[0]
numFeatures=10

print("Y : {0} X {1}".format(*Y.shape))
print("R : {0} X {1}".format(*R.shape))

Y : 1682 X 943
R : 1682 X 943

In [21]:

# Loading Movies
movies={'ID':[],'Movie':[],'Year':[]}
with open('movie_ids.txt') as f:
    for line in f.readlines():
        movies['ID'].append(int(line.split()[0]))
        movies['Movie'].append(' '.join(line.split()[1:-1]))
        movies['Year'].append(line.split()[-1][1:-1])
movies=pd.DataFrame(movies)
movies=movies.set_index('ID')
movies.sample(10)

Out[21]:

	Movie	Year
ID
1528	Nowhere	1997
141	20,000 Leagues Under the Sea	1954
27	Bad Boys	1995
1060	Adventures of Pinocchio, The	1996
972	Passion Fish	1992
588	Beauty and the Beast	1991
159	Basic Instinct	1992
1454	Angel and the Badman	1947
1664	8 Heads in a Duffel Bag	1997
239	Sneakers	1992

Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by 943 users¶

In [22]:

pd.DataFrame(Y,movies['Movie']).head(5)

Out[22]:

	0	1	2	3	4	5	6	7	8	9	...	933	934	935	936	937	938	939	940	941	942
Movie
Toy Story	5	4	0	0	4	4	0	0	0	4	...	2	3	4	0	4	0	0	5	0	0
GoldenEye	3	0	0	0	3	0	0	0	0	0	...	4	0	0	0	0	0	0	0	0	5
Four Rooms	4	0	0	0	0	0	0	0	0	0	...	0	0	4	0	0	0	0	0	0	0
Get Shorty	3	0	0	0	0	0	5	0	0	4	...	5	0	0	0	0	0	2	0	0	0
Copycat	3	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0

5 rows × 943 columns

R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a rating to movie i¶

In [23]:

pd.DataFrame(R,movies['Movie']).head(5)

Out[23]:

	0	1	2	3	4	5	6	7	8	9	...	933	934	935	936	937	938	939	940	941	942
Movie
Toy Story	1	1	0	0	1	1	0	0	0	1	...	1	1	1	0	1	0	0	1	0	0
GoldenEye	1	0	0	0	1	0	0	0	0	0	...	1	0	0	0	0	0	0	0	0	1
Four Rooms	1	0	0	0	0	0	0	0	0	0	...	0	0	1	0	0	0	0	0	0	0
Get Shorty	1	0	0	0	0	0	1	0	0	1	...	1	0	0	0	0	0	1	0	0	0
Copycat	1	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0

5 rows × 943 columns

Visualizing the movie ratings¶

In [24]:

plt.imshow(Y,cmap='gray')
plt.xlabel('Users')
plt.ylabel('Movies');

In [25]:

print('Please enter the movie ratings(1-5 or 0 if not seen): \n')
userRatings=np.zeros((numMovies,1))

# Ask user ratings for set movies
indices=[1,98,7,12,55,64,66,69,183,225,355]
for index in indices:
    movie=movies.loc[index]
    try:
        question=f"{movie['Movie']} ({movie['Year']})"
        question=question.ljust(50)
        rating=float(input(question))
        if 1<=rating<=5:
            userRatings[int(index)-1]=rating
    except Exception as e:
        break

Please enter the movie ratings(1-5 or 0 if not seen): 

Toy Story (1995)                                  4.1
Silence of the Lambs, The (1991)                  4.5
Twelve Monkeys (1995)                             1.5
Usual Suspects, The (1995)                        4.7
Professional, The (1994)                          3
Shawshank Redemption, The (1994)                  5
While You Were Sleeping (1995)                    3.5
Forrest Gump (1994)                               5
Alien (1979)                                      3
101 Dalmatians (1996)                             3.8
Sphere (1998)                                     0

In [26]:

# Adding new user ratings 
Y=np.column_stack((userRatings,Y))
# Updating R with new user ratings
R=np.column_stack((np.zeros(numMovies),R))
indices=np.where(Y[:,0]!=0)[0]
R[indices,0]=1

In [27]:

# Normalize Ratings
Ymean=np.zeros((numMovies,1))
Ynorm=np.zeros(Y.shape)

for i in range(numMovies):
    idx=np.where(R[i,:]==1)
    Ymean[i]=Y[i,idx].mean()
    Ynorm[i,idx]=Y[i,idx]-Ymean[i]

In [28]:

# Setting Initial Parameters
X=np.random.randn(numMovies,numFeatures)
theta=np.random.randn(numUsers,numFeatures)
initialParams=np.row_stack((X.reshape((-1,1)),theta.reshape((-1,1))))

# Setting Gradient descent options
lmbda=10
alpha=0.003
iterations=500

# Training Collaborative filtering
jHistory,params=regularizedGradientDescent(initialParams,Ynorm,R,numUsers,numMovies,numFeatures,lmbda,alpha,iterations)
# Unrolling the learned parameters into X and theta
X=params[:numMovies*numFeatures].reshape((numMovies,numFeatures))
theta=params[numMovies*numFeatures:].reshape((numUsers,numFeatures))

Training Collaborative Filtering...
ITERATIONS : 500		COST : 38914.118

After training the model, we can now make the predictions by computing the prediction matrix.

In [29]:

P=np.dot(X,theta.T)
userPrediction=P[:,0]+Ymean.reshape(-1)

For Original Ratings provided by the user :

In [30]:

df=movies.copy()
df['User Ratings']=userRatings
df[df['User Ratings']!=0]

Out[30]:

	Movie	Year	User Ratings
ID
1	Toy Story	1995	4.1
7	Twelve Monkeys	1995	1.5
12	Usual Suspects, The	1995	4.7
55	Professional, The	1994	3.0
64	Shawshank Redemption, The	1994	5.0
66	While You Were Sleeping	1995	3.5
69	Forrest Gump	1994	5.0
98	Silence of the Lambs, The	1991	4.5
183	Alien	1979	3.0
225	101 Dalmatians	1996	3.8

Top 10 Recommended Movies are :

In [31]:

limit=10
df=movies.copy()
df['Predicted Ratings']=userPrediction
df.sort_values(by=['Predicted Ratings'],ascending=False).head(limit)

Out[31]:

	Movie	Year	Predicted Ratings
ID
1201	Marlene Dietrich: Shadow and Light	1996	5.0
1122	They Made Me a Criminal	1939	5.0
1293	Star Kid	1997	5.0
1467	Saint of Fort Washington, The	1993	5.0
1599	Someone Else's America	1995	5.0
814	Great Day in Harlem, A	1994	5.0
1653	Entertaining Angels: The Dorothy Day Story	1996	5.0
1189	Prefontaine	1997	5.0
1536	Aiqing wansui	1994	5.0
1500	Santa with Muscles	1996	5.0