#!/usr/bin/env python
# coding: utf-8

# # Movie Recommendation Engine

# The purpose of this notebook is to implement a recommendation engine end-to-end.
#
# I briefly touched on the math and the logic behind this fascinating domain of Machine Learning in the previous post [Bridging Recommender Systems and Dimensionality Reduction](http://francescopochetti.com/bridging-recommender-systems-dimensionality-reduction/), so I am going to save time and typing here.
#
# As a brief refresher, there are three main ways to approach a recommender system challenge: **content based**, **collaborative filtering** and **low rank factorization** (also known as **latent factor models**). The third is the one we are about to explore. The exercise consists of predicting missing movie ratings (on a scale of 1 to 5) in a $m \times u$ (1682 $\times$ 943) matrix of *m* movies and *u* users ([dataset](https://grouplens.org/datasets/movielens/)).

# ### The matrices
#
# Here are some additional details about the matrices we are going to encounter below.
# The matrix *Y* (a $m \times u$ matrix) stores the ratings $y^{(i,j)}$ (from 1 to 5). The matrix *R* is a binary-valued indicator matrix, where $R_{(i, j)} = 1$ if user *j* gave a rating to movie *i*, and $R_{(i, j)} = 0$ otherwise. The objective of the model is to predict movie ratings for the movies that users have not yet rated, that is, the entries with $R_{(i, j)} = 0$. This will allow us to recommend the movies with the highest predicted ratings to each user.
# Throughout the exercise, we will also be working with the matrices *X* and $\Theta$. The *i*-th row of *X* corresponds to the feature vector $x^{(i)}$ for the *i*-th movie, and the *j*-th row of $\Theta$ corresponds to the parameter vector $\theta^{(j)}$ for the *j*-th user. Both $x^{(i)}$ and $\theta^{(j)}$ are *n*-dimensional vectors. For the purposes of this exercise, we will use $n = 100$; therefore, $x^{(i)} \in \mathcal{R}^{n}$ and $\theta^{(j)} \in \mathcal{R}^{n}$. Correspondingly, *X* is a $m \times n$ matrix and $\Theta$ is a $u \times n$ matrix.

# ### The learning algorithm
# The low rank factorization algorithm, in the setting of movie recommendations, considers a set of *n*-dimensional parameter vectors $x^{(1)}, \cdots , x^{(m)}$ and $\theta^{(1)}, \cdots , \theta^{(u)}$, where the model predicts the rating for movie *i* by user *j* as $y^{(i,j)} = (\theta^{(j)})^T x^{(i)}$. Given a dataset that consists of a set of ratings produced by some users on some movies, we wish to learn the parameter vectors $x^{(1)}, \cdots , x^{(m)}, \theta^{(1)}, \cdots , \theta^{(u)}$ that produce the best fit, i.e. minimize the squared error over the observed ratings.

# ### The regularized cost function...
# The cost function we want to minimize is
#
# $J(x^{(1)}, \cdots , x^{(m)}, \theta^{(1)}, \cdots , \theta^{(u)}) = \frac{1}{2} \sum_{(i,j):r(i,j)=1}\Big((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\Big)^2 + \Big( \frac{\lambda}{2} \sum_{j=1}^u \sum_{k=1}^n (\theta_k^{(j)})^2 \Big) + \Big( \frac{\lambda}{2} \sum_{i=1}^m \sum_{k=1}^n (x_k^{(i)})^2 \Big)$

# ### ...and its gradient
# whose gradients w.r.t. $X$ and $\Theta$ are respectively
#
# $\frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1}\Big((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\Big)\theta_k^{(j)} + \lambda x_k^{(i)}$
#
# $\frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:r(i,j)=1}\Big((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\Big)x_k^{(i)} + \lambda \theta_k^{(j)}$
#
# I think we are all set. Now we can start playing around with real code.
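# Before diving in, here is a minimal numeric sketch of the prediction rule. The feature and parameter values below are toy numbers of my own choosing (not learned ones): with $n = 2$ latent features, the predicted rating of movie *i* by user *j* is simply the dot product $(\theta^{(j)})^T x^{(i)}$.

# In[ ]:

import numpy as np

x_i = np.array([0.9, 0.1])      # hypothetical feature vector for movie i
theta_j = np.array([4.0, 1.0])  # hypothetical parameter vector for user j

# predicted rating y(i,j) = theta_j . x_i = 4.0*0.9 + 1.0*0.1 = 3.7
print(theta_j.dot(x_i))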
# In[184]:

import scipy.io
import scipy.optimize  # importing scipy.io alone does not expose scipy.optimize.minimize, used below
import random
import numpy as np
import pandas as pd
import time
import seaborn as sns
import pickle
import matplotlib.pyplot as plt
plt.style.use('ggplot')
get_ipython().run_line_magic('matplotlib', 'inline')

# ## Functions

# In[185]:

def cofiCostFunction(weights, Y, R, num_users, num_movies, num_features, lamb):
    """
    X - num_movies x num_features matrix of movie features
    Theta - num_users x num_features matrix of user features
    Y - num_movies x num_users matrix of user ratings of movies
    R - num_movies x num_users matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
    X_grad - num_movies x num_features matrix, containing the partial derivatives w.r.t. each element of X
    Theta_grad - num_users x num_features matrix, containing the partial derivatives w.r.t. each element of Theta
    """
    X = weights[0:num_movies * num_features].reshape(num_movies, num_features)  # contains n features per movie
    Theta = weights[num_movies * num_features:].reshape(num_users, num_features)  # contains n features per user

    # the squared error is masked by R, so that only observed ratings contribute to the cost;
    # this matters once Y has been mean-normalized and its unrated entries are no longer 0
    cost = 0.5 * ((R * (X.dot(Theta.T) - Y))**2).sum() + lamb/2 * (Theta**2).sum() + lamb/2 * (X**2).sum()

    X_grad = np.zeros((num_movies, num_features))
    Theta_grad = np.zeros((num_users, num_features))

    for m in range(num_movies):
        # get all users who have rated movie m; this is a boolean array of size (num_users,)
        users_watched_movie_m = R[m,:] == 1
        # from Theta containing n features per user, get the features of all users_watched_movie_m
        # size (users_watched_movie_m, n_features)
        Theta_m = Theta[users_watched_movie_m,:]
        # from Y containing (movies, users), get the ratings given by all users_watched_movie_m
        # size (1, users_watched_movie_m)
        Y_m = Y[m, users_watched_movie_m].reshape((1, users_watched_movie_m.sum()))
        # X[m,:] --> (1, n_features)
        # X[m,:].dot(Theta_m.T) --> (1, users_watched_movie_m)
        # X[m,:].dot(Theta_m.T) - Y_m --> (1, users_watched_movie_m)
        # (X[m,:].dot(Theta_m.T) - Y_m).dot(Theta_m) --> (1, n_features)
        X_grad[m, :] = (X[m,:].dot(Theta_m.T) - Y_m).dot(Theta_m) + lamb * X[m,:]

    for u in range(num_users):
        # get all movies rated by user u; this is a boolean array of size (num_movies,)
        movies_rated_by_user_u = R[:,u] == 1
        # from X containing n features per movie, get the features of all movies_rated_by_user_u
        # size (movies_rated_by_user_u, n_features)
        X_u = X[movies_rated_by_user_u,:]
        # from Y containing (movies, users), get the ratings given to all movies_rated_by_user_u
        # size (movies_rated_by_user_u, 1)
        Y_u = Y[movies_rated_by_user_u, u].reshape((movies_rated_by_user_u.sum(), 1))
        Theta_u = Theta[u, :].reshape((1, num_features))
        # X_u --> (movies_rated_by_user_u, n_features)
        # X_u.dot(Theta_u.T) --> (movies_rated_by_user_u, 1)
        # X_u.dot(Theta_u.T) - Y_u --> (movies_rated_by_user_u, 1)
        # (X_u.dot(Theta_u.T) - Y_u).T.dot(X_u) --> (1, n_features)
        Theta_grad[u, :] = (X_u.dot(Theta_u.T) - Y_u).T.dot(X_u) + lamb * Theta_u

    return float(cost), np.vstack((X_grad.reshape(X_grad.size, 1), Theta_grad.reshape(Theta_grad.size, 1))).flatten()

def loadMovieList():
    d = {}
    with open(r'C:\Users\pochetti\Machine_Learning\mlclass-ex8-005\mlclass-ex8\movie_ids.txt') as f:
        for line in f:
            (key, val) = line.split(' ', 1)
            d[int(key)] = val.rstrip()
    return d

def normalizeRatings(Y, R):
    """returns normalized Y (and its mean) so that each movie has a rating of 0 on average"""
    movie_av = np.zeros((Y.shape[0], 1))
    for m in range(Y.shape[0]):
        to_be_averaged = R[m,:] == 1
        movie_av[m] = Y[m, to_be_averaged].mean()
    return Y - movie_av, movie_av
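# Since `cofiCostFunction` returns both the cost and its analytical gradients, it is worth sanity-checking the gradients against a central-difference numerical estimate before handing them to the optimizer. This is only a sketch: the tiny problem sizes, the density of `R` and the value of `eps` below are arbitrary choices of mine.

# In[ ]:

def checkGradients(lamb=1.5, eps=1e-4):
    # tiny random problem: 4 movies, 3 users, 2 features
    num_movies, num_users, num_features = 4, 3, 2
    X = np.random.rand(num_movies, num_features)
    Theta = np.random.rand(num_users, num_features)
    R = (np.random.rand(num_movies, num_users) > 0.4).astype(int)
    Y = X.dot(Theta.T) * R  # unobserved entries set to 0, as in the real dataset
    weights = np.hstack((X.flatten(), Theta.flatten()))
    _, grad = cofiCostFunction(weights, Y, R, num_users, num_movies, num_features, lamb)
    # estimate each partial derivative as (J(w + eps) - J(w - eps)) / (2 * eps)
    num_grad = np.zeros_like(weights)
    for k in range(weights.size):
        bump = np.zeros_like(weights)
        bump[k] = eps
        cost_plus, _ = cofiCostFunction(weights + bump, Y, R, num_users, num_movies, num_features, lamb)
        cost_minus, _ = cofiCostFunction(weights - bump, Y, R, num_users, num_movies, num_features, lamb)
        num_grad[k] = (cost_plus - cost_minus) / (2 * eps)
    # should be vanishingly small compared to the gradient entries themselves
    print('Max absolute difference:', np.abs(grad - num_grad).max())

checkGradients()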
# ### Running the model on pre-trained weights

# This is just to check that the cost function works as expected. The printed value is a benchmark calculated on smaller matrices and offered during Andrew Ng's course, and my result perfectly matches his. So, good news, we can go ahead.

# In[186]:

data = scipy.io.loadmat(r'C:\Users\pochetti\Machine_Learning\mlclass-ex8-005\mlclass-ex8\ex8_movies.mat')
R = data['R']
Y = data['Y']

params = scipy.io.loadmat(r'C:\Users\pochetti\Machine_Learning\mlclass-ex8-005\mlclass-ex8\ex8_movieParams.mat')
Theta = params['Theta']
X = params['X']

# Reduce the data set size just for checking purposes
num_users = 4
num_movies = 5
num_features = 3
X = X[0:num_movies, 0:num_features]
Theta = Theta[0:num_users, 0:num_features]
Y = Y[0:num_movies, 0:num_users]
R = R[0:num_movies, 0:num_users]

weights = np.vstack((X.reshape(X.size, 1), Theta.reshape(Theta.size, 1)))

# Evaluate cost function
cost, gradient = cofiCostFunction(weights, Y, R, num_users, num_movies, num_features, 1.5)
print('Regularized Cost @lambda 1.5:', cost)

# ## Implementing the Low Rank Factorization Algorithm

# Loading the whole dataset (*Y* and *R*) and "visualizing" the $movies \times users$ (1682 $\times$ 943) *Y* matrix. *imshow* nicely allows us to highlight the 1 to 5 ratings and the missing entries. As you can see, the matrix is (as expected) pretty sparse.

# In[187]:

data = scipy.io.loadmat(r'C:\Users\pochetti\Machine_Learning\mlclass-ex8-005\mlclass-ex8\ex8_movies.mat')
R = data['R']
Y = data['Y']

plt.figure(figsize=(10, 8))
plt.imshow(Y, extent=[0, 5, 0, 5], cmap='Blues')
plt.colorbar()
plt.title('Movies x Users matrix')
plt.show()

# In[188]:

# loading the movie list with ratings,
# in order to input my personal reviews
movieList = loadMovieList()

my_ratings = np.zeros((Y.shape[0], 1))
my_ratings[0] = 3
my_ratings[1] = 4
my_ratings[7] = 1
my_ratings[10] = 5
my_ratings[21] = 5
my_ratings[26] = 4
my_ratings[27] = 5
my_ratings[34] = 1
my_ratings[55] = 5
my_ratings[1234] = 3
my_ratings[1167] = 2
my_ratings[1088] = 3
my_ratings[1001] = 2
my_ratings[819] = 1
my_ratings[688] = 5
my_ratings[627] = 4
my_ratings[526] = 3
my_ratings[456] = 1

print("Let's rate some movies I have watched...\nThis is important to get future recommendations!\n")
for movie in range(my_ratings.shape[0]):
    if my_ratings[movie] > 0:
        print('Movie: "%s". My rating: %d' % (movieList[movie+1], my_ratings[movie]))
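# Before assembling the full pipeline, a quick look at what `normalizeRatings` does, on a toy matrix of my own invention: each movie's average rating (computed over rated entries only) is subtracted row-wise, so that a movie a user never rated gets predicted at its own average once the mean is added back. Note that unrated entries end up at minus the movie average; this is harmless because the cost function masks them out via *R*.

# In[ ]:

toy_Y = np.array([[5., 0.],
                  [1., 3.]])
toy_R = np.array([[1, 0],
                  [1, 1]])
toy_Ynorm, toy_Ymean = normalizeRatings(toy_Y, toy_R)
print(toy_Ymean.flatten())  # [5. 2.] : per-movie averages over rated entries only
print(toy_Ynorm)            # [[ 0. -5.] [-1.  1.]]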
# In[189]:

# Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by
# 943 users
#
# R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
# rating to movie i
data = scipy.io.loadmat(r'C:\Users\pochetti\Machine_Learning\mlclass-ex8-005\mlclass-ex8\ex8_movies.mat')
R = data['R']
Y = data['Y']

# Add our own ratings to the data matrix
Y = np.hstack((Y, my_ratings))
R = np.hstack((R, np.where(my_ratings != 0, 1, 0)))

# Normalize Ratings
Ynorm, Ymean = normalizeRatings(Y, R)

# Useful Values
num_users = Y.shape[1]
num_movies = Y.shape[0]
num_features = 100

####################################################
# Guessing Initial Parameters (Theta, X)
####################################################
# This is done by performing an SVD on Ynorm,
# then reconstructing Theta by multiplying S and V,
# and then truncating X's and Theta's columns to account for the right number of features.
# It is generally a good idea to initialize the parameters with SVD,
# as this decomposition guarantees by definition the lowest possible
# reconstruction error of Ynorm, so it starts us off in the right direction.
# At the end of the day we have to produce X and Theta such that the RMSE
# between Y and X x Theta.T over the non-null values of Y is as small as possible.
# The problem with SVD is that it assumes all the missing values are 0, hence the
# decomposition will yield inherently wrong results. This is why we need
# an additional optimization on top of it.
X, s, V = np.linalg.svd(Ynorm, full_matrices=False)
S = np.diag(s)
# Ynorm ~ X S V, so Theta.T ~ S V; keep the first num_features rows of S V
# (the strongest components) and transpose to get a users x features matrix
Theta = np.dot(S, V)[:num_features, :].T
X = X[:, :num_features]

initial_weights = np.vstack((X.reshape(X.size, 1), Theta.reshape(Theta.size, 1)))

print('Y dimensions:', Y.shape)
print('R dimensions:', R.shape)
print('X dimensions:', X.shape)
print('Theta dimensions:', Theta.shape)

# In[190]:

# Set Regularization
lamb = 10

start = time.time()
opt_results = scipy.optimize.minimize(fun=cofiCostFunction,
                                      x0=initial_weights.squeeze(),
                                      args=(Ynorm, R, num_users, num_movies, num_features, lamb),
                                      jac=True,
                                      method='TNC',
                                      options={'disp': True})
end = time.time()

opt_results

# In[191]:

total = end - start
print("Training executed in", round(total/60, 2), "minutes")

# ## Recommended for me!

# After training the model, we can now make recommendations by computing the predictions matrix, and then display the top movies I really cannot miss!

# In[192]:

# Unfold the returned weights into X and Theta
X = opt_results.x[0:num_movies * num_features].reshape(num_movies, num_features)
Theta = opt_results.x[num_movies * num_features:].reshape(num_users, num_features)

predicted = X.dot(Theta.T)
# my ratings sit in the last column; add the per-movie means back to de-normalize
mine = predicted[:, num_users - 1] + Ymean.flatten()
top = np.argsort(mine)[::-1]

# ### Top 20 movies I never watched and I cannot miss!

# In[193]:

print('Top Recommended Movies\n')
already_rated = {movie for (movie, rating) in enumerate(my_ratings) if rating > 0}
rank = 0
for index in top:
    if index in already_rated:
        continue
    rank += 1
    print('%d: "%s". Predicted rating: %2.2f' % (rank, movieList[index+1], mine[index]))
    if rank == 20:
        break
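# Since `pickle` is imported at the top, we might as well persist the learned factors, so that recommendations can be recomputed later without re-training. A minimal sketch; the file name is an arbitrary choice of mine.

# In[ ]:

# save the learned factors plus the per-movie means needed to de-normalize predictions
with open('learned_factors.pkl', 'wb') as f:
    pickle.dump({'X': X, 'Theta': Theta, 'Ymean': Ymean}, f)

# reload and verify the round trip
with open('learned_factors.pkl', 'rb') as f:
    factors = pickle.load(f)
print(factors['X'].shape, factors['Theta'].shape)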