Collaborative filtering with side information¶

This IPython notebook illustrates the usage of the cmfrec Python package for collective matrix factorization using the MovieLens-1M data, consisting of ratings from users about movies + user demographic information, plus the movie tag genome.

Collective matrix factorization is a technique for collaborative filtering with additional information about the users and items, based on low-rank joint factorization of different matrices with shared factors – for more details see the paper Singh, A. P., & Gordon, G. J. (2008, August). Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 650-658). ACM..

Small note: if the TOC here is not clickable or the math symbols don't render properly, try viewing this same notebook on nbviewer following this link.

Sections¶

1. Model description

2. Loading and processing the data

3. Basic model - only movie ratings

4. Model with user side information

5. Model with item side information

6. Full model

7. Examining some recommendations

1. Model description¶

The collective matrix factorization model is an extension of the typical low-rank matrix factorization model that incorporates user and/or item side information. In its most basic form, low-rank matrix factorization tries to approximate a matrix $X$ by the product of two lower-rank matrices $A$ and $B$, which in recommender systems represent, respectively, latent factors for users and items, determined by minimizing the squared differences between their product and $X$, i.e.:

$$argmin_{A, B} \lVert X - AB^T \rVert^2$$

This basic formula can be improved by adding regularization on the $A$ and $B$ matrices, as well as by centering the matrix $X$ by subtracting its global mean from each entry, adding user and item biases, and considering only the non-missing entries, i.e.:

$$argmin_{A, B, U_b, I_b} \lVert (X - \mu - U_b - I_b - AB^T)I_{x} \rVert^2 + \lambda (\lVert A \rVert^2 + \lVert B \rVert^2 + \lVert U_b \rVert^2 + \lVert I_b \rVert^2)$$

Where:

• $X$ is the ratings matrix (entry in row $\{ i,j\}$ contains the rating given by user $i$ to item $j$).
• $A$ and $B$ are lower-dimensional matrices (model parameters).
• $U_b$ is a column matrix of user biases, containing at each row a constant for its respective user.
• $I_b$ is a row matrix of item biases, containing at each column a constant for its respective item.
• $\mu$ is the mean of the entries in $X$.
• $I_x$ is an indicator matrix with entries in row $\{i,j\}$ equal to one when that same entry is present in the matrix $X$, and equal to zero when the corresponding entry is missing in $X$.
• $\lambda$ is a regularization parameter.
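To make the objective concrete, here is a small NumPy sketch that evaluates the loss above on a toy example. This is only an illustration of the formula, not cmfrec's implementation; `mf_objective` is a hypothetical helper and the data is made up:

```python
import numpy as np

def mf_objective(X, A, B, user_bias, item_bias, mu, lam):
    """Regularized squared error over the observed (non-NaN) entries of X.
    A is (n_users x k), B is (n_items x k); NaN marks a missing rating."""
    pred = mu + user_bias[:, None] + item_bias[None, :] + A @ B.T
    mask = ~np.isnan(X)                      # plays the role of I_x
    err = np.where(mask, X - pred, 0.0)      # missing entries contribute zero
    reg = lam * (np.sum(A**2) + np.sum(B**2)
                 + np.sum(user_bias**2) + np.sum(item_bias**2))
    return np.sum(err**2) + reg

# 2 users x 3 items, two ratings missing
X = np.array([[5., 3., np.nan],
              [np.nan, 4., 2.]])
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(3, 2))
obj = mf_objective(X, A, B, np.zeros(2), np.zeros(3), np.nanmean(X), 0.1)
```

Minimizing this quantity over $A$, $B$ and the biases (e.g. with L-BFGS, as cmfrec does through TensorFlow) yields the fitted model.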

In collective matrix factorization, this model is further extended by also factorizing matrices of user and/or item side information (e.g. movie tags, or user demographic info in tabular format), e.g.:

$$argmin_{A, B, C, D, U_b, I_b} \lVert (X - \mu - U_b - I_b - AB^T)I_{x} \rVert^2 + \lVert U - AC^T \rVert^2 + \lVert I - BD^T \rVert^2 + \lambda (\lVert A \rVert^2 + \lVert B \rVert^2 + \lVert C \rVert^2 + \lVert D \rVert^2 + \lVert U_b \rVert^2 + \lVert I_b \rVert^2)$$

Where, in addition to the previous model:

• $U$ is the user side information matrix.
• $I$ is the item side information matrix.
• $C$ is a matrix of latent factors for user attributes (model parameters).
• $D$ is a matrix of latent factors for item attributes (model parameters).

(Other variations, such as different weights for each factorization, different regularization for each parameter, and most notably, applying a sigmoid function to the factorized values for binary variables, among others, can also be fit with this package.)

Intuitively, latent factors that also do a good job at explaining user/item attributes should generalize better to ratings data than latent factors that don't, even though this might adversely affect training error in the factorization of interest (the $X$ matrix).

Alternatively, the package can also use a different formulation, in which the user and/or item attributes can be thought of as the base of the factorization, with additional latent matrices acting as offsets for each user and item (deviations from their expected ratings according to the side information), e.g.:

$$argmin_{A, B, C, D, U_b, I_b} \lVert (X - \mu - U_b - I_b - (UC + A)(ID + B)^T)I_{x} \rVert^2 + \lambda (\lVert A \rVert^2 + \lVert B \rVert^2 + \lVert C \rVert^2 + \lVert D \rVert^2 + \lVert U_b \rVert^2 + \lVert I_b \rVert^2)$$

Both of these models allow making recommendations based only on user/item side information, without any ratings. In the first case this can be done either by training the model with extra users/items, or by computing the corresponding rows/columns of $A$ and $B$ by minimizing only the factorization of $U$ and $I$ (for a new user/item, there won't be any new entries in $C$ or $D$), which can be done in closed form; in the second case, by setting the corresponding rows/columns of $A$ and $B$ to zero. As the ratings are centered, the expected values of both $U_b$ and $I_b$ are zero, which aids in cold-start recommendations.
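For the first formulation, the closed-form step for a new user with attribute vector $u$ but no ratings amounts to a ridge-regression solve against the learned attribute-factor matrix $C$. A minimal sketch of that idea, not cmfrec's API (`factors_from_attributes`, `C` and `lam` are illustrative stand-ins for the fitted attribute factors and the regularization parameter):

```python
import numpy as np

def factors_from_attributes(u, C, lam=1e-2):
    """Latent vector 'a' minimizing ||u - a C^T||^2 + lam * ||a||^2,
    where C (n_attributes x k) holds the learned attribute factors."""
    k = C.shape[1]
    # normal equations of the ridge problem: (C^T C + lam I) a = C^T u
    return np.linalg.solve(C.T @ C + lam * np.eye(k), C.T @ u)

rng = np.random.default_rng(0)
C = rng.normal(size=(8, 3))      # 8 attributes, rank-3 factorization
a_true = rng.normal(size=3)
u = C @ a_true                   # attributes exactly explained by a_true
a_hat = factors_from_attributes(u, C, lam=1e-8)
```

A predicted rating for item $j$ would then be $\mu + I_{b,j} + a \cdot b_j$ using the item's fitted factors and bias.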

This example notebook uses the MovieLens 1M dataset, with movie tags taken from the latest MovieLens release, and user demographic information linked to the user information provided in the dataset, by taking a publicly available table mapping zip codes to states, another one mapping state names to their abbreviations, and finally classifying the states into regions according to usual definitions.

Unfortunately, later (bigger) releases of the MovieLens dataset no longer include user demographic information.

2.1 Ratings data¶

The ratings come in the form of a table with columns UserId, ItemId, Rating, and Timestamp:

In [1]:
import numpy as np, pandas as pd, time, re
from datetime import datetime
from cmfrec import CMF

ratings = pd.read_table('~/movielens/ml-1m/ratings.dat', sep='::',
                        engine='python', names=['UserId','ItemId','Rating','Timestamp'])
del ratings['Timestamp']
ratings.head()

Out[1]:
UserId ItemId Rating
0 1 1193 5
1 1 661 3
2 1 914 3
3 1 3408 4
4 1 2355 5

2.2 Creating a train and test split¶

Usually, a good way to test recommender models is with a temporal split (splitting the data at some temporal cutoff point between train and test), but in this case it's more desirable to distinguish between warm start (predicting ratings strictly for users and items that were in the training data) and different forms of cold start, i.e.: completely new users and items, new users rating known items, and vice versa, which I'll try to do here:

In [2]:
np.random.seed(1)
user_ids = ratings.UserId.drop_duplicates().values
item_ids = ratings.ItemId.drop_duplicates().values
users_train = set(np.random.choice(user_ids, size=int(user_ids.shape[0] * .75), replace=False))
items_train = set(np.random.choice(item_ids, size=int(item_ids.shape[0] * .75), replace=False))
train = ratings.loc[ratings.UserId.isin(users_train) & ratings.ItemId.isin(items_train)].reset_index(drop=True)

np.random.seed(1)
train_ix = train.sample(frac=.85).index
test_ix = np.setdiff1d(train.index.values, train_ix)
test_warm_start = train.loc[test_ix].reset_index(drop=True)
train = train.loc[train_ix].reset_index(drop=True)
users_train = set(train.UserId)
items_train = set(train.ItemId)
test_warm_start = test_warm_start.loc[test_warm_start.UserId.isin(users_train) &
test_warm_start.ItemId.isin(items_train)].reset_index(drop=True)

test_cold_start = ratings.loc[~ratings.UserId.isin(users_train) & ~ratings.ItemId.isin(items_train)].reset_index(drop=True)
test_new_users = ratings.loc[(~ratings.UserId.isin(users_train)) & (ratings.ItemId.isin(items_train))].reset_index(drop=True)
test_new_items = ratings.loc[(ratings.UserId.isin(users_train)) & (~ratings.ItemId.isin(items_train))].reset_index(drop=True)
test_new_items_set = set(test_new_items.ItemId)
users_coldstart = set(test_cold_start.UserId)
items_coldstart = set(test_cold_start.ItemId)

print(train.shape)
print(test_warm_start.shape)

(478105, 3)
(84358, 3)

In [3]:
print(len(users_train))
print(len(items_train))

4530
2753


2.3 Processing item tags¶

Item tags were taken from the latest MovieLens release, and joined to the dataset by movie title, which is not a perfect match but does a reasonable job. They are unfortunately not available for all the movies for which there are ratings.

For the second model, as the dimensionality of the tags is quite high, I'll also take a smaller transformation consisting of the first 50 principal components of these tags.

In [4]:
movie_titles = pd.read_table('~/movielens/ml-1m/movies.dat', sep='::', engine='python',
                             names=['ItemId', 'title', 'genres'], encoding='latin-1')
movie_titles = movie_titles[['ItemId', 'title']]

# will save the movie titles for later
movie_id_to_title = {i.ItemId:i.title for i in movie_titles.itertuples()}

# movie list and tag genome from the latest MovieLens release (paths assumed)
movies = pd.read_csv('~/movielens/ml-latest/movies.csv')
tags = pd.read_csv('~/movielens/ml-latest/genome-scores.csv')

movies = movies[['movieId', 'title']]
movies = pd.merge(movies, movie_titles)
movies = movies[['movieId', 'ItemId']]

tags_wide = tags.pivot(index='movieId', columns='tagId', values='relevance')
tags_wide.columns=["tag"+str(i) for i in tags_wide.columns.values]

item_side_info = pd.merge(movies, tags_wide, how='inner', left_on='movieId', right_index=True)
del item_side_info['movieId']
items_w_sideinfo = set(item_side_info.ItemId)
test_new_items = test_new_items.loc[test_new_items.ItemId.isin(items_w_sideinfo)].reset_index(drop=True)
item_sideinfo_train = item_side_info.loc[item_side_info.ItemId.isin(items_train)].reset_index(drop=True)
item_sideinfo_testnew = item_side_info.loc[item_side_info.ItemId.isin(test_new_items_set)].reset_index(drop=True)
test_cold_start = test_cold_start.loc[test_cold_start.ItemId.isin(items_w_sideinfo)].reset_index(drop=True)

Out[4]:
ItemId tag1 tag2 tag3 tag4 tag5 tag6 tag7 tag8 tag9 ... tag1119 tag1120 tag1121 tag1122 tag1123 tag1124 tag1125 tag1126 tag1127 tag1128
0 1 0.02475 0.02475 0.04900 0.07750 0.1245 0.23875 0.06575 0.28575 0.25400 ... 0.03125 0.02050 0.04300 0.03375 0.12375 0.04150 0.02125 0.03600 0.10425 0.02750
1 2 0.03750 0.04100 0.03675 0.04750 0.1000 0.05950 0.05125 0.09600 0.08875 ... 0.03425 0.01825 0.01650 0.02325 0.13525 0.02450 0.01825 0.01325 0.08550 0.01925
2 8 0.02750 0.03250 0.04250 0.02275 0.0545 0.03050 0.01700 0.06500 0.02625 ... 0.03075 0.02025 0.01525 0.02075 0.21150 0.02450 0.01925 0.00975 0.08125 0.01675
3 9 0.03175 0.03600 0.01750 0.01650 0.0330 0.01500 0.01350 0.03950 0.01375 ... 0.01825 0.01125 0.01125 0.01475 0.15250 0.02175 0.01175 0.00650 0.08350 0.01725
4 10 0.99975 0.99975 0.01900 0.03400 0.0605 0.04100 0.04575 0.12000 0.06550 ... 0.49425 0.02250 0.02100 0.02950 0.16275 0.04600 0.02075 0.01575 0.07250 0.01875

5 rows × 1129 columns

In [5]:
from sklearn.decomposition import PCA

pca_obj = PCA(n_components = 50)
item_sideinfo_reduced = item_side_info.copy()
del item_sideinfo_reduced['ItemId']
pca_obj.fit(item_sideinfo_reduced)
item_sideinfo_pca = pca_obj.transform(item_sideinfo_reduced)
item_sideinfo_pca = pd.DataFrame(item_sideinfo_pca)
item_sideinfo_pca.columns = ["pc"+str(i) for i in range(item_sideinfo_pca.shape[1])]
item_sideinfo_pca['ItemId'] = item_side_info.ItemId.values.copy()

item_sideinfo_pca_train = item_sideinfo_pca.loc[item_sideinfo_pca.ItemId.isin(items_train)].reset_index(drop=True)
item_sideinfo_pca_testnew = item_sideinfo_pca.loc[item_sideinfo_pca.ItemId.isin(test_new_items_set)].reset_index(drop=True)
item_sideinfo_pca_coldstart = item_sideinfo_pca.loc[item_sideinfo_pca.ItemId.isin(items_coldstart)].reset_index(drop=True)

Out[5]:
pc0 pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9 ... pc41 pc42 pc43 pc44 pc45 pc46 pc47 pc48 pc49 ItemId
0 1.174361 2.454127 2.032595 -1.171341 0.298269 1.355678 -0.692807 -1.044511 2.065727 -0.531681 ... 0.055836 -0.167791 -0.311193 0.319949 0.133470 0.029714 -0.302162 0.147117 -0.369435 1
1 -1.340086 1.930136 1.029146 -0.354897 -0.436367 0.379800 -0.554990 -0.605132 1.111225 -0.534643 ... 0.095954 0.178682 -0.467821 -0.037618 -0.283306 0.275545 -0.072865 0.452367 0.014473 2
2 -1.536161 -0.006325 0.944357 0.387365 -0.196264 -0.093866 -0.670976 -0.110382 0.694794 -0.322744 ... -0.116378 0.228412 -0.248827 -0.186690 -0.427460 0.028590 -0.077187 0.132514 -0.102053 8
3 -2.044958 0.482059 -0.208248 0.825030 0.029346 -0.384561 0.184310 0.901687 0.433354 0.701625 ... -0.060544 -0.021556 -0.083973 -0.121497 0.065433 -0.317927 0.307823 -0.281140 -0.166976 9
4 -0.786717 1.928858 0.032624 0.888157 0.171164 -0.035373 0.596246 0.824269 0.295776 0.813716 ... -0.177851 -0.259741 -0.095441 -0.276813 -0.233801 0.206002 0.016063 0.309817 0.092431 10

5 rows × 51 columns

In [6]:
print(test_new_items.shape[0])
print(test_new_items.ItemId.drop_duplicates().shape[0])

171405
759


2.4 Processing user demographic info¶

The extra data is explained at the beginning. Joining all the data:

In [7]:
zipcode_abbs = pd.read_csv("~/movielens/states.csv")
zipcode_abbs_dct = {z.State:z.Abbreviation for z in zipcode_abbs.itertuples()}
us_regs_table = [
('New England', 'Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont'),
('Middle Atlantic', 'Delaware, Maryland, New Jersey, New York, Pennsylvania'),
('South', 'Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, Missouri, North Carolina, South Carolina, Tennessee, Virginia, West Virginia'),
('Midwest', 'Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin'),
('Southwest', 'Arizona, New Mexico, Oklahoma, Texas'),
('West', 'Alaska, California, Colorado, Hawaii, Idaho, Montana, Nevada, Oregon, Utah, Washington, Wyoming'),
]
us_regs_table = [(x[0], [i.strip() for i in x[1].split(",")]) for x in us_regs_table]
us_regs_dct = dict()
for r in us_regs_table:
    for s in r[1]:
        us_regs_dct[zipcode_abbs_dct[s]] = r[0]

# publicly available table mapping zip codes to states (filename assumed)
zipcode_info = pd.read_csv("~/movielens/free-zipcode-database.csv")
zipcode_info = zipcode_info.groupby('Zipcode').first().reset_index()
zipcode_info.loc[zipcode_info.Country != "US", 'State'] = 'UnknownOrNonUS'
zipcode_info['Region'] = zipcode_info['State'].copy()
zipcode_info.loc[zipcode_info.Country == "US", 'Region'] = \
    zipcode_info.Region\
        .loc[zipcode_info.Country == "US"]\
        .map(lambda x: us_regs_dct.get(x, 'UsOther'))
zipcode_info = zipcode_info[['Zipcode', 'Region']]

users = pd.read_table('~/movielens/ml-1m/users.dat', sep='::',
                      names=["UserId", "Gender", "Age", "Occupation", "Zipcode"], engine='python')
users["Zipcode"] = users.Zipcode.map(lambda x: int(re.sub("-.*", "", x)))
users = pd.merge(users,zipcode_info,on='Zipcode',how='left')
users['Region'] = users.Region.fillna('UnknownOrNonUS')

users['Occupation'] = users.Occupation.map(lambda x: str(x))
users['Age'] = users.Age.map(lambda x: str(x))
user_side_info = pd.get_dummies(users[['UserId', 'Gender', 'Age', 'Occupation', 'Region']])
users_w_sideinfo = set(user_side_info.UserId)
test_new_users = test_new_users.loc[test_new_users.UserId.isin(users_w_sideinfo)].reset_index(drop=True)
user_sideinfo_train = user_side_info.loc[user_side_info.UserId.isin(users_train)].reset_index(drop=True)
test_cold_start = test_cold_start.loc[test_cold_start.UserId.isin(users_w_sideinfo)].reset_index(drop=True)


Out[7]:
UserId Gender_F Gender_M Age_1 Age_18 Age_25 Age_35 Age_45 Age_50 Age_56 ... Occupation_8 Occupation_9 Region_Middle Atlantic Region_Midwest Region_New England Region_South Region_Southwest Region_UnknownOrNonUS Region_UsOther Region_West
0 2 0 1 0 0 0 0 0 0 1 ... 0 0 0 0 0 1 0 0 0 0
1 4 0 1 0 0 0 0 1 0 0 ... 0 0 0 0 1 0 0 0 0 0
2 5 0 1 0 0 1 0 0 0 0 ... 0 0 0 1 0 0 0 0 0 0
3 6 1 0 0 0 0 0 0 1 0 ... 0 1 0 1 0 0 0 0 0 0
4 7 0 1 0 0 0 1 0 0 0 ... 0 0 0 0 1 0 0 0 0 0

5 rows × 39 columns

In [8]:
print(test_new_users.shape[0])
print(test_new_users.UserId.drop_duplicates().shape[0])

187456
1510

In [9]:
print(test_cold_start.shape[0])
print(test_cold_start.UserId.unique().shape[0])
print(test_cold_start.ItemId.unique().shape[0])

57290
1509
737


3. Basic model - only movie ratings¶

Non-collective factorization model - including user and item biases + regularization:

3.1 Fitting the model¶

In [10]:
%%time
from copy import deepcopy
from cmfrec import CMF

model_no_side_info = CMF(k=40, reg_param=1e-4, random_seed=1)
model_no_side_info.fit(deepcopy(train))
test_warm_start['Predicted'] = model_no_side_info.predict(test_warm_start.UserId, test_warm_start.ItemId)

INFO:tensorflow:Optimization terminated with:
Objective function value: 0.797731
Number of iterations: 212
Number of functions evaluations: 224
CPU times: user 6min 24s, sys: 43.6 s, total: 7min 8s
Wall time: 1min 6s


3.2 Evaluating results¶

For this model and the ones that will follow, I will evaluate the recommendations by computing:

• Root mean squared error (RMSE), i.e. sqrt( mean( (real - predicted)^2 ) ) - which can be thought of as the average star-rating error for each predicted rating. This is the most typical metric, but it has some drawbacks: it doesn't tend to be a good measure of ranking quality, and it can be substantially improved without changing the relative order of the predictions.
• Taking the average rating of the top-10 recommended movies for each user.

There are other more appropriate evaluation criteria, but these are easy to understand and provide reasonable insights on model performance.
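The two metrics above can be written as small helper functions. This is only a sketch using this notebook's column-name conventions; `rmse` and `avg_top_n_rating` are hypothetical helpers equivalent in spirit to the pandas one-liners used in the evaluation cells:

```python
import numpy as np
import pandas as pd

def rmse(pred, actual):
    """Root mean squared error between predicted and observed ratings."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    return np.sqrt(np.mean((pred - actual) ** 2))

def avg_top_n_rating(df, score_col, n=10):
    """Mean observed rating of each user's top-n items, ranked by score_col."""
    return (df.sort_values(['UserId', score_col], ascending=False)
              .groupby('UserId')['Rating'].head(n).mean())

# toy example: one user, three rated movies
toy = pd.DataFrame({'UserId': [1, 1, 1],
                    'Rating': [1, 5, 3],
                    'Predicted': [0.1, 0.9, 0.5]})
```

Ranking the toy user's movies by `Predicted` and averaging the ratings of the top 2 gives (5 + 3) / 2 = 4.0.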

In [11]:
print("RMSE (no side info, warm start): ", np.sqrt(np.mean( (test_warm_start.Predicted - test_warm_start.Rating)**2) ))

RMSE (no side info, warm start):  0.8683360112941739

In [12]:
avg_ratings = train.groupby('ItemId')['Rating'].mean().to_frame().rename(columns={"Rating" : "AvgRating"})
test_ = pd.merge(test_warm_start, avg_ratings, left_on='ItemId', right_index=True, how='left')

print('Average movie rating:', test_.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=True).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for top-10 recommendations of best-rated movies:', test_.sort_values(['UserId','AvgRating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('----------------------')
print('Average rating for top-10 recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 (non-)recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=True).groupby('UserId')['Rating'].head(10).mean())

Average movie rating: 3.695626272928932
Average rating for top-10 rated by each user: 4.236997084548105
Average rating for bottom-10 rated by each user: 3.0766763848396503
Average rating for top-10 recommendations of best-rated movies: 3.9557725947521867
----------------------
Average rating for top-10 recommendations from this model: 4.001166180758018
Average rating for bottom-10 (non-)recommendations from this model: 3.333177842565598


4. Model with user side information¶

Now I'll add only the user information (without the movie tags). These are exclusively binary (0/1) columns, so I'll apply a sigmoid function to the factorized values so that they lie between zero and one.
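The effect of passing binary columns through `cols_bin_user` is, roughly, to squash the factorized attribute values through a sigmoid before computing the squared error. A sketch of the idea only, not cmfrec's exact loss (`binary_attr_loss` is a hypothetical helper):

```python
import numpy as np

def binary_attr_loss(U_bin, A, C):
    """Squared error against sigmoid(A C^T), so the reconstructed
    attribute values always land in (0, 1) like the 0/1 inputs."""
    p = 1.0 / (1.0 + np.exp(-(A @ C.T)))   # elementwise sigmoid
    return np.sum((U_bin - p) ** 2)
```

Without the sigmoid, the factorization $AC^T$ would try to match 0/1 targets with unbounded real values, which tends to fit binary attributes poorly.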

4.1 Original version¶

In [13]:
%%time
model_user_info = CMF(k=40, w_main=10.0, w_user=1.0, reg_param=1e-3, random_seed=1)
model_user_info.fit(deepcopy(train),
user_info=deepcopy(user_sideinfo_train),
cols_bin_user=[cl for cl in user_side_info.columns if cl!='UserId'])
test_warm_start['Predicted'] = model_user_info.predict(test_warm_start.UserId, test_warm_start.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 8.225498
Number of iterations: 208
Number of functions evaluations: 229
CPU times: user 7min 5s, sys: 44.2 s, total: 7min 49s
Wall time: 1min 13s


In theory, new users can be incorporated into the model without refitting it entirely from scratch (a very slow procedure on larger datasets), but doing so one user at a time is still quite slow with many users. The following code would do it, but for time reasons it was not executed here:

In [14]:
# %%time
# for u in list(test_new_users.UserId.unique()):
#     user_vec = user_side_info.loc[user_side_info.UserId == u]
#     del user_vec['UserId']
#     user_vec = user_vec.values.reshape((1, -1))
#     model_user_info.add_user(new_id = u, attributes = user_vec)
# test_new_users['Predicted'] = model_user_info.predict(test_new_users.UserId, test_new_users.ItemId)


Side information from users who have no ratings can still be incorporated, and predictions can also be made for these users despite their not having any ratings:

In [15]:
%%time
model_user_info_all = CMF(k=40, w_main=10.0, w_user=1.0, reg_param=1e-3, random_seed=1)
model_user_info_all.fit(deepcopy(train),
user_info=deepcopy(user_side_info),
cols_bin_user=[cl for cl in user_side_info.columns if cl!='UserId'])
test_warm_start['PredictedAll'] = model_user_info_all.predict(test_warm_start.UserId, test_warm_start.ItemId)
test_new_users['PredictedAll'] = model_user_info_all.predict(test_new_users.UserId, test_new_users.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 8.226118
Number of iterations: 238
Number of functions evaluations: 283
CPU times: user 9min 22s, sys: 55.1 s, total: 10min 17s
Wall time: 1min 39s


The evaluation metrics are the same as before, plus the Pearson correlation coefficient:

In [16]:
print("RMSE (user side info, warm start): ", np.sqrt(np.mean( (test_warm_start.Predicted - test_warm_start.Rating)**2) ))
print("RMSE (user side info, warm start, extra users): ", np.sqrt(np.mean( (test_warm_start.PredictedAll - test_warm_start.Rating)**2) ))
# print("RMSE (user side info, new users, added afterwards): ", np.sqrt(np.mean( (test_new_users.Predicted - test_new_users.Rating)**2) ))
print("RMSE (user side info, users trained without ratings): ", np.sqrt(np.mean( (test_new_users.PredictedAll - test_new_users.Rating)**2) ))

RMSE (user side info, warm start):  0.8682231678918979
RMSE (user side info, warm start, extra users):  0.8680965530731076
RMSE (user side info, users trained without ratings):  0.969946692788752

In [17]:
print("Rho (user side info, warm start): ", np.corrcoef(test_warm_start.Predicted, test_warm_start.Rating)[0][1])
print("Rho (user side info, warm start, extra users): ", np.corrcoef(test_warm_start.PredictedAll, test_warm_start.Rating)[0][1])
# print("RMSE (user side info, new users, added afterwards): ", np.corrcoef(test_new_users.Predicted, test_new_users.Rating)[0][1])
print("Rho (user side info, users trained without ratings): ", np.corrcoef(test_new_users.PredictedAll, test_new_users.Rating)[0][1])

Rho (user side info, warm start):  0.6417988677672211
Rho (user side info, warm start, extra users):  0.6419802709408267
Rho (user side info, users trained without ratings):  0.47893410436098627

In [18]:
avg_ratings = train.groupby('ItemId')['Rating'].mean().to_frame().rename(columns={"Rating" : "AvgRating"})
test_ = pd.merge(test_warm_start, avg_ratings, left_on='ItemId', right_index=True, how='left')

print('Average movie rating:', test_.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=True).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for top-10 recommendations of best-rated movies:', test_.sort_values(['UserId','AvgRating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('----------------------')
print('Average rating for top-10 recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 (non-)recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=True).groupby('UserId')['Rating'].head(10).mean())

Average movie rating: 3.695626272928932
Average rating for top-10 rated by each user: 4.236997084548105
Average rating for bottom-10 rated by each user: 3.0766763848396503
Average rating for top-10 recommendations of best-rated movies: 3.9557725947521867
----------------------
Average rating for top-10 recommendations from this model: 4.000495626822158
Average rating for bottom-10 (non-)recommendations from this model: 3.3337026239067056

In [19]:
print('Average rating for top-10 recommendations (per user) from this model per configuration')
print('warm start: ', test_warm_start.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('warm start, extra users: ', test_warm_start.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('users trained without ratings: ', test_new_users.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())

Average rating for top-10 recommendations (per user) from this model per configuration
warm start:  4.000495626822158
warm start, extra users:  4.002244897959184
users trained without ratings:  4.317880794701987


This shows a slight improvement over not using user demographic information, and as shown, it gains a slight further advantage by incorporating side information from users for whom there are no ratings. It seems to perform surprisingly well on users trained without ratings too.

4.2 Offsets model¶

Alternative formulation as explained in section 1:

In [20]:
%%time
model_user_info2 = CMF(k=40, reg_param=1e-4, offsets_model=True, random_seed=1)
model_user_info2.fit(deepcopy(train),
user_info = deepcopy(user_sideinfo_train))
test_warm_start['Predicted'] = model_user_info2.predict(test_warm_start.UserId, test_warm_start.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
Objective function value: 0.707400
Number of iterations: 1000
Number of functions evaluations: 1047
CPU times: user 37min 30s, sys: 4min 15s, total: 41min 45s
Wall time: 6min 54s

In [21]:
%%time
for u in list(test_new_users.UserId.unique()):
    user_vec = deepcopy(user_side_info.loc[user_side_info.UserId == u])
    del user_vec['UserId']
    user_vec = user_vec.values.reshape((1, -1))
    model_user_info2.add_user(new_id = u, attributes = user_vec)
test_new_users['Predicted'] = model_user_info2.predict(test_new_users.UserId, test_new_users.ItemId)

CPU times: user 2.44 s, sys: 36 ms, total: 2.48 s
Wall time: 1.89 s

In [22]:
print("RMSE (user side info, warm start): ", np.sqrt(np.mean( (test_warm_start.Predicted - test_warm_start.Rating)**2) ))
print("RMSE (user side info, new users, added afterwards): ", np.sqrt(np.mean( (test_new_users.Predicted - test_new_users.Rating)**2) ))

RMSE (user side info, warm start):  0.8987353773892749
RMSE (user side info, new users, added afterwards):  0.9800443420437788

In [23]:
print("Rho (user side info, warm start): ", np.corrcoef(test_warm_start.Predicted, test_warm_start.Rating)[0][1])
print("Rho (user side info, new users, added afterwards): ", np.corrcoef(test_new_users.Predicted, test_new_users.Rating)[0][1])

Rho (user side info, warm start):  0.6054951333024773
Rho (user side info, new users, added afterwards):  0.4611727674305657

In [24]:
avg_ratings = train.groupby('ItemId')['Rating'].mean().to_frame().rename(columns={"Rating" : "AvgRating"})
test_ = pd.merge(test_warm_start, avg_ratings, left_on='ItemId', right_index=True, how='left')

print('Average movie rating:', test_.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=True).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for top-10 recommendations of best-rated movies:', test_.sort_values(['UserId','AvgRating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('----------------------')
print('Average rating for top-10 recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 (non-)recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=True).groupby('UserId')['Rating'].head(10).mean())

Average movie rating: 3.695626272928932
Average rating for top-10 rated by each user: 4.236997084548105
Average rating for bottom-10 rated by each user: 3.0766763848396503
Average rating for top-10 recommendations of best-rated movies: 3.9557725947521867
----------------------
Average rating for top-10 recommendations from this model: 3.976209912536443
Average rating for bottom-10 (non-)recommendations from this model: 3.3616909620991255

In [25]:
print('Average rating for top-10 recommendations (per user) from this model per configuration')
print('warm start: ', test_warm_start.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
Average rating for top-10 recommendations (per user) from this model per configuration
warm start:  3.976209912536443


Unfortunately, this formulation doesn't seem to perform as well as the previous one in either case, but adding new users to it is much faster.

5. Model with item side information¶

Like before, now fitting the collective model incorporating the movie tags, but not the user information:

5.1 Original version¶

In [26]:
%%time
model_item_info = CMF(k=35, k_main=15, k_item=10, reg_param=1e-3, w_main=10.0, w_item=0.5, random_seed=1)
model_item_info.fit(deepcopy(train),
item_info = deepcopy(item_sideinfo_train))
test_warm_start['Predicted'] = model_item_info.predict(test_warm_start.UserId, test_warm_start.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 7.981097
Number of iterations: 229
Number of functions evaluations: 252
CPU times: user 11min 36s, sys: 1min 6s, total: 12min 43s
Wall time: 2min 8s


Same as before, it's possible to add new items to the model without having to refit it entirely from scratch, but this is very slow to do with many items and was not run here for time reasons:

In [27]:
# for i in test_new_items.ItemId.unique():
#     item_vec = item_side_info.loc[item_side_info.ItemId == i]
#     del item_vec['ItemId']
#     item_vec = item_vec.values.reshape((1, -1))
#     model_item_info.add_item(new_id = i, attributes = item_vec)
# test_new_items['Predicted'] = model_item_info.predict(test_new_items.UserId, test_new_items.ItemId)

In [28]:
%%time
model_item_info_all = CMF(k=35, k_main=15, k_item=10, reg_param=1e-3, w_main=10.0, w_item=0.5, random_seed=1)
model_item_info_all.fit(deepcopy(train), item_info = deepcopy(item_side_info))
test_warm_start['PredictedAll'] = model_item_info_all.predict(test_warm_start.UserId, test_warm_start.ItemId)
test_new_items['PredictedAll'] = model_item_info_all.predict(test_new_items.UserId, test_new_items.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 7.981080
Number of iterations: 228
Number of functions evaluations: 254
CPU times: user 12min 46s, sys: 1min 9s, total: 13min 56s
Wall time: 2min 25s


This time, I will also try a version that puts more emphasis on correctly factorizing the side information:

In [29]:
%%time
model_item_info_diffweight = CMF(k=50, k_main=0, k_item=0, reg_param=1e-3, w_main=5.0, w_item=5.0, random_seed=1)
model_item_info_diffweight.fit(deepcopy(train), item_info = deepcopy(item_side_info))
test_warm_start['PredictedAll2'] = model_item_info_diffweight.predict(test_warm_start.UserId, test_warm_start.ItemId)
test_new_items['PredictedAll2'] = model_item_info_diffweight.predict(test_new_items.UserId, test_new_items.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 4.644912
Number of iterations: 84
Number of functions evaluations: 115
CPU times: user 5min 38s, sys: 31.7 s, total: 6min 10s
Wall time: 1min 7s

In [30]:
print("RMSE (item side info, warm start): ", np.sqrt(np.mean( (test_warm_start.Predicted - test_warm_start.Rating)**2) ))
print("RMSE (item side info, warm start, extra items): ", np.sqrt(np.mean( (test_warm_start.PredictedAll - test_warm_start.Rating)**2) ))
print("RMSE (item side info, warm start, extra items, diff. weighting): ", np.sqrt(np.mean( (test_warm_start.PredictedAll2 - test_warm_start.Rating)**2) ))
# print("RMSE (item side info, new items, added afterwards): ", np.sqrt(np.mean( (test_new_items.Predicted - test_new_items.Rating)**2) ))
print("RMSE (item side info, items trained without ratings): ", np.sqrt(np.mean( (test_new_items.PredictedAll - test_new_items.Rating)**2) ))
print("RMSE (item side info, items trained without ratings, diff. weighting): ", np.sqrt(np.mean( (test_new_items.PredictedAll2 - test_new_items.Rating)**2) ))

RMSE (item side info, warm start):  0.8680824066936402
RMSE (item side info, warm start, extra items):  0.868025522651484
RMSE (item side info, warm start, extra items, diff. weighting):  0.9060629960504393
RMSE (item side info, items trained without ratings):  1.0466672746197183
RMSE (item side info, items trained without ratings, diff. weighting):  1.0459891847280063
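The repeated metric computations above can be factored into small helpers; a minimal sketch using only NumPy (the function names `rmse` and `rho` are my own, not part of cmfrec), shown here on made-up numbers:

```python
import numpy as np

def rmse(pred, actual):
    """Root mean squared error between predictions and observed ratings."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    return np.sqrt(np.mean((pred - actual) ** 2))

def rho(pred, actual):
    """Pearson correlation, as computed with np.corrcoef above."""
    return np.corrcoef(pred, actual)[0][1]

# Toy example
preds  = np.array([3.5, 4.2, 2.8, 4.9])
actual = np.array([4.0, 4.0, 3.0, 5.0])
print(rmse(preds, actual))
print(rho(preds, actual))
```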

In [31]:
print("Rho (item side info, warm start): ", np.corrcoef(test_warm_start.Predicted, test_warm_start.Rating)[0][1])
print("Rho (item side info, warm start, extra items): ", np.corrcoef(test_warm_start.PredictedAll, test_warm_start.Rating)[0][1])
print("Rho (item side info, warm start, extra items, diff. weighting): ", np.corrcoef(test_warm_start.PredictedAll2, test_warm_start.Rating)[0][1])
# print("Rho (item side info, new items, added afterwards): ", np.corrcoef(test_new_items.Predicted, test_new_items.Rating)[0][1])
print("Rho (item side info, items trained without ratings): ", np.corrcoef(test_new_items.PredictedAll, test_new_items.Rating)[0][1])
print("Rho (item side info, items trained without ratings, diff. weighting): ", np.corrcoef(test_new_items.PredictedAll2, test_new_items.Rating)[0][1])

Rho (item side info, warm start):  0.642168442991823
Rho (item side info, warm start, extra items):  0.6422664472770928
Rho (item side info, warm start, extra items, diff. weighting):  0.6041898919052913
Rho (item side info, items trained without ratings):  0.3559967656248473
Rho (item side info, items trained without ratings, diff. weighting):  0.35785580363289865

In [32]:
avg_ratings = train.groupby('ItemId')['Rating'].mean().to_frame().rename(columns={"Rating" : "AvgRating"})
test_ = pd.merge(test_warm_start, avg_ratings, left_on='ItemId', right_index=True, how='left')

print('Average movie rating:', test_.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=True).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for top-10 recommendations of best-rated movies:', test_.sort_values(['UserId','AvgRating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('----------------------')
print('Average rating for top-10 recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 (non-)recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=True).groupby('UserId')['Rating'].head(10).mean())

Average movie rating: 3.695626272928932
Average rating for top-10 rated by each user: 4.236997084548105
Average rating for bottom-10 rated by each user: 3.0766763848396503
Average rating for top-10 recommendations of best-rated movies: 3.9557725947521867
----------------------
Average rating for top-10 recommendations from this model: 4.002944606413994
Average rating for bottom-10 (non-)recommendations from this model: 3.333469387755102
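The ranking check above (mean observed rating among each user's top-10 items by predicted score, pooled across users) can be isolated into a toy example; a sketch with pandas, reusing the notebook's column names on made-up data:

```python
import pandas as pd

# Made-up test frame: two users with three rated items each
test_toy = pd.DataFrame({
    'UserId':    [1, 1, 1, 2, 2, 2],
    'Rating':    [5, 3, 4, 2, 5, 4],
    'Predicted': [4.8, 2.9, 4.1, 2.5, 4.9, 3.7],
})

# Sort by predicted score within each user, keep each user's top-2 rows
# (top-10 in the notebook), then average the observed ratings pooled
# across all users
top_mean = (test_toy.sort_values(['UserId', 'Predicted'], ascending=False)
                    .groupby('UserId')['Rating'].head(2).mean())
print(top_mean)  # prints 4.5
```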

In [33]:
print('Average rating for top-10 recommendations (per user) from this model per configuration')
print('warm start: ', test_warm_start.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('warm start, extra items: ', test_warm_start.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('items trained without ratings: ', test_new_items.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('items trained without ratings, diff. weighting: ', test_new_items.sort_values(['UserId','PredictedAll2'], ascending=False).groupby('UserId')['Rating'].head(10).mean())

Average rating for top-10 recommendations (per user) from this model per configuration
warm start:  4.002944606413994
warm start, extra items:  4.00265306122449
items trained without ratings:  3.8518708354689903
items trained without ratings, diff. weighting:  3.9085450684630594


The improvement is comparable to that from adding user side information in the warm-start case, but for new items, the recommendations don't seem to be as good as for new users. Putting a heavier weight on the movie-tag factorization didn't seem to improve cold-start performance according to ranking metrics, but it did bring a slight improvement in terms of RMSE.

5.2 Offsets model¶

In [34]:
%%time
model_item_info2 = CMF(k=50, reg_param=1e-4, offsets_model=True, random_seed=1)
model_item_info2.fit(deepcopy(train), item_info = deepcopy(item_sideinfo_pca_train))
test_warm_start['Predicted'] = model_item_info2.predict(test_warm_start.UserId, test_warm_start.ItemId)

INFO:tensorflow:Optimization terminated with:
Objective function value: 0.525121
Number of iterations: 472
Number of functions evaluations: 489
CPU times: user 24min 21s, sys: 2min 27s, total: 26min 49s
Wall time: 5min 22s

In [35]:
%%time
for i in list(test_new_items.ItemId.unique()):
    item_vec = deepcopy(item_sideinfo_pca.loc[item_sideinfo_pca.ItemId == i])
    del item_vec['ItemId']
    item_vec = item_vec.values.reshape((1, -1))
    model_item_info2.add_item(new_id = i, attributes = item_vec)
test_new_items['Predicted'] = model_item_info2.predict(test_new_items.UserId, test_new_items.ItemId)

CPU times: user 1.5 s, sys: 40 ms, total: 1.54 s
Wall time: 941 ms

In [36]:
print("RMSE (item side info, warm start): ", np.sqrt(np.mean( (test_warm_start.Predicted - test_warm_start.Rating)**2) ))
print("RMSE (item side info, new items, added afterwards): ", np.sqrt(np.mean( (test_new_items.Predicted - test_new_items.Rating)**2) ))

RMSE (item side info, warm start):  0.9268640553404283
RMSE (item side info, new items, added afterwards):  0.9567803858196762

In [37]:
print("Rho (item side info, warm start): ", np.corrcoef(test_warm_start.Predicted, test_warm_start.Rating)[0][1])
print("Rho (item side info, new items, added afterwards): ", np.corrcoef(test_new_items.Predicted, test_new_items.Rating)[0][1])

Rho (item side info, warm start):  0.5950739127670134
Rho (item side info, new items, added afterwards):  0.5636488072210143

In [38]:
avg_ratings = train.groupby('ItemId')['Rating'].mean().to_frame().rename(columns={"Rating" : "AvgRating"})
test_ = pd.merge(test_warm_start, avg_ratings, left_on='ItemId', right_index=True, how='left')

print('Average movie rating:', test_.groupby('UserId')['Rating'].mean().mean())
print('Average rating for top-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 rated by each user:', test_.sort_values(['UserId','Rating'], ascending=True).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for top-10 recommendations of best-rated movies:', test_.sort_values(['UserId','AvgRating'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('----------------------')
print('Average rating for top-10 recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('Average rating for bottom-10 (non-)recommendations from this model:', test_.sort_values(['UserId','Predicted'], ascending=True).groupby('UserId')['Rating'].head(10).mean())

Average movie rating: 3.695626272928932
Average rating for top-10 rated by each user: 4.236997084548105
Average rating for bottom-10 rated by each user: 3.0766763848396503
Average rating for top-10 recommendations of best-rated movies: 3.9557725947521867
----------------------
Average rating for top-10 recommendations from this model: 3.9714285714285715
Average rating for bottom-10 (non-)recommendations from this model: 3.352069970845481

In [39]:
print('Average rating for top-10 recommendations (per user) from this model per configuration')
print('warm start: ', test_warm_start.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())

Average rating for top-10 recommendations (per user) from this model per configuration
warm start:  3.9714285714285715


This alternative formulation doesn't perform as well for warm-start recommendations (in this regard, it even seems worse than not incorporating the side information at all), but it performs significantly better for cold-start.

6. Full model¶

Now a model incorporating both user and item side information, fit to extra users and items without any ratings in the training set. Note that the hyperparameters of this model are a lot harder to tune.
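With so many interacting hyperparameters (`k`, `k_main`, `k_user`, `k_item`, the `w_*` weights, `reg_param`), one pragmatic approach is a small grid search over a validation split. A sketch of just the grid construction (the `make_grid` helper is my own; fitting each candidate with `CMF(**params)` and scoring it on held-out ratings is omitted):

```python
from itertools import product

def make_grid(**param_lists):
    """Enumerate all combinations of the given hyperparameter lists."""
    keys = list(param_lists)
    return [dict(zip(keys, vals)) for vals in product(*param_lists.values())]

grid = make_grid(k=[40, 50],
                 w_user=[0.5, 2.0],
                 w_item=[0.05, 0.5])
print(len(grid))  # 8 candidate configurations
# Each entry could then be fit as CMF(**params, random_seed=1) and
# evaluated on a validation split, keeping the best-scoring combination.
```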

In [40]:
%%time
model_user_item_info = CMF(k=40, k_main=10, k_user=5, k_item=15,
                           w_main=1.0, w_user=2.0, w_item=0.05,
                           reg_param=5e-5, random_seed=1)
model_user_item_info.fit(deepcopy(train),
                         user_info=deepcopy(user_side_info),
                         item_info=deepcopy(item_side_info),
                         cols_bin_user=[cl for cl in user_side_info.columns if cl!='UserId'])
test_warm_start['PredictedAll'] = model_user_item_info.predict(test_warm_start.UserId, test_warm_start.ItemId)
test_cold_start['PredictedAll'] = model_user_item_info.predict(test_cold_start.UserId, test_cold_start.ItemId)
test_new_users['PredictedAll'] = model_user_item_info.predict(test_new_users.UserId, test_new_users.ItemId)
test_new_items['PredictedAll'] = model_user_item_info.predict(test_new_items.UserId, test_new_items.ItemId)

INFO:tensorflow:Optimization terminated with:
Objective function value: 0.797726
Number of iterations: 427
Number of functions evaluations: 446
CPU times: user 25min 16s, sys: 2min 8s, total: 27min 25s
Wall time: 5min 43s

In [41]:
print("RMSE (user and item side info, warm start, extra users and items): ", np.sqrt(np.mean( (test_warm_start.PredictedAll - test_warm_start.Rating)**2) ))
print("RMSE (user and item side info, cold start): ", np.sqrt(np.mean( (test_cold_start.PredictedAll - test_cold_start.Rating)**2) ))
print("RMSE (user and item side info, users trained without ratings, extra items): ", np.sqrt(np.mean( (test_new_users.PredictedAll - test_new_users.Rating)**2) ))
print("RMSE (user and item side info, items trained without ratings, extra users): ", np.sqrt(np.mean( (test_new_items.PredictedAll - test_new_items.Rating)**2) ))

RMSE (user and item side info, warm start, extra users and items):  0.8902373699593187
RMSE (user and item side info, cold start):  1.084223622310143
RMSE (user and item side info, users trained without ratings, extra items):  0.9641877735092866
RMSE (user and item side info, items trained without ratings, extra users):  1.0488749599247702

In [42]:
print("Rho (user and item side info, warm start, extra users and items): ", np.corrcoef(test_warm_start.PredictedAll, test_warm_start.Rating)[0][1])
print("Rho (user and item side info, cold start): ", np.corrcoef(test_cold_start.PredictedAll, test_cold_start.Rating)[0][1])
print("Rho (user and item side info, users trained without ratings, extra items): ", np.corrcoef(test_new_users.PredictedAll, test_new_users.Rating)[0][1])
print("Rho (user and item side info, items trained without ratings, extra users): ", np.corrcoef(test_new_items.PredictedAll, test_new_items.Rating)[0][1])

Rho (user and item side info, warm start, extra users and items):  0.6167629307175454
Rho (user and item side info, cold start):  0.32571107573887353
Rho (user and item side info, users trained without ratings, extra items):  0.48530835816378215
Rho (user and item side info, items trained without ratings, extra users):  0.3509155367711825

In [43]:
print('Average rating for top-10 recommendations (per user) from this model per configuration')
print('warm start, extra users and items: ', test_warm_start.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('cold start: ', test_cold_start.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('users trained without ratings: ', test_new_users.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('items trained without ratings: ', test_new_items.sort_values(['UserId','PredictedAll'], ascending=False).groupby('UserId')['Rating'].head(10).mean())

Average rating for top-10 recommendations (per user) from this model per configuration
warm start, extra users and items:  3.986997084548105
cold start:  3.9770367120081906
users trained without ratings:  4.329006622516556
items trained without ratings:  4.017231700471065


The user and item side information didn't seem to improve upon the model that uses only ratings for warm-start recommendations, but it seems to perform very well for cold-start - almost as good as for warm start, in fact.

Alternative formulation with the "offsets" model:

In [44]:
%%time
model_user_item_info2 = CMF(k=50, reg_param=5e-3, offsets_model=True, random_seed=1)
model_user_item_info2.fit(deepcopy(train),
                          user_info=deepcopy(user_sideinfo_train),
                          item_info=deepcopy(item_sideinfo_pca_train))
test_warm_start['Predicted'] = model_user_item_info2.predict(test_warm_start.UserId, test_warm_start.ItemId)

INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 0.940455
Number of iterations: 636
Number of functions evaluations: 694
CPU times: user 35min 47s, sys: 2min 46s, total: 38min 33s
Wall time: 8min 34s

In [45]:
%%time
for u in list(np.unique(np.r_[test_new_users.UserId, test_cold_start.UserId])):
    user_vec = deepcopy(user_side_info.loc[user_side_info.UserId == u])
    del user_vec['UserId']
    user_vec = user_vec.values.reshape((1, -1))
    model_user_item_info2.add_user(new_id = u, attributes = user_vec)

for i in list(np.unique(np.r_[test_new_items.ItemId.unique(), test_cold_start.ItemId.unique()])):
    item_vec = deepcopy(item_sideinfo_pca.loc[item_sideinfo_pca.ItemId == i])
    if item_vec.shape[0] > 0:
        del item_vec['ItemId']
        item_vec = item_vec.values.reshape((1, -1))
        model_user_item_info2.add_item(new_id = i, attributes = item_vec)
test_new_users['Predicted'] = model_user_item_info2.predict(test_new_users.UserId, test_new_users.ItemId)
test_new_items['Predicted'] = model_user_item_info2.predict(test_new_items.UserId, test_new_items.ItemId)
test_cold_start['Predicted'] = model_user_item_info2.predict(test_cold_start.UserId, test_cold_start.ItemId)

CPU times: user 3.64 s, sys: 76 ms, total: 3.72 s
Wall time: 2.91 s

In [46]:
print("RMSE (user and item side info, warm start, extra users and items): ", np.sqrt(np.mean( (test_warm_start.Predicted - test_warm_start.Rating)**2) ))
print("RMSE (user and item side info, cold start, users and items added afterwards): ", np.sqrt(np.mean( (test_cold_start.Predicted - test_cold_start.Rating)**2) ))
print("RMSE (user and item side info, users added afterwards): ", np.sqrt(np.mean( (test_new_users.Predicted - test_new_users.Rating)**2) ))
print("RMSE (user and item side info, items added afterwards): ", np.sqrt(np.mean( (test_new_items.Predicted - test_new_items.Rating)**2) ))

RMSE (user and item side info, warm start, extra users and items):  0.9278728563139468
RMSE (user and item side info, cold start, users and items added afterwards):  0.985382884450961
RMSE (user and item side info, users added afterwards):  0.9625519580300425
RMSE (user and item side info, items added afterwards):  0.9547007567277902

In [47]:
print("Rho (user and item side info, warm start, extra users and items): ", np.corrcoef(test_warm_start.Predicted, test_warm_start.Rating)[0][1])
print("Rho (user and item side info, cold start, users and items added afterwards): ", np.corrcoef(test_cold_start.Predicted, test_cold_start.Rating)[0][1])
print("Rho (user and item side info, users added afterwards): ", np.corrcoef(test_new_users.Predicted, test_new_users.Rating)[0][1])
print("Rho (user and item side info, items added afterwards): ", np.corrcoef(test_new_items.Predicted, test_new_items.Rating)[0][1])

Rho (user and item side info, warm start, extra users and items):  0.5778513879855706
Rho (user and item side info, cold start, users and items added afterwards):  0.4375213405664898
Rho (user and item side info, users added afterwards):  0.4834303546281004
Rho (user and item side info, items added afterwards):  0.541600209425479

In [48]:
print('Average rating for top-10 recommendations (per user) from this model per configuration')
print('warm start: ', test_warm_start.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())
print('cold start: ', test_cold_start.sort_values(['UserId','Predicted'], ascending=False).groupby('UserId')['Rating'].head(10).mean())

Average rating for top-10 recommendations (per user) from this model per configuration
warm start:  3.967142857142857
cold start:  4.011847301448004


This model seems again to perform better for cold-start recommendations.

7. Examining some recommendations¶

Now I'll examine the Top-10 recommended movies for some random users under different models:

In [49]:
from collections import defaultdict

# aggregate statistics
avg_movie_rating = defaultdict(lambda: 0)
num_ratings_per_movie = defaultdict(lambda: 0)
for i in train.groupby('ItemId')['Rating'].mean().to_frame().itertuples():
    avg_movie_rating[i.Index] = i.Rating
for i in train.groupby('ItemId')['Rating'].agg(lambda x: len(tuple(x))).to_frame().itertuples():
    num_ratings_per_movie[i.Index] = i.Rating

# function to print recommended lists more nicely
def print_reclist(reclist):
    list_w_info = [str(m + 1) + ") - " + movie_id_to_title[reclist[m]] +
                   " - Average Rating: " + str(np.round(avg_movie_rating[reclist[m]], 2)) +
                   " - Number of ratings: " + str(num_ratings_per_movie[reclist[m]])
                   for m in range(len(reclist))]
    print("\n".join(list_w_info))


User with ID = 948 - this user was in the training set:

In [50]:
reclist1 = model_no_side_info.topN(user=948, n=10, exclude_seen=True)
reclist2 = model_user_info_all.topN(user=948, n=10, exclude_seen=True)
reclist3 = model_item_info_all.topN(user=948, n=10, exclude_seen=True)
reclist4 = model_user_item_info.topN(user=948, n=10, exclude_seen=True)
reclist5 = model_user_item_info2.topN(user=948, n=10, exclude_seen=True)

print('Recommendations from ratings-only model:')
print_reclist(reclist1)
print("------")
print('Recommendations from ratings + user demographics model:')
print_reclist(reclist2)
print("------")
print('Recommendations from ratings + movie tags model:')
print_reclist(reclist3)
print("------")
print('Recommendations from ratings + user demographics + movie tags model:')
print_reclist(reclist4)
print("------")
print('Recommendations from ratings + user demographics + movie tags model (alternative formulation):')
print_reclist(reclist5)

Recommendations from ratings-only model:
1) - Raiders of the Lost Ark (1981) - Average Rating: 4.47 - Number of ratings: 1595
2) - Rear Window (1954) - Average Rating: 4.46 - Number of ratings: 668
3) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 558
4) - Double Indemnity (1944) - Average Rating: 4.4 - Number of ratings: 360
5) - Singin' in the Rain (1952) - Average Rating: 4.27 - Number of ratings: 482
6) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.54 - Number of ratings: 404
7) - M (1931) - Average Rating: 4.32 - Number of ratings: 196
8) - Beauty and the Beast (1991) - Average Rating: 3.84 - Number of ratings: 658
9) - Third Man, The (1949) - Average Rating: 4.46 - Number of ratings: 308
10) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
------
Recommendations from ratings + user demographics model:
1) - Raiders of the Lost Ark (1981) - Average Rating: 4.47 - Number of ratings: 1595
2) - Rear Window (1954) - Average Rating: 4.46 - Number of ratings: 668
3) - Singin' in the Rain (1952) - Average Rating: 4.27 - Number of ratings: 482
4) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 558
5) - Beauty and the Beast (1991) - Average Rating: 3.84 - Number of ratings: 658
6) - Double Indemnity (1944) - Average Rating: 4.4 - Number of ratings: 360
7) - M (1931) - Average Rating: 4.32 - Number of ratings: 196
8) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.54 - Number of ratings: 404
9) - Invasion of the Body Snatchers (1956) - Average Rating: 3.89 - Number of ratings: 395
10) - Third Man, The (1949) - Average Rating: 4.46 - Number of ratings: 308
------
Recommendations from ratings + movie tags model:
1) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
2) - Raiders of the Lost Ark (1981) - Average Rating: 4.47 - Number of ratings: 1595
3) - Beauty and the Beast (1991) - Average Rating: 3.84 - Number of ratings: 658
4) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 558
5) - Singin' in the Rain (1952) - Average Rating: 4.27 - Number of ratings: 482
6) - Rear Window (1954) - Average Rating: 4.46 - Number of ratings: 668
7) - Double Indemnity (1944) - Average Rating: 4.4 - Number of ratings: 360
8) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.54 - Number of ratings: 404
9) - M (1931) - Average Rating: 4.32 - Number of ratings: 196
10) - Third Man, The (1949) - Average Rating: 4.46 - Number of ratings: 308
------
Recommendations from ratings + user demographics + movie tags model:
1) - Boys Don't Cry (1999) - Average Rating: 3.98 - Number of ratings: 534
2) - African Queen, The (1951) - Average Rating: 4.26 - Number of ratings: 663
3) - Singin' in the Rain (1952) - Average Rating: 4.27 - Number of ratings: 482
4) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
5) - Raiders of the Lost Ark (1981) - Average Rating: 4.47 - Number of ratings: 1595
6) - Grand Illusion (Grande illusion, La) (1937) - Average Rating: 4.32 - Number of ratings: 113
7) - M (1931) - Average Rating: 4.32 - Number of ratings: 196
8) - Amadeus (1984) - Average Rating: 4.25 - Number of ratings: 884
9) - Babe (1995) - Average Rating: 3.87 - Number of ratings: 1095
10) - Shadow of a Doubt (1943) - Average Rating: 4.27 - Number of ratings: 145
------
Recommendations from ratings + user demographics + movie tags model (alternative formulation):
1) - Best Years of Our Lives, The (1946) - Average Rating: 0 - Number of ratings: 0
2) - City Lights (1931) - Average Rating: 0 - Number of ratings: 0
3) - Children of Heaven, The (Bacheha-Ye Aseman) (1997) - Average Rating: 4.24 - Number of ratings: 42
4) - Glory (1989) - Average Rating: 0 - Number of ratings: 0
5) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
6) - Shawshank Redemption, The (1994) - Average Rating: 4.54 - Number of ratings: 1394
7) - Paris Was a Woman (1995) - Average Rating: 2.0 - Number of ratings: 2
8) - Great Escape, The (1963) - Average Rating: 0 - Number of ratings: 0
9) - Central Station (Central do Brasil) (1998) - Average Rating: 0 - Number of ratings: 0
10) - Modern Times (1936) - Average Rating: 0 - Number of ratings: 0


User with ID = 1 - this user was not in the training set:

In [51]:
# reclist1 = model_no_side_info.topN(user=1, n=10) # not possible with this model
reclist2 = model_user_info_all.topN(user=1, n=10)
# reclist3 = model_item_info_all.topN(user=1, n=10) # not possible with this model
reclist4 = model_user_item_info.topN(user=1, n=10)
reclist5 = model_user_item_info2.topN(user=1, n=10)

# print('Recommendations from ratings-only model:')
# print_reclist(reclist1)
# print("------")
print('Recommendations from ratings + user demographics model:')
print_reclist(reclist2)
# print("------")
# print('Recommendations from ratings + movie tags model:')
# print_reclist(reclist3)
print("------")
print('Recommendations from ratings + user demographics + movie tags model:')
print_reclist(reclist4)
print("------")
print('Recommendations from ratings + user demographics + movie tags model (alternative formulation):')
print_reclist(reclist5)

Recommendations from ratings + user demographics model:
1) - Shawshank Redemption, The (1994) - Average Rating: 4.54 - Number of ratings: 1394
2) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
3) - Usual Suspects, The (1995) - Average Rating: 4.53 - Number of ratings: 1116
4) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.54 - Number of ratings: 404
5) - Godfather, The (1972) - Average Rating: 4.54 - Number of ratings: 1414
6) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 558
7) - Close Shave, A (1995) - Average Rating: 4.5 - Number of ratings: 415
8) - Raiders of the Lost Ark (1981) - Average Rating: 4.47 - Number of ratings: 1595
9) - Rear Window (1954) - Average Rating: 4.46 - Number of ratings: 668
10) - Sixth Sense, The (1999) - Average Rating: 4.4 - Number of ratings: 1560
------
Recommendations from ratings + user demographics + movie tags model:
1) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
2) - Shawshank Redemption, The (1994) - Average Rating: 4.54 - Number of ratings: 1394
3) - Life Is Beautiful (La Vita è bella) (1997) - Average Rating: 4.33 - Number of ratings: 731
4) - Close Shave, A (1995) - Average Rating: 4.5 - Number of ratings: 415
5) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.54 - Number of ratings: 404
6) - Sanjuro (1962) - Average Rating: 4.69 - Number of ratings: 45
7) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 558
8) - Roman Holiday (1953) - Average Rating: 4.26 - Number of ratings: 260
9) - Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) - Average Rating: 4.48 - Number of ratings: 291
10) - Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) - Average Rating: 4.44 - Number of ratings: 881
------
Recommendations from ratings + user demographics + movie tags model (alternative formulation):
1) - Shawshank Redemption, The (1994) - Average Rating: 4.54 - Number of ratings: 1394
2) - City Lights (1931) - Average Rating: 0 - Number of ratings: 0
3) - Schindler's List (1993) - Average Rating: 4.5 - Number of ratings: 1475
4) - Charade (1963) - Average Rating: 4.16 - Number of ratings: 177
5) - Usual Suspects, The (1995) - Average Rating: 4.53 - Number of ratings: 1116
6) - Central Station (Central do Brasil) (1998) - Average Rating: 0 - Number of ratings: 0
7) - Modern Times (1936) - Average Rating: 0 - Number of ratings: 0
8) - Children of Heaven, The (Bacheha-Ye Aseman) (1997) - Average Rating: 4.24 - Number of ratings: 42
9) - Foreign Correspondent (1940) - Average Rating: 0 - Number of ratings: 0
10) - Thin Man, The (1934) - Average Rating: 0 - Number of ratings: 0


As seen from these lists, the alternative formulation of the model tends to recommend more movies that were not in the training set, which in many contexts is desirable despite its slightly lower metrics.