In [1]:

%pylab inline
import pandas as pd

import numpy as np
from __future__ import division
import itertools

import matplotlib.pyplot as plt
import seaborn as sns

import logging
logger = logging.getLogger()

Populating the interactive namespace from numpy and matplotlib

9 Recommendation Systems¶

two broad groups:

Content-based systems
focus on the properities of items.
Collaborative filtering systems
focus on the relationship between users and items.

9.1 A Model for Recommendation Systems¶

The Utility Matrix¶

record the preference given by users for certain items.

In [66]:

# Example 9.1
M = pd.DataFrame(index=['A', 'B', 'C', 'D'], columns=['HP1', 'HP2', 'HP3', 'TW', 'SW1', 'SW2', 'SW3'])

M.loc['A', ['HP1', 'TW', 'SW1']] = [4, 5, 1]
M.iloc[1, 0:3] = [5, 5, 4]
M.iloc[2, 3:-1] = [2, 4, 5]
M.iloc[3, [1, -1]] = [3, 3]

M_9_1 = M
M_9_1

Out[66]:

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

In practice, the matrix would be even sparser, with the typical user rating only a tiny fraction of all avalibale items.

the goal of a recommendation system is: to predict the blanks in the utility matrix.

slightly difference in many application:
- predict every blank entry $<$ discover some potential entries in each row.
- find all items with the highest expected ratings $<$ find a large subset of those.

The Long Tail¶

physical institutions	online institutions
provide only the most popular items	provide the entire range of items

the long tail force online institutions to recommend items to individual users:

It's no possible to present all avaliable items to the user.
Neither can we expect users to have heared of each of the items they might like.

Applications of Recommendation Systems¶

Product Recommendations
Movie Recommendations
News Articles

Populating the Utility Matrix¶

how to discovery the value users place on items:

We can ask users to rate items.
cons: users are unwilling to do, and so samples are biased by very little fraction of peoples.
We can make inferences from users' behavior.
eg: items purchased/viewed/rated.

9.2 Content-Based Recommendations¶

9.2.1 Item Profiles¶

a record representing important characteristics of items.

Discovering Features¶

for Documents
idea: find the identification of words that characterize the topic of a document.
namely, we expect a sets of words to express the subjects or main ideas of the document.
1. eliminate stop words.
2. compute the TF.DIF score for each reamining word in the document.
3. take as the features of a document the $n$ words with the highest TF.DIF scores.
to measure the similarity of two documents, the distance measures we could use are:
1. Jaccard distance
2. cosine distance
  cosine distance of vectors is not affected by components in which both vectors have 0.
for Images
invite users to tag the items.
cons: users are unwilling to do $\implies$ there are not enough tags (bias).

generalize feature vector¶

feature is discrete. $\to$ boolean value.
feature is numerical. $\to$ normalization.

9.2.5 User Profiles¶

create vectors with the same components of item profiles to describe the user's preferences.

It could be derived from utility matrix and item profiles.

normalizate untility matrix. ($[-1,1]$ for cosine distance).
value in user profiles = utility value * corresponding item vectors.

In [3]:

# example 9.4

users_name = ['U', 'V']
items_name = ['F{}'.format(x) for x in range(4)]
features_name = ['Julia Roberts', 'others']

# utility matrix
M_uti = pd.DataFrame([
                        [3, 4, 5, 0],
                        [6, 2, 3, 5]  
                     ],
                     index=users_name,
                     columns=items_name
                    )

M_uti

Out[3]:

	F0	F1	F2	F3
U	3	4	5	0
V	6	2	3	5

In [4]:

# item profile
M_item = pd.DataFrame(index=items_name, columns=features_name)

M_item.loc[:, features_name[0]] = 1
M_item = M_item.fillna(value=0)

M_item

Out[4]:

	Julia Roberts	others
F0	1	0
F1	1	0
F2	1	0
F3	1	0

In [5]:

M_uti.apply(lambda x: x - np.mean(x), axis=1)

Out[5]:

	F0	F1	F2	F3
U	0	1	2	-3
V	2	-2	-1	1

In [6]:

M_user = M_uti.fillna(value=0).dot(M_item) / 4 #average = sum/len
M_user

Out[6]:

	Julia Roberts	others
U	3	0
V	4	0

9.2.6 Recommending Items to Users Based on Content¶

to estimate:
$$M_{utility}[user, item] = cosineDistant(M_{user}, M_{item})$$

the more similar, the higher probility to recommend.
classification algorithms:
Recommend or Not (machine learning):
one decision per user $\to$ take too long time to construct.
be used only for relatively small problem size.

In [7]:

# exercise 9.2.1

raw_data = [
        [3.06, 2.68, 2.92],
        [500, 320, 640],
        [6, 4, 6]
    ]

M_item = pd.DataFrame(raw_data, index=['Processor Speed', 'Disk Size', 'Main-Memory Size'], columns=['A', 'B', 'C'])

# items: A, B, C; features: Processor Speed, Disk Size, ...
M_item

Out[7]:

	A	B	C
Processor Speed	3.06	2.68	2.92
Disk Size	500.00	320.00	640.00
Main-Memory Size	6.00	4.00	6.00

In [12]:

# exercise 9.2.1 

# (d)
M_item.apply(lambda x: x / np.mean(x), axis=1)

Out[12]:

	A	B	C
Processor Speed	1.060046	0.928406	1.011547
Disk Size	1.027397	0.657534	1.315068
Main-Memory Size	1.125000	0.750000	1.125000

In [13]:

# exercise 9.2.2 

# (a)
M_item.apply(lambda x: x - np.mean(x), axis=1)

Out[13]:

	A	B	C
Processor Speed	0.173333	-0.206667	0.033333
Disk Size	13.333333	-166.666667	153.333333
Main-Memory Size	0.666667	-1.333333	0.666667

In [30]:

# exercise 9.2.3

M_uti = pd.DataFrame([[4, 2, 5]], index=['user'], columns=['A', 'B', 'C'])
M_uti

Out[30]:

	A	B	C
user	4	2	5

In [31]:

# (a)
M_uti_nor = M_uti.apply(lambda x: x - np.mean(x), axis=1)
M_uti_nor

Out[31]:

	A	B	C
user	0.333333	-1.666667	1.333333

In [32]:

# (b)
M_user = M_item.dot(M_uti_nor.T) / 3
M_user

Out[32]:

	user
Processor Speed	0.148889
Disk Size	162.222222
Main-Memory Size	1.111111

In [79]:

logger.setLevel('WARN')

def create_user_profile(utility_matrix, item_profile):
    """Create user profile by combining utility matrix with item profile in 9.2.5 ."""
    
    assert np.array_equal(utility_matrix.columns, item_profile.columns), \
                      "utility matrix should keep same columns name with item profile."
    
    logger.info('utility_matrix: \n{}\n'.format(utility_matrix))  
    M_uti_notnull = np.ones(utility_matrix.shape)
    M_uti_notnull[utility_matrix.isnull().values] = 0
    logger.info('utility_matrix_isnull: \n{}\n'.format(M_uti_notnull))
    
    
    logger.info('utility_matrix: \n{}\n'.format(item_profile)) 
    M_item_notnull = np.ones(item_profile.shape)
    M_item_notnull[item_profile.isnull().values] = 0
    logger.info('utility_matrix_isnull: \n{}\n'.format(M_item_notnull))
    
    utility_matrix = utility_matrix.fillna(value=0)
    item_profile = item_profile.fillna(value=0)
    M_user = item_profile.dot(utility_matrix.T).values / np.dot(M_item_notnull, M_uti_notnull.T)
    M_user[np.isinf(M_user)] = np.nan   # solve: divide zero
    logger.info('M_user: \n{}\n'.format(M_user))
    
    return pd.DataFrame(M_user, index=item_profile.index, columns=utility_matrix.index)


M_uti = pd.DataFrame([[4, 2, 5], [1, np.nan, 3]], index=['userA', 'userB'], columns=['A', 'B', 'C'])
M_uti_nor = M_uti.apply(lambda x: x - np.mean(x), axis=1)
print('utility matrix: \n{}\n'.format(M_uti_nor))
print('item profile: \n{}\n'.format(M_item))

create_user_profile(M_uti_nor, M_item)

utility matrix: 
              A         B         C
userA  0.333333 -1.666667  1.333333
userB -1.000000       NaN  1.000000

item profile: 
                       A       B       C
Processor Speed     3.06    2.68    2.92
Disk Size         500.00  320.00  640.00
Main-Memory Size    6.00    4.00    6.00

Out[79]:

	userA	userB
Processor Speed	0.148889	-0.07
Disk Size	162.222222	70.00
Main-Memory Size	1.111111	0.00

9.3 Collaborative Filtering¶

identifying similar users and recommending what similar users like.

refine data¶

rounding the data
eg: rates of 3, 4 and 5 are "1", otherwise "0".
normalizing rates
subtracting from each rating the average rating of that user.

In [68]:

# Fig 9.4
M_9_1

Out[68]:

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

In [70]:

# rounding the data
M_round = M_9_1.copy()
M_round[M_9_1 <= 2] = np.nan
M_round[M_9_1 > 2] = 1

M_round

Out[70]:

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1

In [75]:

# normalizing ratings
M_norm = M_9_1.apply(lambda x: x - np.mean(x), axis=1)
M_norm

Out[75]:

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	0.666667	NaN	NaN	1.666667	-2.333333	NaN	NaN
B	0.333333	0.333333	-0.666667	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	-1.666667	0.333333	1.333333	NaN
D	NaN	0.000000	NaN	NaN	NaN	NaN	0

9.3.2 The Duality of Similarity¶

We can use information about users to recommend items, whereas even if we find pairs of similar items, it takes an additional step in order to recommend items to users.
- find $n$ similar users $\to$ recommend item $I$ to user $U$.
  normalize the utility matrix first.
  \begin{align} M[U,I] &= Ave(M[U,:]) + Ave(M[0:n,I] - Ave(M[0:n,I])) \\ &\approx Ave(M[U,:]) + Std(M[0:n,I]) \end{align}
- find $m$ similar items $\to$ recommend item $I$ to user $U$.
  $$M[U,I] = Ave(M[U,0:m])$$
- in order to recommend items to user $U$, we need to find all or most of entry in $M[U,:]$.
  tradeoff:
  1. user-item: find similar users, directly get all predict values of all potent items.
    item-item: find similar items, we need calculate all items one by one (additional step) to fill $M[U,:]$.
  2. item-item similarity often provides more reliable information due to the simplicity of items (genre).
- precompute preferred items for each user.
  utility matrix evolves slowly $\implies$ compute it infrequently and assume that it remains fixed between recomputations.
Items tend to be classifiable in simple terms (eg: genre), whereas the individuals are complex.

9.3.3 Clustering Users and Items¶

Hierachical approach is prefered:

leave many cluster unmerged at first.
cluster items, and average the corresponding value in utility matrix.
cluster users, and average as well.
repeat several times if we like.

Predict $M[U,I]:

$U \in C$, and $I \in D$.
predict:
\begin{equation} M[U,I] = \begin{cases} M_{revised}[C,D] & \quad \text{if existed.} \\ \text{estimate using similar users/items} & \quad \text{otherwise} \end{cases} \end{equation}

In [108]:

# Fig 9.8

raw_data = [
    [4, 5, np.nan, 5, 1, np.nan, 3, 2],
    [np.nan, 3, 4, 3, 1, 2, 1, np.nan],
    [2, np.nan, 1, 3, np.nan, 4, 5, 3]
]

import string
M_uti = pd.DataFrame(raw_data, index=list(string.uppercase[:3]), columns=list(string.lowercase[:8]))
M_uti

Out[108]:

	a	b	c	d	e	f	g	h
A	4	5	NaN	5	1	NaN	3	2
B	NaN	3	4	3	1	2	1	NaN
C	2	NaN	1	3	NaN	4	5	3

In [140]:

logger.setLevel('WARN')
# exercise 9.3.1
from scipy.spatial.distance import jaccard, cosine
from itertools import combinations


def calc_distance_among_matrix(M, func_dis):
    for c in list(combinations(M.index, 2)):
        logger.info('c: {}'.format(c))
        u, v = M.loc[c[0]], M.loc[c[1]]
        logger.info('\n u:{},\n v:{}\n'.format(u.values,v.values))
        print('{} {}: {}'.format(c, func_dis.__name__, func_dis(u,v)))

# (a)
calc_distance_among_matrix(M_uti.notnull(), jaccard)    

('A', 'B') jaccard: 0.5
('A', 'C') jaccard: 0.5
('B', 'C') jaccard: 0.5

In [141]:

# (b)
calc_distance_among_matrix(M_uti.fillna(value=0), cosine)

('A', 'B') cosine: 0.398959235991
('A', 'C') cosine: 0.385081306188
('B', 'C') cosine: 0.486129880223

In [142]:

# (c)
M_tmp = M_uti.copy()
M_tmp[M_uti < 3] = 0
M_tmp[M_uti >= 3] = 1 
calc_distance_among_matrix(M_tmp, jaccard)

('A', 'B') jaccard: 0.714285714286
('A', 'C') jaccard: 0.75
('B', 'C') jaccard: 0.875

In [144]:

# (d)
calc_distance_among_matrix(M_tmp.fillna(value=0), cosine)

('A', 'B') cosine: 0.42264973081
('A', 'C') cosine: 0.5
('B', 'C') cosine: 0.711324865405

In [146]:

# (e)
M_uti_nor = M_uti.apply(lambda x: x - np.mean(x), axis=1)
M_uti_nor

Out[146]:

	a	b	c	d	e	f	g	h
A	0.666667	1.666667	NaN	1.666667	-2.333333	NaN	-0.333333	-1.333333
B	NaN	0.666667	1.666667	0.666667	-1.333333	-0.333333	-1.333333	NaN
C	-1.000000	NaN	-2.000000	0.000000	NaN	1.000000	2.000000	0.000000

In [148]:

# (f)
calc_distance_among_matrix(M_uti_nor.fillna(value=0), cosine)

('A', 'B') cosine: 0.415693452532
('A', 'C') cosine: 1.11547005384
('B', 'C') cosine: 1.73957399695

In [149]:

# exercise 9.3.2
#todo

9.4 Dimensionality Reduction¶

UV-decomposition: $$M = U \times V$$

measure: RMSE (root-mean-square error)

Building a Complete UV-Decomposition Algorithm¶

Preprocessing of the matrix $M$.
normalization:
1. subtract: average rating of user $i$, then average rating of item $j$.
2. subtract: first item, then user.
3. subtract: half of average of item and half of average of user.
Initializing $U$ and $V$.
choice: gives the elements of $UV$ the average of the nonblank elements of $M$.
$\implies$ the element of $U$ and $V$ should be $\sqrt{a/d}$,
where $a$ is the average nonblank element of $M$, $d$ is the lengths of the short sides of $U$ and $V$.

local minima contains global minima:
1. vary the initial values of $U$ and $V$:
  perturb the value $\sqrt{a/d}$ randomly.
2. vary the way we seek the optimum.
Performing the Optimization.
different optimization path:
choose a permutation of the elements and follow that order for every round.

Gradient Descent $\to$ stochastic gradient descent.
Converging to a Minimum.
track the amount of improvement in the RMSE obtained.

stop condition:
1. stop when that improvement in one round falls below a threshold.
2. stop when the maximum improvement during a round is below a threshold.

Avoiding Overfitting¶

solutions:

optimized by only moving the value of a component a fraction of the way from its current value toward its optimized value.
Stop before the process has converged.
Take several different $UV$ decompositions, and average their predicts.

In [ ]:

# exercise 9.4.6

#todo

9.5 The NetFlix Challenge¶

some facts:

CineMatch was not a very good algorithm.
UV-decomposition alorithm given a 7% improvement over CineMatch when couped with normalization and a few other tricks.
Combing different algorithms is a preferred strategy.
Genre and other information in IMDB was no useful.
Time of rating turned out to be useful: upward or downward slope with time.

todo: read the papers introduced in the chapter.

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1