from collections import Counter
from pprint import pprint
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import watermark
%load_ext watermark
%matplotlib inline
Watermark the notebook with current versions of all loaded libraries
%watermark -n -v -m -g -iv
Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.19.0

Compiler    : Clang 10.0.0
OS          : Darwin
Release     : 21.6.0
Machine     : x86_64
Processor   : i386
CPU cores   : 16
Architecture: 64bit

Git hash: e2a121886d5fb3c11344fccd2cf35c76180f7246

matplotlib: 3.3.2
numpy     : 1.19.2
watermark : 2.1.0
json      : 2.0.9
Load default figure style
plt.style.use('./d4sci.mplstyle')
Let's use this simple example, where everything is easy to visualize
We start by defining the adjacency matrix of our bipartite network. This is not the most efficient graph representation, but it is the most convenient in our case
A = np.zeros((8, 6), dtype='int')
Rows correspond to 'x' nodes and columns to 'y' nodes
A[0, 0]=1
A[0, 1]=1
A[1, 0]=1
A[1, 1]=1
A[1, 3]=1
A[2, 2]=1
A[2, 4]=1
A[3, 0]=1
A[3, 3]=1
A[4, 2]=1
A[4, 4]=1
A[5, 2]=1
A[5, 5]=1
A[6, 4]=1
A[6, 5]=1
A[7, 4]=1
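For a larger network, typing out each entry quickly becomes impractical. One alternative sketch is to build the same matrix from an edge list using NumPy fancy indexing (the `edges` list below simply collects the assignments above):

```python
import numpy as np

# The same bipartite network written as a list of (x, y) edges.
edges = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 3), (2, 2), (2, 4), (3, 0),
         (3, 3), (4, 2), (4, 4), (5, 2), (5, 5), (6, 4), (6, 5), (7, 4)]

A = np.zeros((8, 6), dtype='int')
rows, cols = zip(*edges)
A[rows, cols] = 1  # fancy indexing sets all 16 entries in one step
```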
The adjacency matrix is then:
pprint(A)
array([[1, 1, 0, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 0]])
A.shape
(8, 6)
And we can easily calculate the degree of each kind of node as well, by simply summing over the rows or columns:
kx = A.sum(axis=1)
ky = A.sum(axis=0)
kx
array([2, 3, 2, 2, 2, 2, 2, 1])
ky
array([3, 2, 3, 2, 4, 2])
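A quick sanity check (not in the original code): since every edge touches exactly one x node and one y node, both degree sequences must sum to the same total, the number of edges in the bipartite network:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])

kx = A.sum(axis=1)  # degrees of the x nodes
ky = A.sum(axis=0)  # degrees of the y nodes

# Each of the 16 edges is counted exactly once in each sum.
assert kx.sum() == ky.sum() == A.sum() == 16
```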
The X and Y one-mode projections are:
X = np.dot(A, A.T)
Y = np.dot(A.T, A)
pprint(X)
array([[2, 2, 0, 1, 0, 0, 0, 0],
       [2, 3, 0, 2, 0, 0, 0, 0],
       [0, 0, 2, 0, 2, 1, 1, 1],
       [1, 2, 0, 2, 0, 0, 0, 0],
       [0, 0, 2, 0, 2, 1, 1, 1],
       [0, 0, 1, 0, 1, 2, 1, 0],
       [0, 0, 1, 0, 1, 1, 2, 1],
       [0, 0, 1, 0, 1, 0, 1, 1]])
pprint(Y)
array([[3, 2, 0, 2, 0, 0],
       [2, 2, 0, 1, 0, 0],
       [0, 0, 3, 0, 2, 1],
       [2, 1, 0, 2, 0, 0],
       [0, 0, 2, 0, 4, 1],
       [0, 0, 1, 0, 1, 2]])
And we can see that the y-projection neatly splits into two disconnected graphs, as expected
order = [0, 1, 3, 2, 4, 5]
Y[order, :][:, order]
array([[3, 2, 2, 0, 0, 0],
       [2, 2, 1, 0, 0, 0],
       [2, 1, 2, 0, 0, 0],
       [0, 0, 0, 3, 2, 1],
       [0, 0, 0, 2, 4, 1],
       [0, 0, 0, 1, 1, 2]])
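One more property worth verifying: the diagonal of each one-mode projection recovers the corresponding degree sequence, since for a 0/1 matrix X_ii = Σ_j A_ij² = k_i. A quick check (an addition, not part of the original code):

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])

X = np.dot(A, A.T)
Y = np.dot(A.T, A)

# The diagonal of each projection counts each node's own neighbors,
# i.e. its degree in the original bipartite network.
assert (np.diag(X) == A.sum(axis=1)).all()
assert (np.diag(Y) == A.sum(axis=0)).all()
```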
Let us define the similarity between two users (X) or items (Y) to be the number of neighbors the two nodes share, normalized by the smaller of their two degrees. For convenience, we supply the one-mode X projection directly:
def similarity(X, kx):
    S = X.copy().astype('float')

    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            S[i, j] /= np.min([kx[i], kx[j]])

    return S
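As an aside, the double loop above can be replaced by broadcasting. A vectorized sketch that produces the same matrix (`np.minimum.outer` builds the table of pairwise degree minima):

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])
kx = A.sum(axis=1)
X = np.dot(A, A.T)

def similarity_vec(X, kx):
    # np.minimum.outer(kx, kx)[i, j] == min(kx[i], kx[j]), so the whole
    # element-wise division happens in a single vectorized step.
    return X / np.minimum.outer(kx, kx)

S = similarity_vec(X, kx)
```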
Our similarity is then:
S = similarity(X, kx)
print(S)
[[1.  1.  0.  0.5 0.  0.  0.  0. ]
 [1.  1.  0.  1.  0.  0.  0.  0. ]
 [0.  0.  1.  0.  1.  0.5 0.5 1. ]
 [0.5 1.  0.  1.  0.  0.  0.  0. ]
 [0.  0.  1.  0.  1.  0.5 0.5 1. ]
 [0.  0.  0.5 0.  0.5 1.  0.5 0. ]
 [0.  0.  0.5 0.  0.5 0.5 1.  1. ]
 [0.  0.  1.  0.  1.  0.  1.  1. ]]
Naturally, this similarity metric is symmetric
(S-S.T).mean()
0.0
Now we can predict scores for all user/item pairs. The score for each user-item pair will be the sum of the user's similarities to all other users that have that item, normalized by the user's total similarity to everyone else
def predict_score(A, S):
    v = np.dot(S, A)
    norms = S.sum(axis=0) - np.diag(S)
    v = v/norms.reshape(-1, 1)

    return v
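To make the normalization concrete, here is the score for one pair computed by hand (an illustrative check, not part of the original code): for user x1 (row 0) and item y4 (column 3), only users x2 and x4 (rows 1 and 3) have that item, so the score is x1's similarity to those two users divided by x1's total similarity to all other users:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])
kx = A.sum(axis=1)
S = np.dot(A, A.T) / np.minimum.outer(kx, kx)

# Only rows 1 and 3 have a 1 in column 3, so just two terms contribute.
num = S[0, 1] * A[1, 3] + S[0, 3] * A[3, 3]  # 1.0 + 0.5
den = S[:, 0].sum() - S[0, 0]                # similarity to everyone else
score = num / den
```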
The predicted scores are then:
v = predict_score(A, S)
print(v.round(2))
[[1.67 1.33 0.   1.   0.   0.  ]
 [1.5  1.   0.   1.   0.   0.  ]
 [0.   0.   0.83 0.   1.17 0.33]
 [1.67 1.   0.   1.33 0.   0.  ]
 [0.   0.   0.83 0.   1.17 0.33]
 [0.   0.   1.33 0.   1.   1.  ]
 [0.   0.   0.6  0.   1.2  0.6 ]
 [0.   0.   0.67 0.   1.33 0.33]]
As we can see, we not only have scores for the items that each user already rated, but also for other items as well:
np.transpose(np.nonzero(np.sign(v)-A)) # Coordinates where a score was predicted but no edge exists in A
array([[0, 3],
       [2, 5],
       [3, 1],
       [4, 5],
       [5, 4],
       [6, 2],
       [7, 2],
       [7, 5]])
From this matrix we would know to recommend y4 to x1, y6 to x3, etc.
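To turn these scores into concrete recommendations, one possible sketch (an addition, not the original code) is to mask out the items each user already has and take the argmax of each row; note that users with no remaining positive score would simply get an arbitrary zero-score item:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])
kx = A.sum(axis=1)
S = np.dot(A, A.T) / np.minimum.outer(kx, kx)
v = np.dot(S, A) / (S.sum(axis=0) - np.diag(S)).reshape(-1, 1)

# Suppress items each user already has, then rank what remains.
scores = np.where(A == 1, -np.inf, v)
best = scores.argmax(axis=1)  # top new item for each user
```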