from collections import Counter
from pprint import pprint
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import watermark
%load_ext watermark
%matplotlib inline
Watermark the notebook with current versions of all loaded libraries
%watermark -n -v -m -g -iv
Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.19.0

Compiler    : Clang 10.0.0
OS          : Darwin
Release     : 21.6.0
Machine     : x86_64
Processor   : i386
CPU cores   : 16
Architecture: 64bit

Git hash: e2a121886d5fb3c11344fccd2cf35c76180f7246

matplotlib: 3.3.2
numpy     : 1.19.2
watermark : 2.1.0
json      : 2.0.9
Load default figure style
plt.style.use('./d4sci.mplstyle')
Let's use this simple example, where everything is easy to visualize
We start by defining the adjacency matrix of our bipartite network. This is not the most efficient graph representation, but it is the most convenient in our case
A = np.zeros((8, 6), dtype='int')
Rows correspond to 'x' nodes and columns to 'y' nodes
A[0, 0]=1
A[0, 1]=1
A[1, 0]=1
A[1, 1]=1
A[1, 3]=1
A[2, 2]=1
A[2, 4]=1
A[3, 0]=1
A[3, 3]=1
A[4, 2]=1
A[4, 4]=1
A[5, 2]=1
A[5, 5]=1
A[6, 4]=1
A[6, 5]=1
A[7, 4]=1
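For a larger network, typing out each entry quickly becomes impractical. One alternative sketch is to build the same matrix from an edge list using NumPy fancy indexing (the `edges` list below simply collects the assignments above):

```python
import numpy as np

# The same bipartite network written as a list of (x, y) edges.
edges = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 3), (2, 2), (2, 4), (3, 0),
         (3, 3), (4, 2), (4, 4), (5, 2), (5, 5), (6, 4), (6, 5), (7, 4)]

A = np.zeros((8, 6), dtype='int')
rows, cols = zip(*edges)
A[rows, cols] = 1  # fancy indexing sets all 16 entries in one step
```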
The adjacency matrix is then:
pprint(A)
array([[1, 1, 0, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 0],
       [1, 0, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 0]])
A.shape
(8, 6)
And we can easily calculate the degree of each kind of node as well, by simply summing over the rows or columns:
kx = A.sum(axis=1)
ky = A.sum(axis=0)
kx
array([2, 3, 2, 2, 2, 2, 2, 1])
ky
array([3, 2, 3, 2, 4, 2])
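A quick sanity check (not in the original code): since every edge touches exactly one x node and one y node, both degree sequences must sum to the same total, the number of edges in the bipartite network:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])

kx = A.sum(axis=1)  # degrees of the x nodes
ky = A.sum(axis=0)  # degrees of the y nodes

# Each of the 16 edges is counted exactly once in each sum.
assert kx.sum() == ky.sum() == A.sum() == 16
```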
The X and Y one-mode projections are:
X = np.dot(A, A.T)
Y = np.dot(A.T, A)
pprint(X)
array([[2, 2, 0, 1, 0, 0, 0, 0],
       [2, 3, 0, 2, 0, 0, 0, 0],
       [0, 0, 2, 0, 2, 1, 1, 1],
       [1, 2, 0, 2, 0, 0, 0, 0],
       [0, 0, 2, 0, 2, 1, 1, 1],
       [0, 0, 1, 0, 1, 2, 1, 0],
       [0, 0, 1, 0, 1, 1, 2, 1],
       [0, 0, 1, 0, 1, 0, 1, 1]])
pprint(Y)
array([[3, 2, 0, 2, 0, 0],
       [2, 2, 0, 1, 0, 0],
       [0, 0, 3, 0, 2, 1],
       [2, 1, 0, 2, 0, 0],
       [0, 0, 2, 0, 4, 1],
       [0, 0, 1, 0, 1, 2]])
And we can see that the y-projection neatly splits into two disconnected graphs, as expected
order = [0, 1, 3, 2, 4, 5]
Y[order, :][:, order]
array([[3, 2, 2, 0, 0, 0],
       [2, 2, 1, 0, 0, 0],
       [2, 1, 2, 0, 0, 0],
       [0, 0, 0, 3, 2, 1],
       [0, 0, 0, 2, 4, 1],
       [0, 0, 0, 1, 1, 2]])
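One more property worth verifying: the diagonal of each one-mode projection recovers the corresponding degree sequence, since for a 0/1 matrix X_ii = Σ_j A_ij² = k_i. A quick check (an addition, not part of the original code):

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])

X = np.dot(A, A.T)
Y = np.dot(A.T, A)

# The diagonal of each projection counts each node's own neighbors,
# i.e. its degree in the original bipartite network.
assert (np.diag(X) == A.sum(axis=1)).all()
assert (np.diag(Y) == A.sum(axis=0)).all()
```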
Let us define the similarity between two users (X) or items (Y) to be the number of neighbors the two nodes share, normalized by the smaller of their two degrees. For convenience, we supply the one-mode X projection directly:
def similarity(X, kx):
    S = X.copy().astype('float')

    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            S[i, j] /= np.min([kx[i], kx[j]])

    return S
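As an aside, the double loop above can be replaced by broadcasting. A vectorized sketch that produces the same matrix (`np.minimum.outer` builds the table of pairwise degree minima):

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])
kx = A.sum(axis=1)
X = np.dot(A, A.T)

def similarity_vec(X, kx):
    # np.minimum.outer(kx, kx)[i, j] == min(kx[i], kx[j]), so the whole
    # element-wise division happens in a single vectorized step.
    return X / np.minimum.outer(kx, kx)

S = similarity_vec(X, kx)
```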
Our similarity is then:
S = similarity(X, kx)
print(S)
[[1.  1.  0.  0.5 0.  0.  0.  0. ]
 [1.  1.  0.  1.  0.  0.  0.  0. ]
 [0.  0.  1.  0.  1.  0.5 0.5 1. ]
 [0.5 1.  0.  1.  0.  0.  0.  0. ]
 [0.  0.  1.  0.  1.  0.5 0.5 1. ]
 [0.  0.  0.5 0.  0.5 1.  0.5 0. ]
 [0.  0.  0.5 0.  0.5 0.5 1.  1. ]
 [0.  0.  1.  0.  1.  0.  1.  1. ]]
Naturally, this similarity metric is symmetric
(S-S.T).mean()
0.0
Now we can predict scores for all user/item pairs. The score for each user-item pair will be the sum of the user's similarities to all other users that have that item, normalized by the user's total similarity to everyone else
def predict_score(A, S):
    v = np.dot(S, A)
    norms = S.sum(axis=0) - np.diag(S)
    v = v/norms.reshape(-1, 1)

    return v
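To make the normalization concrete, here is the score for one pair computed by hand (an illustrative check, not part of the original code): for user x1 (row 0) and item y4 (column 3), only users x2 and x4 (rows 1 and 3) have that item, so the score is x1's similarity to those two users divided by x1's total similarity to all other users:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])
kx = A.sum(axis=1)
S = np.dot(A, A.T) / np.minimum.outer(kx, kx)

# Only rows 1 and 3 have a 1 in column 3, so just two terms contribute.
num = S[0, 1] * A[1, 3] + S[0, 3] * A[3, 3]  # 1.0 + 0.5
den = S[:, 0].sum() - S[0, 0]                # similarity to everyone else
score = num / den
```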
The predicted scores are then:
v = predict_score(A, S)
print(v.round(2))
[[1.67 1.33 0.   1.   0.   0.  ]
 [1.5  1.   0.   1.   0.   0.  ]
 [0.   0.   0.83 0.   1.17 0.33]
 [1.67 1.   0.   1.33 0.   0.  ]
 [0.   0.   0.83 0.   1.17 0.33]
 [0.   0.   1.33 0.   1.   1.  ]
 [0.   0.   0.6  0.   1.2  0.6 ]
 [0.   0.   0.67 0.   1.33 0.33]]
As we can see, we not only have scores for the items that each user already rated, but also for other items as well:
np.transpose(np.nonzero(np.sign(v)-A)) # Coordinates where a score was predicted but no edge exists in A
array([[0, 3],
       [2, 5],
       [3, 1],
       [4, 5],
       [5, 4],
       [6, 2],
       [7, 2],
       [7, 5]])
From this matrix we would know to recommend y4 to x1, y6 to x3, etc.
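To turn these scores into concrete recommendations, one possible sketch (an addition, not the original code) is to mask out the items each user already has and take the argmax of each row; note that users with no remaining positive score would simply get an arbitrary zero-score item:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 0]])
kx = A.sum(axis=1)
S = np.dot(A, A.T) / np.minimum.outer(kx, kx)
v = np.dot(S, A) / (S.sum(axis=0) - np.diag(S)).reshape(-1, 1)

# Suppress items each user already has, then rank what remains.
scores = np.where(A == 1, -np.inf, v)
best = scores.argmax(axis=1)  # top new item for each user
```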