Matrix Decomposition Example for Machine Learning for Computational Linguistics

Download: This and various other Jupyter notebooks are available from my GitHub repo.

Version: 1.3, January 2024

License: Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)

Prerequisites:

In [ ]:

!pip install -U numpy

This tutorial accompanies the discussion of matrix decomposition of feature sets in classification tasks in the textbook Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach.

This tutorial was developed as part of the material for my course Machine Learning for Computational Linguistics in the Computational Linguistics Program of the Department of Linguistics at Indiana University.

Matrix operations

In [1]:

from numpy import array

ratings = array([
        [1, 0, 1, 0],
        [0, 2, 2, 2],
        [0, 0, 0, 1],
        [1, 2, 3, 2],
        [1, 0, 1, 1],
        [0, 2, 2, 3]])

print(ratings)

[[1 0 1 0]
 [0 2 2 2]
 [0 0 0 1]
 [1 2 3 2]
 [1 0 1 1]
 [0 2 2 3]]
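
Before decomposing the matrix, it can help to check its dimensions and rank. The following is a minimal sketch, not part of the original notebook. Reading the rows as people and the columns as films is an assumption based on the variable names used in this tutorial, and `matrix_rank` is NumPy's SVD-based rank estimate.

```python
from numpy import array
from numpy.linalg import matrix_rank

ratings = array([
        [1, 0, 1, 0],
        [0, 2, 2, 2],
        [0, 0, 0, 1],
        [1, 2, 3, 2],
        [1, 0, 1, 1],
        [0, 2, 2, 3]])

# Assumption: rows are people, columns are films.
n_rows, n_cols = ratings.shape

# The rank is the number of linearly independent rows (or columns).
rank = matrix_rank(ratings)

print(n_rows, n_cols, rank)
```

A rank lower than the number of columns suggests that the matrix can be written as a product of smaller matrices, which is what a decomposition exploits.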

The ratings matrix can be decomposed into a product of three smaller matrices:

In [2]:

from numpy import dot

# Genres (rows) by films (columns): which film belongs to which genre.
filmsGenres = array([
        [1, 0, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 1]
    ])

# Rows of the ratings matrix by genres (columns): which genres each row prefers.
preferencesGenres = array([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1]
    ])

# Diagonal matrix holding the weight (importance) of each genre.
importanceGenres = array([
        [1, 0, 0],
        [0, 2, 0],
        [0, 0, 1]
    ])

# Multiplying the three factors reproduces the original ratings matrix.
print(dot(preferencesGenres, dot(importanceGenres, filmsGenres)))

[[1 0 1 0]
 [0 2 2 2]
 [0 0 0 1]
 [1 2 3 2]
 [1 0 1 1]
 [0 2 2 3]]
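
The printed product can also be compared with the original matrix programmatically, and the same product can be read as a sum of rank-one matrices, one per genre. The sketch below is an illustration under stated assumptions rather than part of the original notebook: it redefines the arrays so that it runs on its own, uses the `@` operator as an equivalent of `dot`, and introduces the names `product` and `total` purely for this example.

```python
from numpy import array, array_equal, outer, zeros_like

ratings = array([
        [1, 0, 1, 0],
        [0, 2, 2, 2],
        [0, 0, 0, 1],
        [1, 2, 3, 2],
        [1, 0, 1, 1],
        [0, 2, 2, 3]])

filmsGenres = array([
        [1, 0, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 1]])

preferencesGenres = array([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1]])

importanceGenres = array([
        [1, 0, 0],
        [0, 2, 0],
        [0, 0, 1]])

# Check 1: the product of the three factors equals the ratings matrix.
product = preferencesGenres @ importanceGenres @ filmsGenres
print(array_equal(product, ratings))

# Check 2: the same product is a weighted sum of outer products,
# one rank-one matrix per genre k.
total = zeros_like(ratings)
for k in range(importanceGenres.shape[0]):
    weight = importanceGenres[k, k]
    total += weight * outer(preferencesGenres[:, k], filmsGenres[k, :])
print(array_equal(total, ratings))
```

Both checks should print `True`. The second form makes the contribution of each genre explicit: dropping one term from the sum removes that genre's share of the ratings.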