Implementing a Latent Factor Model for Movie Recommendation with GraphLab

(Latent Factor Model, LFM)
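For reference, a latent factor model predicts user $u$'s rating of item $i$ from a global mean, a user bias, an item bias, and the inner product of two $k$-dimensional latent vectors. This is the textbook form. Mapping it onto GraphLab's fitted `intercept`, per-user/per-movie `linear_terms` and `factors` (printed in Out[14]) is our reading of those field names. The side-feature terms GraphLab adds, such as those for `timestamp`, are omitted here:

$$
\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^{\top}\mathbf{q}_i,
\qquad \mathbf{p}_u,\ \mathbf{q}_i \in \mathbb{R}^{k}
$$

$$
\min_{b,\,\mathbf{p},\,\mathbf{q}} \sum_{(u,i,r_{ui})\in\mathcal{D}}
\Big[ \left(r_{ui} - \hat{r}_{ui}\right)^2
+ \lambda \left(\lVert\mathbf{p}_u\rVert^2 + \lVert\mathbf{q}_i\rVert^2\right)
+ \lambda_{\mathrm{lin}} \left(b_u^2 + b_i^2\right) \Big]
$$

Here $k$ plays the role of `num_factors` (32 below), $\lambda$ of `regularization`, and $\lambda_{\mathrm{lin}}$ of `linear_regularization`. The ranking variant trained later adds a further penalty weighted by `ranking_regularization`.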

In [9]:

import graphlab
graphlab.canvas.set_target("ipynb")
rating_sf = graphlab.SFrame('ratings')
users = graphlab.SFrame('users')
items = graphlab.SFrame('items')

In [3]:

rating_sf

Out[3]:

user_id	movie_id	rating	timestamp
1	1193	5	978300760
1	661	3	978302109
1	914	3	978301968
1	3408	4	978300275
1	2355	5	978824291
1	1197	3	978302268
1	1287	5	978302039
1	2804	5	978300719
1	594	4	978302268
1	919	4	978301368

[1000209 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [10]:

users

Out[10]:

user_id	gender	age	occupation	zip-code
1	F	1	10	48067
2	M	56	16	70072
3	M	25	15	55117
4	M	45	7	02460
5	M	25	20	55455
6	F	50	9	55117
7	M	35	1	06810
8	M	25	12	11413
9	M	25	17	61614
10	F	35	1	95370

[6040 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [11]:

items

Out[11]:

movie_id	title	genre
1	Toy Story (1995)	Animation|Children's|Comedy ...
2	Jumanji (1995)	Adventure|Children's|Fantasy ...
3	Grumpier Old Men (1995)	Comedy|Romance
4	Waiting to Exhale (1995)	Comedy|Drama
5	Father of the Bride Part II (1995) ...	Comedy
6	Heat (1995)	Action|Crime|Thriller
7	Sabrina (1995)	Comedy|Romance
8	Tom and Huck (1995)	Adventure|Children's
9	Sudden Death (1995)	Action
10	GoldenEye (1995)	Action|Adventure|Thriller

[3883 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [4]:

dir(graphlab.recommender)

Out[4]:

['__all__',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__path__',
 'create',
 'factorization_recommender',
 'item_content_recommender',
 'item_similarity_recommender',
 'popularity_recommender',
 'ranking_factorization_recommender',
 'util']

In [16]:

item_data = graphlab.SFrame({"my_item_id" : range(4),
                                 "data_1" : [ [1, 0], [1, 0], [0, 1], [0.5, 0.5] ],
                                 "data_2" : [ [0, 1], [1, 0], [0, 1], [0.5, 0.5] ] })
item_data

Out[16]:

data_1	data_2	my_item_id
[1.0, 0.0]	[0.0, 1.0]	0
[1.0, 0.0]	[1.0, 0.0]	1
[0.0, 1.0]	[0.0, 1.0]	2
[0.5, 0.5]	[0.5, 0.5]	3

[4 rows x 3 columns]

In [18]:

m = graphlab.recommender.item_content_recommender.create(item_data, "my_item_id")
m.recommend_from_interactions([0])

WARNING: The ItemContentRecommender model is still in beta.
WARNING: This feature transformer is still in beta, and some interpretation rules may change in the future.
('Applying transform:\n', Class             : AutoVectorizer

Model Fields
------------
Features          : ['data_1', 'data_2']
Excluded Features : ['my_item_id']

Column  Type   Interpretation  Transforms  Output Type
------  -----  --------------  ----------  -----------
data_1  array  vector          None        array      
data_2  array  vector          None        array      

)

Recsys training: model = item_content_recommender
Defaulting to brute force instead of ball tree because there are multiple distance components.
Starting brute force nearest neighbors model training.
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1            | 4       | 25          | 270us        |
| Done         |         | 100         | 345us        |
+--------------+---------+-------------+--------------+
Preparing data set.
    Data has 0 observations with 0 users and 4 items.
    Data prepared in: 0.013882s
Loading user-provided nearest items.
Generating candidate set for working with new users.
Finished training in 0.000942s

Out[18]:

my_item_id	score	rank
3	0.707106769085	1
1	0.5	2
2	0.5	3

[3 rows x 3 columns]
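Why item 3 ranks first: assume the score is the mean, over the two feature columns, of the cosine similarity to the query item 0. This interpretation matches the numbers above, but the output does not state it. Under that assumption, item 3 scores (0.7071 + 0.7071) / 2 ≈ 0.7071, while items 1 and 2 each score (1 + 0) / 2 = 0.5. GraphLab's 0.707106769085 is the same value at single precision. Below is a plain-Python sketch that reproduces the ranking. `cosine`, `content_scores` and `toy_items` are hypothetical names:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def content_scores(items, query_id, columns):
    """Score every other item by its mean per-column cosine similarity to query_id."""
    query = items[query_id]
    scores = {}
    for item_id, feats in items.items():
        if item_id == query_id:
            continue  # the query item itself is not recommended
        sims = [cosine(query[c], feats[c]) for c in columns]
        scores[item_id] = sum(sims) / float(len(sims))
    # Highest score first; ties are broken by ascending item id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Same toy data as the item_data SFrame above.
toy_items = {
    0: {"data_1": [1, 0], "data_2": [0, 1]},
    1: {"data_1": [1, 0], "data_2": [1, 0]},
    2: {"data_1": [0, 1], "data_2": [0, 1]},
    3: {"data_1": [0.5, 0.5], "data_2": [0.5, 0.5]},
}
print(content_scores(toy_items, 0, ["data_1", "data_2"]))
```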

In [19]:

train, test = graphlab.recommender.util.random_split_by_user(rating_sf, 
                                                               'user_id', 'movie_id')
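`random_split_by_user` does not split rows uniformly. As we understand its documented behaviour, it samples up to `max_num_users` users (1000 by default). For each sampled user, it moves a fraction `item_test_proportion` (0.2 by default) of that user's ratings into the test set, so every test user still has history in training. The sketch below imitates this on plain tuples. `split_by_user` is a hypothetical name, and per-row coin flips stand in for GraphLab's exact sampling:

```python
import random
from collections import defaultdict


def split_by_user(rows, max_num_users=1000, item_test_proportion=0.2, seed=0):
    """Hypothetical per-user holdout split of (user_id, item_id, rating) tuples."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    users = sorted(by_user)
    # Only a sample of users contributes test rows; all other users stay wholly in train.
    test_users = set(rng.sample(users, min(max_num_users, len(users))))
    train, test = [], []
    for user in users:
        for row in by_user[user]:
            if user in test_users and rng.random() < item_test_proportion:
                test.append(row)
            else:
                train.append(row)
    return train, test


# Hypothetical data: 5 users, each rating 10 movies.
rows = [(u, movie, 4) for u in range(1, 6) for movie in range(1, 11)]
toy_train, toy_test = split_by_user(rows, max_num_users=2, item_test_proportion=0.5)
print((len(toy_train), len(toy_test), sorted(set(u for u, _, _ in toy_test))))
```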

In [20]:

train[train['rating'] > 4]

Out[20]:

user_id	movie_id	rating	timestamp
1	1193	5	978300760
1	2355	5	978824291
1	1287	5	978302039
1	2804	5	978300719
1	595	5	978824268
1	1035	5	978301753
1	3105	5	978301713
1	1270	5	978300055
1	527	5	978824195
1	48	5	978824351

[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.

In [21]:

from graphlab import item_similarity_recommender
itemcf = item_similarity_recommender.create(
    train[train['rating'] > 4], 'user_id', 'movie_id')

Recsys training: model = item_similarity
Warning: Ignoring columns rating, timestamp;
    To use one of these as a target column, set target =
    and use a method that allows the use of a target.
Preparing data set.
    Data has 218844 observations with 6011 users and 3228 items.
    Data prepared in: 0.197671s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 801us                          | 49.75      |
| 4.16ms                         | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 26.554ms                            | 0                | 0               |
| 116.515ms                           | 100              | 3228            |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 0.153497s
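The warning above shows that `item_similarity_recommender` ignored `rating`. After filtering to `rating > 4`, it therefore works on implicit "user liked movie" pairs. The sketch below shows item-based CF on such pairs, simplified. It makes two assumptions. First, the similarity between two items is the Jaccard index of their user sets, which is GraphLab's default `similarity_type` to our knowledge. Second, an unseen movie is scored by its average similarity to the movies the user already liked. GraphLab's actual aggregation may differ, and `item_cf_scores` is a hypothetical name:

```python
from collections import defaultdict


def item_cf_scores(pairs, user):
    """Rank items `user` has not seen by mean Jaccard similarity to the items they have seen."""
    users_of = defaultdict(set)  # item -> users who liked it
    items_of = defaultdict(set)  # user -> items they liked
    for u, i in pairs:
        users_of[i].add(u)
        items_of[u].add(i)

    def jaccard(a, b):
        union = users_of[a] | users_of[b]
        return len(users_of[a] & users_of[b]) / float(len(union)) if union else 0.0

    seen = items_of.get(user, set())
    if not seen:
        return []  # no history: item-based CF has nothing to compare against
    scores = dict((c, sum(jaccard(c, j) for j in seen) / float(len(seen)))
                  for c in users_of if c not in seen)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical (user, movie) "liked" pairs.
pairs = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("b", "z"), ("c", "z"), ("c", "w")]
print(item_cf_scores(pairs, "a"))
```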

In [22]:

pop = graphlab.popularity_recommender.create(
    train[train['rating'] > 4], 'user_id', 'movie_id')

Recsys training: model = popularity
Warning: Ignoring columns rating, timestamp;
    To use one of these as a target column, set target =
    and use a method that allows the use of a target.
Preparing data set.
    Data has 218844 observations with 6011 users and 3228 items.
    Data prepared in: 0.21765s
218844 observations to process; with 3228 unique items.
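With no target column (the warning again drops `rating`), the popularity model ranks movies by how often they appear in the training pairs, as we understand it. Every user then receives the same top list, minus the movies they have already seen. The sketch below implements that idea on hypothetical pairs. `popularity_recommend` is a hypothetical name, not a GraphLab function:

```python
from collections import Counter


def popularity_recommend(pairs, user, k=10):
    """Return up to k of the most frequent items that `user` has not interacted with."""
    counts = Counter(item for _, item in pairs)
    seen = set(item for u, item in pairs if u == user)
    # Most frequent first; ties broken by item id so the output is deterministic.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in seen][:k]


pairs = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("b", "z"), ("c", "z"), ("c", "w")]
print(popularity_recommend(pairs, "c"))
```

Because the list ignores who the user is, popularity is a natural baseline for the personalized models that follow.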

In [23]:

m = graphlab.recommender.create(
    train, 'user_id', 'movie_id', 'rating')

Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 967831 observations with 6040 users and 3702 items.
    Data prepared in: 0.95541s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 16.6667           | Not Viable                               |
| 1       | 4.16667           | Not Viable                               |
| 2       | 1.04167           | Not Viable                               |
| 3       | 0.260417          | 1.67058                                  |
| 4       | 0.130208          | 1.79647                                  |
| 5       | 0.0651042         | 1.99236                                  |
| 6       | 0.0325521         | 1.92629                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.260417          | 1.67058                                  |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 116us        | 2.44604           | 1.11694               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 1.49s        | DIVERGED          | DIVERGED              | 0.260417    |
| RESET   | 1.96s        | 2.44602           | 1.11693               |             |
| 1       | 3.31s        | DIVERGED          | DIVERGED              | 0.130208    |
| RESET   | 3.86s        | 2.44605           | 1.11694               |             |
| 1       | 4.86s        | 2.04087           | 1.12074               | 0.0651042   |
| 2       | 5.88s        | 1.76673           | 1.02765               | 0.0651042   |
| 3       | 7.11s        | 1.6966            | 1.00407               | 0.0651042   |
| 4       | 8.14s        | 1.66303           | 0.99589               | 0.0651042   |
| 5       | 9.09s        | 1.63916           | 0.989335              | 0.0651042   |
| 6       | 10.25s       | 1.61909           | 0.983417              | 0.0651042   |
| 7       | 11.32s       | 1.60004           | 0.978229              | 0.0651042   |
| 8       | 12.54s       | 1.5846            | 0.973418              | 0.0651042   |
| 9       | 13.78s       | 1.57167           | 0.970329              | 0.0651042   |
| 10      | 14.95s       | 1.55995           | 0.966602              | 0.0651042   |
| 11      | 16.13s       | 1.54959           | 0.963766              | 0.0651042   |
| 12      | 17.46s       | 1.5412            | 0.961447              | 0.0651042   |
| 13      | 18.65s       | 1.53128           | 0.959077              | 0.0651042   |
| 14      | 19.67s       | 1.52404           | 0.957169              | 0.0651042   |
| 15      | 20.71s       | 1.51732           | 0.955351              | 0.0651042   |
| 16      | 21.72s       | 1.51131           | 0.953791              | 0.0651042   |
| 17      | 22.78s       | 1.50507           | 0.952055              | 0.0651042   |
| 18      | 23.81s       | 1.49942           | 0.950269              | 0.0651042   |
| 19      | 25.14s       | 1.4943            | 0.949683              | 0.0651042   |
| 20      | 26.37s       | 1.49061           | 0.948171              | 0.0651042   |
| 21      | 27.54s       | 1.48516           | 0.947189              | 0.0651042   |
| 22      | 28.66s       | 1.48046           | 0.945821              | 0.0651042   |
| 23      | 29.69s       | 1.47685           | 0.944816              | 0.0651042   |
| 24      | 30.72s       | 1.47361           | 0.944047              | 0.0651042   |
| 25      | 31.74s       | 1.46908           | 0.943049              | 0.0651042   |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 1.50048
       Final training RMSE: 0.937071
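The log shows the objective and training RMSE falling over 25 passes, once a step size that does not diverge has been found. The sketch below shows the same kind of fit on a toy scale. It uses plain SGD on $\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^{\top}\mathbf{q}_i$ with L2 penalties. The log shows that GraphLab actually used adagrad with automatic step-size tuning, so this is a swapped-in, simpler optimizer. The hyperparameters, the names `train_lfm` and `rmse`, and the toy ratings are all hypothetical:

```python
import random


def train_lfm(ratings, num_factors=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit mu + b_u + b_i + p_u . q_i to (user, item, rating) triples with plain SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / float(len(ratings))  # global mean, kept fixed
    b_u, b_i, P, Q = {}, {}, {}, {}
    for u, i, _ in ratings:
        b_u.setdefault(u, 0.0)
        b_i.setdefault(i, 0.0)
        # Small random initialisation of the latent vectors.
        if u not in P:
            P[u] = [rng.gauss(0.0, 0.1) for _ in range(num_factors)]
        if i not in Q:
            Q[i] = [rng.gauss(0.0, 0.1) for _ in range(num_factors)]

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observations in a new random order each pass
        for u, i, r in data:
            err = r - predict(u, i)
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(num_factors):
                p, q = P[u][f], Q[i][f]  # pre-update values feed both gradients
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return predict


def rmse(predict, ratings):
    """Root-mean-square error of `predict` on (user, item, rating) triples."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / float(len(ratings))) ** 0.5


# Hypothetical toy ratings: (user_id, movie_id, rating).
toy = [(1, 10, 5), (1, 20, 3), (2, 10, 4), (2, 30, 1), (3, 20, 2), (3, 30, 5)]
predict = train_lfm(toy)
print(rmse(predict, toy))
```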

In [13]:

Out[13]:

Class                           : RankingFactorizationRecommender

Schema
------
User ID                         : user_id
Item ID                         : movie_id
Target                          : rating
Additional observation features : 1
Number of user side features    : 0
Number of item side features    : 0

Statistics
----------
Number of observations          : 965508
Number of users                 : 6040
Number of items                 : 3706

Training summary
----------------
Training time                   : 36.9965

Model Parameters
----------------
Model class                     : RankingFactorizationRecommender
num_factors                     : 32
binary_target                   : 0
side_data_factorization         : 1
solver                          : auto
nmf                             : 0
max_iterations                  : 25

Regularization Settings
-----------------------
regularization                  : 0.0
regularization_type             : normal
linear_regularization           : 0.0
ranking_regularization          : 0.25
unobserved_rating_value         : -1.79769313486e+308
num_sampled_negative_examples   : 4
ials_confidence_scaling_type    : auto
ials_confidence_scaling_factor  : 1

Optimization Settings
---------------------
init_random_sigma               : 0.01
sgd_convergence_interval        : 4
sgd_convergence_threshold       : 0.0
sgd_max_trial_iterations        : 5
sgd_sampling_block_size         : 131072
sgd_step_adjustment_interval    : 4
sgd_step_size                   : 0.0
sgd_trial_sample_minimum_size   : 10000
sgd_trial_sample_proportion     : 0.125
step_size_decrease_rate         : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting      : 0.9
num_tempering_iterations        : 4
tempering_regularization_start_value: 0.0
track_exact_loss                : 0

In [14]:

m['coefficients']

Out[14]:

{'intercept': 3.5821495005738013, 'movie_id': Columns:
 	movie_id	int
 	linear_terms	float
 	factors	array
 
 Rows: 3706
 
 Data:
 +----------+------------------+-------------------------------+
 | movie_id |   linear_terms   |            factors            |
 +----------+------------------+-------------------------------+
 |   1193   |  1.06781125069   | [-0.119829073548, -0.02245... |
 |   661    | -0.0261590108275 | [-0.727257788181, 0.016146... |
 |   914    |  0.324085891247  | [-0.859803378582, 0.056376... |
 |   3408   |  0.565778970718  | [0.334619760513, -0.014206... |
 |   2355   |  0.648248255253  | [-0.248598009348, 0.103843... |
 |   1197   |  1.12024652958   | [-0.100379563868, 0.085359... |
 |   1287   |  0.345532894135  | [-0.247123196721, 0.024613... |
 |   2804   |  0.894821941853  | [-0.272583067417, 0.046351... |
 |   594    |  0.311594575644  | [-0.974369823933, 0.054282... |
 |   919    |  0.97704321146   | [-0.598346889019, 0.085630... |
 +----------+------------------+-------------------------------+
 [3706 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'side_data': Columns:
 	feature	str
 	index	str
 	linear_terms	float
 	factors	array
 
 Rows: 1
 
 Data:
 +-----------+-------+-----------------+-------------------------------+
 |  feature  | index |   linear_terms  |            factors            |
 +-----------+-------+-----------------+-------------------------------+
 | timestamp |   0   | -0.116745471954 | [-0.564183712006, 1.267165... |
 +-----------+-------+-----------------+-------------------------------+
 [1 rows x 4 columns], 'user_id': Columns:
 	user_id	int
 	linear_terms	float
 	factors	array
 
 Rows: 6040
 
 Data:
 +---------+------------------+-------------------------------+
 | user_id |   linear_terms   |            factors            |
 +---------+------------------+-------------------------------+
 |    1    | -0.027785371989  | [-0.0942558199167, 0.00739... |
 |    2    | -0.0234720371664 | [0.015922004357, -0.033992... |
 |    3    | -0.0345229320228 | [0.176564618945, -0.050576... |
 |    4    | -0.0198582224548 | [-0.0773911848664, -0.0500... |
 |    5    | -0.0562275871634 | [-0.0598151274025, -0.0059... |
 |    6    | -0.0401206016541 | [0.0565584115684, 0.030123... |
 |    7    | -0.0433877147734 | [0.205288589001, -0.060017... |
 |    8    | -0.0184100158513 | [0.169030055404, -0.043373... |
 |    9    | -0.0512112490833 | [0.163330376148, -0.060946... |
 |    10   | -0.0407416447997 | [-0.420519113541, 0.110337... |
 +---------+------------------+-------------------------------+
 [6040 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
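`m['coefficients']` exposes the pieces of the model: a global `intercept`, plus a `linear_terms` bias and a `factors` vector for each user and each movie. It also has a `side_data` entry for the `timestamp` observation feature. Assuming these combine in the standard latent-factor way, a score is computed as below. This combination is our interpretation, and the sketch leaves out the `timestamp` terms GraphLab also adds. For real scores, use the model's own `predict` method rather than this reconstruction. The function name and the 2-factor numbers are hypothetical:

```python
def predict_from_coefficients(intercept, user_linear, item_linear,
                              user_factors, item_factors):
    """Score one user-item pair as intercept + b_u + b_i + <p_u, q_i>.

    Side/observation-feature terms (the `timestamp` row under 'side_data')
    are deliberately left out of this sketch.
    """
    if len(user_factors) != len(item_factors):
        raise ValueError("factor vectors must have the same length")
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    return intercept + user_linear + item_linear + interaction


# Hypothetical 2-factor values; the fitted model above uses num_factors = 32.
score = predict_from_coefficients(3.5, -0.1, 0.5, [0.2, -0.1], [1.0, 0.5])
print(score)
```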

In [25]:

graphlab.recommender.util.compare_models(
    test[test['rating'] > 4],
    [pop, itemcf, m],
    user_sample=0.5,
    metric='precision_recall')

compare_models: using 466 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.111587982833 | 0.0210169663883 |
|   2    |  0.105150214592 | 0.0425562055746 |
|   3    |  0.102288984263 | 0.0544524525898 |
|   4    | 0.0992489270386 | 0.0720982969229 |
|   5    | 0.0909871244635 | 0.0813264276131 |
|   6    | 0.0894134477825 | 0.0945639457791 |
|   7    | 0.0843041079093 |  0.100918470675 |
|   8    | 0.0815450643777 |  0.109968226951 |
|   9    | 0.0810681926562 |  0.126194673165 |
|   10   | 0.0791845493562 |  0.136877932933 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.214592274678 | 0.0357070732147 |
|   2    | 0.188841201717 | 0.0624344732159 |
|   3    | 0.171673819742 | 0.0839114919565 |
|   4    | 0.157725321888 |  0.102658606754 |
|   5    | 0.148497854077 |  0.119670145788 |
|   6    | 0.141273247496 |  0.134922725517 |
|   7    |  0.1330472103  |  0.14491465424  |
|   8    | 0.12660944206  |  0.161011620372 |
|   9    | 0.120886981402 |  0.175966528519 |
|   10   | 0.113733905579 |  0.182583941999 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.255364806867 | 0.0440953445436 |
|   2    | 0.224248927039 | 0.0737550631862 |
|   3    | 0.206008583691 | 0.0966051323736 |
|   4    | 0.188841201717 |  0.114173476478 |
|   5    | 0.178540772532 |  0.135443733289 |
|   6    | 0.168097281831 |  0.15412416307  |
|   7    | 0.161863887186 |  0.172695193849 |
|   8    | 0.152628755365 |  0.183265481294 |
|   9    | 0.145207439199 |  0.195052041281 |
|   10   | 0.139270386266 |  0.207348413937 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

Out[25]:

[{'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 8388
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    11   |   1    |      0.0       |      0.0       |   6   |
  |    11   |   2    |      0.0       |      0.0       |   6   |
  |    11   |   3    |      0.0       |      0.0       |   6   |
  |    11   |   4    |      0.0       |      0.0       |   6   |
  |    11   |   5    |      0.0       |      0.0       |   6   |
  |    11   |   6    |      0.0       |      0.0       |   6   |
  |    11   |   7    |      0.0       |      0.0       |   6   |
  |    11   |   8    |     0.125      | 0.166666666667 |   6   |
  |    11   |   9    | 0.111111111111 | 0.166666666667 |   6   |
  |    11   |   10   |      0.1       | 0.166666666667 |   6   |
  +---------+--------+----------------+----------------+-------+
  [8388 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+-----------------+-----------------+
  | cutoff |    precision    |      recall     |
  +--------+-----------------+-----------------+
  |   1    |  0.111587982833 | 0.0210169663883 |
  |   2    |  0.105150214592 | 0.0425562055746 |
  |   3    |  0.102288984263 | 0.0544524525898 |
  |   4    | 0.0992489270386 | 0.0720982969229 |
  |   5    | 0.0909871244635 | 0.0813264276131 |
  |   6    | 0.0894134477825 | 0.0945639457791 |
  |   7    | 0.0843041079093 |  0.100918470675 |
  |   8    | 0.0815450643777 |  0.109968226951 |
  |   9    | 0.0810681926562 |  0.126194673165 |
  |   10   | 0.0791845493562 |  0.136877932933 |
  +--------+-----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 8388
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    11   |   1    |      0.0       |      0.0       |   6   |
  |    11   |   2    |      0.5       | 0.166666666667 |   6   |
  |    11   |   3    | 0.333333333333 | 0.166666666667 |   6   |
  |    11   |   4    |      0.5       | 0.333333333333 |   6   |
  |    11   |   5    |      0.4       | 0.333333333333 |   6   |
  |    11   |   6    | 0.333333333333 | 0.333333333333 |   6   |
  |    11   |   7    | 0.285714285714 | 0.333333333333 |   6   |
  |    11   |   8    |      0.25      | 0.333333333333 |   6   |
  |    11   |   9    | 0.222222222222 | 0.333333333333 |   6   |
  |    11   |   10   |      0.2       | 0.333333333333 |   6   |
  +---------+--------+----------------+----------------+-------+
  [8388 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.214592274678 | 0.0357070732147 |
  |   2    | 0.188841201717 | 0.0624344732159 |
  |   3    | 0.171673819742 | 0.0839114919565 |
  |   4    | 0.157725321888 |  0.102658606754 |
  |   5    | 0.148497854077 |  0.119670145788 |
  |   6    | 0.141273247496 |  0.134922725517 |
  |   7    |  0.1330472103  |  0.14491465424  |
  |   8    | 0.12660944206  |  0.161011620372 |
  |   9    | 0.120886981402 |  0.175966528519 |
  |   10   | 0.113733905579 |  0.182583941999 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 8388
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    11   |   1    |      0.0       |      0.0       |   6   |
  |    11   |   2    |      0.5       | 0.166666666667 |   6   |
  |    11   |   3    | 0.333333333333 | 0.166666666667 |   6   |
  |    11   |   4    |      0.25      | 0.166666666667 |   6   |
  |    11   |   5    |      0.2       | 0.166666666667 |   6   |
  |    11   |   6    | 0.333333333333 | 0.333333333333 |   6   |
  |    11   |   7    | 0.285714285714 | 0.333333333333 |   6   |
  |    11   |   8    |      0.25      | 0.333333333333 |   6   |
  |    11   |   9    | 0.222222222222 | 0.333333333333 |   6   |
  |    11   |   10   |      0.2       | 0.333333333333 |   6   |
  +---------+--------+----------------+----------------+-------+
  [8388 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.255364806867 | 0.0440953445436 |
  |   2    | 0.224248927039 | 0.0737550631862 |
  |   3    | 0.206008583691 | 0.0966051323736 |
  |   4    | 0.188841201717 |  0.114173476478 |
  |   5    | 0.178540772532 |  0.135443733289 |
  |   6    | 0.168097281831 |  0.15412416307  |
  |   7    | 0.161863887186 |  0.172695193849 |
  |   8    | 0.152628755365 |  0.183265481294 |
  |   9    | 0.145207439199 |  0.195052041281 |
  |   10   | 0.139270386266 |  0.207348413937 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
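Each `precision_recall_by_user` row gives one user's precision and recall at one cutoff. `mean_precision` and `mean_recall` average these values over the sampled users. Take user 11's rows: the user has 6 held-out liked movies (`count = 6`), and precision at cutoff k matches hits/k while recall matches hits/6. The sketch below computes both for a single ranked list. The list and movie IDs are hypothetical, chosen to reproduce user 11's M1 rows (hits at ranks 2 and 4):

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k = hits/k and recall@k = hits/|relevant| for one user."""
    if not relevant:
        raise ValueError("recall is undefined without held-out relevant items")
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / float(k), hits / float(len(relevant))


# Hypothetical ranked list with hits at ranks 2 and 4, and 6 held-out liked movies.
recommended = [101, 7, 102, 8, 103, 104, 105, 106, 107, 108]
relevant = [7, 8, 9, 10, 11, 12]
for k in (1, 2, 4, 10):
    print((k, precision_recall_at_k(recommended, relevant, k)))
```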

Optimizing for ranking
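`ranking_factorization_recommender` treats sampled unobserved user-movie pairs as weak negatives. Our understanding of the GraphLab documentation is as follows. When `ranking_regularization` is above 0, unobserved pairs whose predicted score exceeds `unobserved_rating_value` are penalized, and the model summary lists `num_sampled_negative_examples : 4`. Passing `unobserved_rating_value=3` in the next cell therefore asks the model to keep movies a user never rated at or below 3. The function below is a hypothetical sketch of such an objective with a one-sided squared penalty. It does not reproduce GraphLab's exact loss, sampling, or normalization:

```python
def ranking_objective(observed, sampled_unobserved, predict,
                      ranking_regularization=0.25, unobserved_rating_value=3.0):
    """Mean squared error on observed ratings plus a one-sided penalty on sampled negatives.

    Assumed form: an unobserved (user, item) pair is penalized only when its
    predicted score rises above unobserved_rating_value.
    """
    fit = sum((r - predict(u, i)) ** 2 for u, i, r in observed) / float(len(observed))
    excess = [max(0.0, predict(u, i) - unobserved_rating_value)
              for u, i in sampled_unobserved]
    penalty = sum(e * e for e in excess) / float(len(excess)) if excess else 0.0
    return fit + ranking_regularization * penalty


# Hypothetical predicted scores for one user.
toy_scores = {(1, "a"): 5.0, (1, "b"): 2.0, (1, "c"): 4.0}
lookup = lambda u, i: toy_scores[(u, i)]
print(ranking_objective([(1, "a", 5)], [(1, "b")], lookup))  # negative below 3: no penalty
print(ranking_objective([(1, "a", 5)], [(1, "c")], lookup))  # negative scored 4: penalized
```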

In [26]:

m_rank = graphlab.recommender.ranking_factorization_recommender.create(
    train, 'user_id', 'movie_id', 'rating', 
    unobserved_rating_value=3)

Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 967831 observations with 6040 users and 3702 items.
    Data prepared in: 0.983562s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| unobserved_rating_value        | Ranking Target Rating for Unobserved Interacti...| 3        |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 16.6667           | Not Viable                               |
| 1       | 4.16667           | Not Viable                               |
| 2       | 1.04167           | Not Viable                               |
| 3       | 0.260417          | 0.521984                                 |
| 4       | 0.130208          | 0.573774                                 |
| 5       | 0.0651042         | 0.990046                                 |
| 6       | 0.0325521         | 0.945016                                 |
+---------+-------------------+------------------------------------------+
| Final   | 0.260417          | 0.521984                                 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 110us        | 1.33206           | 1.11694               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 1.02s        | 1.40375           | 1.09634               | 0.260417    |
| 2       | 1.93s        | 0.936617          | 0.900854              | 0.260417    |
| 3       | 2.91s        | 0.824838          | 0.846264              | 0.260417    |
| 4       | 3.87s        | 0.810082          | 0.837036              | 0.260417    |
| 5       | 4.77s        | 0.77387           | 0.817288              | 0.260417    |
| 6       | 5.58s        | 0.75637           | 0.807332              | 0.260417    |
| 10      | 9.20s        | 0.697997          | 0.774232              | 0.260417    |
| 11      | 10.03s       | 0.680322          | 0.764175              | 0.260417    |
| 15      | 13.62s       | 0.659587          | 0.751889              | 0.260417    |
| 20      | 18.44s       | 0.640783          | 0.740592              | 0.260417    |
| 25      | 23.24s       | 0.624992          | 0.731253              | 0.260417    |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 0.628853
       Final training RMSE: 0.723727

In [27]:

results = graphlab.recommender.util.compare_models(
    test[test['rating'] > 4], 
    [pop, itemcf, m, m_rank], 
    user_sample=0.5, 
    metric='precision_recall')

compare_models: using 466 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.087982832618 | 0.0140248361318 |
|   2    | 0.0965665236052 | 0.0292522655743 |
|   3    | 0.0937052932761 | 0.0365127169855 |
|   4    | 0.0895922746781 |  0.046837188255 |
|   5    | 0.0862660944206 | 0.0596693591506 |
|   6    | 0.0808297567954 | 0.0685112959099 |
|   7    | 0.0787860208461 | 0.0764441666663 |
|   8    | 0.0769849785408 | 0.0846504186142 |
|   9    | 0.0762994754411 | 0.0972750087562 |
|   10   | 0.0744635193133 |  0.107658443171 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.206008583691 | 0.0348964593846 |
|   2    | 0.192060085837 | 0.0615229961222 |
|   3    | 0.176680972818 | 0.0815227915717 |
|   4    | 0.157188841202 | 0.0943852374705 |
|   5    | 0.148497854077 |  0.117658845156 |
|   6    | 0.138054363376 |  0.127392485241 |
|   7    | 0.133966891478 |  0.139672970091 |
|   8    | 0.12821888412  |  0.157576812504 |
|   9    | 0.12327134001  |  0.173337160645 |
|   10   | 0.115879828326 |  0.179741133016 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.225321888412 | 0.0341165787993 |
|   2    | 0.203862660944 | 0.0592605021931 |
|   3    | 0.191702432046 | 0.0799255771478 |
|   4    | 0.17864806867  | 0.0976891531755 |
|   5    | 0.170386266094 |  0.117871623275 |
|   6    | 0.157010014306 |  0.134454543693 |
|   7    | 0.152667075414 |  0.149987723075 |
|   8    | 0.145922746781 |  0.160054923279 |
|   9    | 0.137577491655 |  0.16716187937  |
|   10   | 0.134120171674 |  0.183519021385 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M3

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.145922746781 | 0.0155536026315 |
|   2    | 0.151287553648 |  0.032545498401 |
|   3    | 0.138769670959 | 0.0480767788701 |
|   4    | 0.128755364807 | 0.0602766563169 |
|   5    | 0.124034334764 | 0.0725574969378 |
|   6    | 0.122675250358 | 0.0837751352493 |
|   7    | 0.115879828326 | 0.0916484679064 |
|   8    | 0.113733905579 |  0.104413957145 |
|   9    | 0.111111111111 |  0.116317958073 |
|   10   | 0.107296137339 |  0.125368902557 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
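The tables above report precision@k and recall@k at cutoffs 1 to 10, averaged over the sampled users. The test set was filtered with `test['rating'] > 4`, so only a user's 5-star test ratings count as relevant. Below is a minimal pure-Python sketch of that per-user computation. The `recs` and `truth` dictionaries are hypothetical inputs: a ranked item list and a set of relevant items for each user. The sketch illustrates the metric and is not GraphLab's implementation.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user.

    recommended: ranked list of item ids (best first).
    relevant: set of item ids the user rated highly in the test set.
    """
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


def mean_precision_recall(recs, truth, cutoffs):
    """Average per-user precision/recall at each cutoff over users in both dicts."""
    users = [u for u in truth if u in recs]
    summary = {}
    for k in cutoffs:
        pairs = [precision_recall_at_k(recs[u], truth[u], k) for u in users]
        summary[k] = (sum(p for p, _ in pairs) / len(pairs),
                      sum(r for _, r in pairs) / len(pairs))
    return summary


# Hypothetical user: 10 ranked recommendations and 6 relevant test items,
# two of which appear at ranks 1 and 5.
recs = {56: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]}
truth = {56: {10, 50, 1, 2, 3, 4}}
for k, (p, r) in sorted(mean_precision_recall(recs, truth, [1, 5, 10]).items()):
    print("cutoff %d: precision %.3f, recall %.3f" % (k, p, r))
```

This example has hits at ranks 1 and 5 out of six relevant items. It reproduces the pattern of the user-56 rows printed for M0 in Out[39]: precision 0.4 and recall 1/3 at cutoff 5.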

In [28]:

results[3]['precision_recall_overall']

Out[28]:

cutoff	precision	recall
1	0.145922746781	0.0155536026315
2	0.151287553648	0.032545498401
3	0.138769670959	0.0480767788701
4	0.128755364807	0.0602766563169
5	0.124034334764	0.0725574969378
6	0.122675250358	0.0837751352493
7	0.115879828326	0.0916484679064
8	0.113733905579	0.104413957145
9	0.111111111111	0.116317958073
10	0.107296137339	0.125368902557

[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Experimenting with side features

In [29]:

user_sf = graphlab.SFrame('users')
item_sf = graphlab.SFrame('items')

In [30]:

user_sf

Out[30]:

user_id	gender	age	occupation	zip-code
1	F	1	10	48067
2	M	56	16	70072
3	M	25	15	55117
4	M	45	7	02460
5	M	25	20	55455
6	F	50	9	55117
7	M	35	1	06810
8	M	25	12	11413
9	M	25	17	61614
10	F	35	1	95370

[6040 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [31]:

item_sf

Out[31]:

movie_id	title	genre
1	Toy Story (1995)	Animation|Children's|Comedy ...
2	Jumanji (1995)	Adventure|Children's|Fantasy ...
3	Grumpier Old Men (1995)	Comedy|Romance
4	Waiting to Exhale (1995)	Comedy|Drama
5	Father of the Bride Part II (1995) ...	Comedy
6	Heat (1995)	Action|Crime|Thriller
7	Sabrina (1995)	Comedy|Romance
8	Tom and Huck (1995)	Adventure|Children's
9	Sudden Death (1995)	Action
10	GoldenEye (1995)	Action|Adventure|Thriller

[3883 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [32]:

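# graphlab.recommender.create picks the model type automatically
# (the logs below show ranking_factorization_recommender).
# Train three variants: user side features only, item side features only, and both.
m_user = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf)
m_item = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     item_data=item_sf)
m_both = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf, item_data=item_sf)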

Recsys training: model = ranking_factorization_recommender

Preparing data set.

    Data has 967831 observations with 6040 users and 3702 items.

    Data prepared in: 0.937442s

Training ranking_factorization_recommender for recommendations.

+--------------------------------+--------------------------------------------------+----------+

| Parameter                      | Description                                      | Value    |

+--------------------------------+--------------------------------------------------+----------+

| num_factors                    | Factor Dimension                                 | 32       |

| regularization                 | L2 Regularization on Factors                     | 1e-09    |

| solver                         | Solver used for training                         | adagrad  |

| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |

| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |

| side_data_factorization        | Assign Factors for Side Data                     | True     |

| max_iterations                 | Maximum Number of Iterations                     | 25       |

+--------------------------------+--------------------------------------------------+----------+

  Optimizing model using SGD; tuning step size.

  Using 120978 / 967831 points for tuning the step size.

+---------+-------------------+------------------------------------------+

| Attempt | Initial Step Size | Estimated Objective Value                |

+---------+-------------------+------------------------------------------+

| 0       | 7.14286           | Not Viable                               |

| 1       | 1.78571           | Not Viable                               |

| 2       | 0.446429          | 1.3375                                   |

| 3       | 0.223214          | 1.094                                    |

| 4       | 0.111607          | 1.12747                                  |

| 5       | 0.0558036         | 1.25133                                  |

| 6       | 0.0279018         | 1.80473                                  |

+---------+-------------------+------------------------------------------+

| Final   | 0.223214          | 1.094                                    |

+---------+-------------------+------------------------------------------+

Starting Optimization.

+---------+--------------+-------------------+-----------------------+-------------+

| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |

+---------+--------------+-------------------+-----------------------+-------------+

| Initial | 69us         | 2.44586           | 1.11695               |             |

+---------+--------------+-------------------+-----------------------+-------------+

| 1       | 1.85s        | 3.2566            | 1.52186               | 0.223214    |

| 2       | 4.01s        | 1.82009           | 1.07645               | 0.223214    |

| 3       | 5.99s        | 1.52197           | 0.962525              | 0.223214    |

| 4       | 7.99s        | 1.46419           | 0.943667              | 0.223214    |

| 5       | 10.05s       | 1.42411           | 0.929737              | 0.223214    |

| 6       | 11.86s       | 1.4125            | 0.927836              | 0.223214    |

| 7       | 13.77s       | 1.45768           | 0.950511              | 0.223214    |

| 8       | 15.73s       | 1.41854           | 0.933561              | 0.223214    |

| 9       | 17.54s       | 1.3976            | 0.925889              | 0.223214    |

| 10      | 19.33s       | 1.39159           | 0.925112              | 0.223214    |

| 11      | 21.10s       | 1.37437           | 0.918518              | 0.223214    |

| 12      | 22.88s       | 1.36352           | 0.9152                | 0.223214    |

| 13      | 24.67s       | 1.35502           | 0.912162              | 0.223214    |

| 14      | 26.57s       | 1.35012           | 0.910909              | 0.223214    |

| 15      | 28.45s       | 1.3471            | 0.910619              | 0.223214    |

| 16      | 30.48s       | 1.33695           | 0.906491              | 0.223214    |

| 17      | 32.37s       | 1.32908           | 0.903518              | 0.223214    |

| 18      | 34.39s       | 1.3272            | 0.903303              | 0.223214    |

| 19      | 36.87s       | 1.32117           | 0.901367              | 0.223214    |

| 20      | 39.11s       | 1.32256           | 0.902126              | 0.223214    |

| 21      | 41.12s       | 1.31449           | 0.898852              | 0.223214    |

| 22      | 43.09s       | 1.30774           | 0.896466              | 0.223214    |

| 23      | 45.10s       | 1.30063           | 0.894055              | 0.223214    |

| 24      | 47.15s       | 1.29969           | 0.89392               | 0.223214    |

| 25      | 49.25s       | 1.2995            | 0.894295              | 0.223214    |

+---------+--------------+-------------------+-----------------------+-------------+

Optimization Complete: Maximum number of passes through the data reached.

Computing final objective value and training RMSE.

       Final objective value: 1.3239

       Final training RMSE: 0.879756

Recsys training: model = ranking_factorization_recommender

Preparing data set.

    Data has 967831 observations with 6040 users and 3883 items.

    Data prepared in: 1.00141s

Training ranking_factorization_recommender for recommendations.

+--------------------------------+--------------------------------------------------+----------+

| Parameter                      | Description                                      | Value    |

+--------------------------------+--------------------------------------------------+----------+

| num_factors                    | Factor Dimension                                 | 32       |

| regularization                 | L2 Regularization on Factors                     | 1e-09    |

| solver                         | Solver used for training                         | adagrad  |

| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |

| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |

| side_data_factorization        | Assign Factors for Side Data                     | True     |

| max_iterations                 | Maximum Number of Iterations                     | 25       |

+--------------------------------+--------------------------------------------------+----------+

  Optimizing model using SGD; tuning step size.

  Using 120978 / 967831 points for tuning the step size.

+---------+-------------------+------------------------------------------+

| Attempt | Initial Step Size | Estimated Objective Value                |

+---------+-------------------+------------------------------------------+

| 0       | 10                | Not Viable                               |

| 1       | 2.5               | Not Viable                               |

| 2       | 0.625             | Not Viable                               |

| 3       | 0.15625           | 1.06978                                  |

| 4       | 0.078125          | 1.54845                                  |

| 5       | 0.0390625         | 1.81084                                  |

| 6       | 0.0195312         | 1.78482                                  |

+---------+-------------------+------------------------------------------+

| Final   | 0.15625           | 1.06978                                  |

+---------+-------------------+------------------------------------------+

Starting Optimization.

+---------+--------------+-------------------+-----------------------+-------------+

| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |

+---------+--------------+-------------------+-----------------------+-------------+

| Initial | 97us         | 2.44627           | 1.11695               |             |

+---------+--------------+-------------------+-----------------------+-------------+

| 1       | 1.46s        | 1.85534           | 1.1103                | 0.15625     |

| 2       | 2.94s        | 1.45672           | 0.949586              | 0.15625     |

| 3       | 4.38s        | 1.35111           | 0.91098               | 0.15625     |

| 4       | 5.79s        | 1.32376           | 0.904016              | 0.15625     |

| 5       | 7.23s        | 1.299             | 0.895309              | 0.15625     |

| 6       | 8.78s        | 1.27552           | 0.887471              | 0.15625     |

| 7       | 10.21s       | 1.24705           | 0.875703              | 0.15625     |

| 8       | 11.58s       | 1.23298           | 0.870401              | 0.15625     |

| 9       | 12.96s       | 1.2188            | 0.865168              | 0.15625     |

| 10      | 14.36s       | 1.20627           | 0.860272              | 0.15625     |

| 11      | 15.76s       | 1.19564           | 0.856134              | 0.15625     |

| 12      | 17.13s       | 1.19297           | 0.855644              | 0.15625     |

| 13      | 18.50s       | 1.18891           | 0.854091              | 0.15625     |

| 14      | 19.88s       | 1.18114           | 0.850945              | 0.15625     |

| 15      | 21.42s       | 1.17493           | 0.848849              | 0.15625     |

| 16      | 22.85s       | 1.16891           | 0.846376              | 0.15625     |

| 17      | 24.21s       | 1.16579           | 0.844708              | 0.15625     |

| 18      | 25.58s       | 1.16022           | 0.842751              | 0.15625     |

| 19      | 26.95s       | 1.15579           | 0.840918              | 0.15625     |

| 20      | 28.33s       | 1.15072           | 0.838805              | 0.15625     |

| 21      | 29.72s       | 1.14833           | 0.83774               | 0.15625     |

| 22      | 31.04s       | 1.1448            | 0.83681               | 0.15625     |

| 23      | 32.40s       | 1.14035           | 0.834712              | 0.15625     |

| 24      | 33.83s       | 1.13886           | 0.834142              | 0.15625     |

| 25      | 35.16s       | 1.13705           | 0.833517              | 0.15625     |

+---------+--------------+-------------------+-----------------------+-------------+

Optimization Complete: Maximum number of passes through the data reached.

Computing final objective value and training RMSE.

       Final objective value: 1.15511

       Final training RMSE: 0.815472

Recsys training: model = ranking_factorization_recommender

Preparing data set.

    Data has 967831 observations with 6040 users and 3883 items.

    Data prepared in: 0.956958s

Training ranking_factorization_recommender for recommendations.

+--------------------------------+--------------------------------------------------+----------+

| Parameter                      | Description                                      | Value    |

+--------------------------------+--------------------------------------------------+----------+

| num_factors                    | Factor Dimension                                 | 32       |

| regularization                 | L2 Regularization on Factors                     | 1e-09    |

| solver                         | Solver used for training                         | adagrad  |

| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |

| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |

| side_data_factorization        | Assign Factors for Side Data                     | True     |

| max_iterations                 | Maximum Number of Iterations                     | 25       |

+--------------------------------+--------------------------------------------------+----------+

  Optimizing model using SGD; tuning step size.

  Using 120978 / 967831 points for tuning the step size.

+---------+-------------------+------------------------------------------+

| Attempt | Initial Step Size | Estimated Objective Value                |

+---------+-------------------+------------------------------------------+

| 0       | 5.55556           | Not Viable                               |

| 1       | 1.38889           | Not Viable                               |

| 2       | 0.347222          | 1.31801                                  |

| 3       | 0.173611          | 1.05607                                  |

| 4       | 0.0868056         | 1.03567                                  |

| 5       | 0.0434028         | 1.19629                                  |

| 6       | 0.0217014         | 1.4938                                   |

| 7       | 0.0108507         | 1.70777                                  |

+---------+-------------------+------------------------------------------+

| Final   | 0.0868056         | 1.03567                                  |

+---------+-------------------+------------------------------------------+

Starting Optimization.

+---------+--------------+-------------------+-----------------------+-------------+

| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |

+---------+--------------+-------------------+-----------------------+-------------+

| Initial | 69us         | 2.44654           | 1.11696               |             |

+---------+--------------+-------------------+-----------------------+-------------+

| 1       | 2.16s        | 3.93959           | 1.66858               | 0.0868056   |

| 2       | 4.30s        | 1.70792           | 1.02613               | 0.0868056   |

| 3       | 6.40s        | 1.62228           | 0.996023              | 0.0868056   |

| 4       | 8.52s        | 1.58266           | 0.983384              | 0.0868056   |

| 5       | 10.65s       | 1.55723           | 0.976074              | 0.0868056   |

| 6       | 12.92s       | 1.53232           | 0.968397              | 0.0868056   |

| 7       | 15.47s       | 1.51778           | 0.964143              | 0.0868056   |

| 8       | 17.94s       | 1.49833           | 0.957913              | 0.0868056   |

| 9       | 20.20s       | 1.48279           | 0.953325              | 0.0868056   |

| 10      | 22.55s       | 1.47074           | 0.949673              | 0.0868056   |

| 11      | 24.85s       | 1.458             | 0.94579               | 0.0868056   |

| 12      | 27.23s       | 1.44677           | 0.942229              | 0.0868056   |

| 13      | 29.39s       | 1.43698           | 0.93923               | 0.0868056   |

| 14      | 31.72s       | 1.42799           | 0.936534              | 0.0868056   |

| 15      | 33.82s       | 1.41811           | 0.933126              | 0.0868056   |

| 16      | 35.97s       | 1.40955           | 0.930789              | 0.0868056   |

| 17      | 38.11s       | 1.4023            | 0.928189              | 0.0868056   |

| 18      | 40.22s       | 1.39292           | 0.925387              | 0.0868056   |

| 19      | 42.35s       | 1.38626           | 0.923178              | 0.0868056   |

| 20      | 44.47s       | 1.37981           | 0.921024              | 0.0868056   |

| 21      | 46.64s       | 1.37151           | 0.918555              | 0.0868056   |

| 22      | 48.75s       | 1.3648            | 0.916354              | 0.0868056   |

| 23      | 50.88s       | 1.35774           | 0.914191              | 0.0868056   |

| 24      | 52.97s       | 1.35168           | 0.912197              | 0.0868056   |

| 25      | 55.13s       | 1.34552           | 0.909992              | 0.0868056   |

+---------+--------------+-------------------+-----------------------+-------------+

Optimization Complete: Maximum number of passes through the data reached.

Computing final objective value and training RMSE.

       Final objective value: 1.37294

       Final training RMSE: 0.901067
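Each of the three logs above begins by tuning the SGD step size on a subsample of 120978 of the 967831 training rows. That is 12.5%, which matches `sgd_trial_sample_proportion : 0.125` in the model summary (967831 × 0.125 = 120978.875).

In all three logs, the candidate step size shrinks by a factor of 4 after a "Not Viable" attempt and by a factor of 2 after a viable one. The attempt with the lowest estimated objective is kept. That shrink rule is inferred only from the printed numbers, not from GraphLab documentation. The sketch below replays it on the second log, the `m_item` model.

```python
def replay_tuning(initial_step, objectives):
    """Rebuild the candidate step sizes of one tuning log and pick the best.

    objectives[i] is the estimated objective printed for attempt i, or None
    where the log says "Not Viable". ASSUMPTION: the next step size is the
    previous one divided by 4 after a non-viable attempt and by 2 after a
    viable one; this rule is read off the logs, not taken from GraphLab docs.
    """
    steps = [initial_step]
    for prev_obj in objectives[:-1]:
        factor = 4.0 if prev_obj is None else 2.0
        steps.append(steps[-1] / factor)
    viable = [(obj, step) for step, obj in zip(steps, objectives) if obj is not None]
    best_obj, best_step = min(viable)  # lowest estimated objective wins
    return steps, best_step, best_obj


# Trial subsample: 12.5% of the training rows (sgd_trial_sample_proportion).
trial_rows = int(967831 * 0.125)

# Estimated objectives copied from the m_item tuning log above.
objs = [None, None, None, 1.06978, 1.54845, 1.81084, 1.78482]
steps, best_step, best_obj = replay_tuning(10.0, objs)
print("trial rows: %d" % trial_rows)
print("steps: %s" % steps)
print("chosen step %g (objective %g)" % (best_step, best_obj))
```

The same rule also fits the `m_user` log. Starting from 7.14286, two non-viable attempts give 0.446429, and one halving then gives 0.223214, the step that log finally selects.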

In [33]:

m_both

Out[33]:

Class                            : RankingFactorizationRecommender

Schema
------
User ID                          : user_id
Item ID                          : movie_id
Target                           : rating
Additional observation features  : 1
User side features               : ['user_id', 'gender', 'age', 'occupation', 'zip-code']
Item side features               : ['movie_id', 'title', 'genre']

Statistics
----------
Number of observations           : 967831
Number of users                  : 6040
Number of items                  : 3883

Training summary
----------------
Training time                    : 66.8385

Model Parameters
----------------
Model class                      : RankingFactorizationRecommender
num_factors                      : 32
binary_target                    : 0
side_data_factorization          : 1
solver                           : auto
nmf                              : 0
max_iterations                   : 25

Regularization Settings
-----------------------
regularization                   : 0.0
regularization_type              : normal
linear_regularization            : 0.0
ranking_regularization           : 0.25
unobserved_rating_value          : -1.79769313486e+308
num_sampled_negative_examples    : 4
ials_confidence_scaling_type     : auto
ials_confidence_scaling_factor   : 1

Optimization Settings
---------------------
init_random_sigma                : 0.01
sgd_convergence_interval         : 4
sgd_convergence_threshold        : 0.0
sgd_max_trial_iterations         : 5
sgd_sampling_block_size          : 131072
sgd_step_adjustment_interval     : 4
sgd_step_size                    : 0.0
sgd_trial_sample_minimum_size    : 10000
sgd_trial_sample_proportion      : 0.125
step_size_decrease_rate          : 0.75
additional_iterations_if_unhealthy : 5
adagrad_momentum_weighting       : 0.9
num_tempering_iterations         : 4
tempering_regularization_start_value : 0.0
track_exact_loss                 : 0
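The summary shows that `m_both` was trained with user side features (gender, age, occupation, zip-code) and item side features (title, genre), with `side_data_factorization : 1`. As we understand GraphLab's documentation, this setting fits latent factors for the side features as well as linear weights, giving a factorization-machine-style model.

Below is a simplified, hypothetical scoring function in that spirit: a global bias, plus one linear weight per active one-hot feature, plus pairwise dot products of the features' latent vectors. It is not GraphLab's code. The feature names and random weights are made up for illustration. Only the vector length of 32 is taken from `num_factors` above.

```python
import numpy as np


def fm_score(active_features, w0, linear, factors):
    """Score one (user, item) pair from its active one-hot features.

    active_features: names of features equal to 1 (user id, item id, side values).
    linear: dict feature -> scalar weight.
    factors: dict feature -> latent vector (same length for every feature).
    The pairwise term uses the identity
        sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2).
    """
    vecs = np.array([factors[f] for f in active_features])
    linear_part = sum(linear[f] for f in active_features)
    summed = vecs.sum(axis=0)
    pairwise = 0.5 * (summed.dot(summed) - (vecs * vecs).sum())
    return w0 + linear_part + pairwise


# Hypothetical features for user 1 (gender F) and movie 1 (Toy Story).
features = ["user_id=1", "gender=F", "movie_id=1",
            "genre=Animation", "genre=Children's", "genre=Comedy"]
rng = np.random.RandomState(0)
linear = dict((f, 0.1) for f in features)
factors = dict((f, rng.normal(scale=0.1, size=32)) for f in features)
print(fm_score(features, 3.5, linear, factors))
```

The identity in the docstring avoids an explicit double loop over feature pairs. The cost is linear in the number of active features, which matters when each user and item carries several side features.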

In [35]:

results = graphlab.recommender.util.compare_models(test, 
                                                   [m, m_user, m_item, m_both], 
                                                   user_sample=0.5)

compare_models: using 500 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.316      | 0.0128883210584 |
|   2    |      0.29      | 0.0233054227061 |
|   3    | 0.268666666667 | 0.0321496282686 |
|   4    |     0.253      | 0.0402442767867 |
|   5    |     0.244      | 0.0490785986855 |
|   6    | 0.232666666667 |  0.055136703757 |
|   7    | 0.225142857143 | 0.0629177137344 |
|   8    |     0.2175     | 0.0678235427485 |
|   9    | 0.211777777778 | 0.0739998255263 |
|   10   |     0.2088     | 0.0813878993056 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 0.9912235912964674)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   4202  |   2   | 0.243189194194 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   1453  |   1   | 2.09470669064 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count |       rmse       |
+----------+-------+------------------+
|   2650   |   1   | 0.00523588563751 |
+----------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   3456   |   1   | 3.93082402092 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.322      | 0.0128391995335 |
|   2    |      0.29      | 0.0209245256901 |
|   3    | 0.266666666667 | 0.0286429069371 |
|   4    |      0.26      | 0.0399834485443 |
|   5    |     0.2528     | 0.0480304103201 |
|   6    | 0.243333333333 | 0.0538587372884 |
|   7    | 0.241142857143 | 0.0627955536032 |
|   8    |    0.23175     | 0.0688002166508 |
|   9    | 0.229555555556 | 0.0775822871683 |
|   10   |     0.2228     | 0.0844365974444 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 1.0467144960152537)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   5155  |   2   | 0.158692179486 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   1453  |   1   | 2.27238116673 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+-----------------+
| movie_id | count |       rmse      |
+----------+-------+-----------------+
|   3580   |   1   | 0.0213236444021 |
+----------+-------+-----------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   1849   |   1   | 5.23411866524 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.308      | 0.0135741713295 |
|   2    |     0.295      | 0.0257617271978 |
|   3    | 0.285333333333 | 0.0385055149317 |
|   4    |     0.278      | 0.0493160364699 |
|   5    |     0.266      | 0.0573723578479 |
|   6    | 0.256666666667 | 0.0633560170682 |
|   7    | 0.252571428571 | 0.0717548188532 |
|   8    |     0.2445     | 0.0786034149361 |
|   9    | 0.240666666667 | 0.0863443798709 |
|   10   |     0.2354     | 0.0925965363328 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 1.0571870174756082)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   2644  |   2   | 0.319982658691 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   1453  |   1   | 2.43030813721 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+-------------------+
| movie_id | count |        rmse       |
+----------+-------+-------------------+
|   1472   |   1   | 0.000398884009287 |
+----------+-------+-------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   398    |   1   | 8.48029736682 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M3

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.296      | 0.0108108277359 |
|   2    |     0.292      | 0.0223053284436 |
|   3    | 0.278666666667 | 0.0325446458663 |
|   4    |     0.2615     | 0.0404275263081 |
|   5    |     0.254      | 0.0482578874811 |
|   6    |     0.244      | 0.0549556020468 |
|   7    | 0.234857142857 | 0.0617971007626 |
|   8    |    0.23225     | 0.0712002266095 |
|   9    | 0.225777777778 | 0.0760029463955 |
|   10   |     0.222      | 0.0825916856647 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 0.9864932997213498)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   2356  |   3   | 0.169179486062 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   5243  |   4   | 2.11510097987 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count |       rmse       |
+----------+-------+------------------+
|   565    |   1   | 0.00157158077202 |
+----------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   3950   |   1   | 3.44213095993 |
+----------+-------+---------------+
[1 rows x 3 columns]

In [36]:

[results[i]['rmse_overall'] for i in range(len(results))]

Out[36]:

[0.9912235912964674,
 1.0467144960152537,
 1.0571870174756082,
 0.9864932997213498]
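As the log for In [35] shows, this call to `compare_models` without a `metric` argument reported both precision/recall and RMSE. Overall RMSE is the square root of the mean squared difference between predicted and actual ratings over the evaluated test rows. The sketch below defines that quantity and picks the lowest value from the list in Out[36]. The names are the model variables in the order they were passed to `compare_models`.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    sq = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(sq) / float(len(sq)))


# Overall RMSE values from Out[36], in the order passed to compare_models.
names = ["m", "m_user", "m_item", "m_both"]
overall = [0.9912235912964674, 1.0467144960152537,
           1.0571870174756082, 0.9864932997213498]
best = min(zip(overall, names))
print("lowest RMSE: %s (%.4f)" % (best[1], best[0]))
```

On this sample, using both user and item side data lowers RMSE slightly relative to `m` (0.9865 vs 0.9912). Using either kind of side data alone raises it.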

In [38]:

results[3]['rmse_by_item']

Out[38]:

movie_id	count	rmse
2871	18	1.08553449768
2043	3	0.28091888274
2464	1	0.600349059629
232	5	1.99505541159
3880	2	1.81932941568
2238	2	0.488967829715
3719	4	1.49264680177
431	5	0.934952171152
2661	2	0.782811477717
3811	7	0.938744314991

[2552 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
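`rmse_by_item` reports, for each item, its RMSE and the number of test ratings behind it (`count`). Every best and worst per-item RMSE printed in the logs above comes from an item with `count` 1, so those extremes are noisy.

Because each item's RMSE is computed over that item's own rows, the counts let you recover the pooled RMSE over those rows: sqrt(Σ count·rmse² / Σ count). The sketch below assumes the table has been converted to `(movie_id, count, rmse)` tuples. It adds a hypothetical `min_count` filter so that items are ranked only when they have enough test ratings.

```python
import math


def pooled_rmse(rows):
    """Recombine per-group RMSE values into one RMSE over all their rows."""
    total = sum(count for _, count, _ in rows)
    sq_err = sum(count * r ** 2 for _, count, r in rows)
    return math.sqrt(sq_err / float(total))


def worst_items(rows, min_count=5, n=3):
    """Highest-RMSE items among those with at least min_count test ratings."""
    eligible = [row for row in rows if row[1] >= min_count]
    return sorted(eligible, key=lambda row: row[2], reverse=True)[:n]


# First rows of results[3]['rmse_by_item'] (Out[38]); the full table has 2552 rows.
head = [(2871, 18, 1.08553449768), (2043, 3, 0.28091888274),
        (2464, 1, 0.600349059629), (232, 5, 1.99505541159),
        (3880, 2, 1.81932941568), (2238, 2, 0.488967829715),
        (3719, 4, 1.49264680177), (431, 5, 0.934952171152),
        (2661, 2, 0.782811477717), (3811, 7, 0.938744314991)]
print(pooled_rmse(head))
print(worst_items(head))
```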

In [39]:

graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [m_rank, m_both],
                                         user_sample=0.2,
                                         metric='precision_recall')

compare_models: using 186 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.102150537634 | 0.0103584423342 |
|   2    |  0.134408602151 | 0.0325592569006 |
|   3    |  0.123655913978 | 0.0469152266463 |
|   4    |  0.103494623656 | 0.0536329312206 |
|   5    | 0.0978494623656 | 0.0668332524048 |
|   6    | 0.0905017921147 | 0.0706819521975 |
|   7    | 0.0867895545315 | 0.0798389988545 |
|   8    | 0.0880376344086 | 0.0929486544744 |
|   9    |  0.089605734767 |  0.108265520447 |
|   10   | 0.0876344086022 |  0.117956009872 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.225806451613 | 0.0368621545266 |
|   2    | 0.185483870968 | 0.0681228986668 |
|   3    | 0.186379928315 |  0.100618452607 |
|   4    | 0.173387096774 |  0.118549158313 |
|   5    | 0.154838709677 |  0.131116798495 |
|   6    | 0.141577060932 |  0.139856304912 |
|   7    | 0.137480798771 |  0.156850192296 |
|   8    | 0.134408602151 |  0.172714099339 |
|   9    | 0.127837514934 |  0.180102516583 |
|   10   | 0.125268817204 |  0.191717395836 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

Out[39]:

[{'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3348
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    56   |   1    |      1.0       | 0.166666666667 |   6   |
  |    56   |   2    |      0.5       | 0.166666666667 |   6   |
  |    56   |   3    | 0.333333333333 | 0.166666666667 |   6   |
  |    56   |   4    |      0.25      | 0.166666666667 |   6   |
  |    56   |   5    |      0.4       | 0.333333333333 |   6   |
  |    56   |   6    | 0.333333333333 | 0.333333333333 |   6   |
  |    56   |   7    | 0.285714285714 | 0.333333333333 |   6   |
  |    56   |   8    |      0.25      | 0.333333333333 |   6   |
  |    56   |   9    | 0.222222222222 | 0.333333333333 |   6   |
  |    56   |   10   |      0.2       | 0.333333333333 |   6   |
  +---------+--------+----------------+----------------+-------+
  [3348 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+-----------------+-----------------+
  | cutoff |    precision    |      recall     |
  +--------+-----------------+-----------------+
  |   1    |  0.102150537634 | 0.0103584423342 |
  |   2    |  0.134408602151 | 0.0325592569006 |
  |   3    |  0.123655913978 | 0.0469152266463 |
  |   4    |  0.103494623656 | 0.0536329312206 |
  |   5    | 0.0978494623656 | 0.0668332524048 |
  |   6    | 0.0905017921147 | 0.0706819521975 |
  |   7    | 0.0867895545315 | 0.0798389988545 |
  |   8    | 0.0880376344086 | 0.0929486544744 |
  |   9    |  0.089605734767 |  0.108265520447 |
  |   10   | 0.0876344086022 |  0.117956009872 |
  +--------+-----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3348
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    56   |   1    |      0.0       |      0.0       |   6   |
  |    56   |   2    |      0.0       |      0.0       |   6   |
  |    56   |   3    | 0.333333333333 | 0.166666666667 |   6   |
  |    56   |   4    |      0.5       | 0.333333333333 |   6   |
  |    56   |   5    |      0.6       |      0.5       |   6   |
  |    56   |   6    |      0.5       |      0.5       |   6   |
  |    56   |   7    | 0.428571428571 |      0.5       |   6   |
  |    56   |   8    |     0.375      |      0.5       |   6   |
  |    56   |   9    | 0.333333333333 |      0.5       |   6   |
  |    56   |   10   |      0.3       |      0.5       |   6   |
  +---------+--------+----------------+----------------+-------+
  [3348 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.225806451613 | 0.0368621545266 |
  |   2    | 0.185483870968 | 0.0681228986668 |
  |   3    | 0.186379928315 |  0.100618452607 |
  |   4    | 0.173387096774 |  0.118549158313 |
  |   5    | 0.154838709677 |  0.131116798495 |
  |   6    | 0.141577060932 |  0.139856304912 |
  |   7    | 0.137480798771 |  0.156850192296 |
  |   8    | 0.134408602151 |  0.172714099339 |
  |   9    | 0.127837514934 |  0.180102516583 |
  |   10   | 0.125268817204 |  0.191717395836 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
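Each `precision_recall_by_user` row gives precision@k and recall@k for a single user. Precision@k is the number of hits in the top k recommendations divided by k. Recall@k is the same hit count divided by the number of held-out items, which appears to be what the `count` column holds. The minimal sketch below treats every held-out item as relevant, which is an assumption about the evaluation protocol. The item ids are hypothetical: six held-out movies, with hits at ranks 1 and 5. This reproduces the pattern shown for user 56 under the first model (1.0 / 0.1667 at cutoff 1, 0.4 / 0.3333 at cutoff 5, 0.2 / 0.3333 at cutoff 10).

```python
def precision_recall_at_k(ranked_items, relevant_items, cutoffs):
    """Return [(cutoff, precision, recall)] for one user's ranked list.

    ranked_items: recommended item ids, best first.
    relevant_items: the user's held-out items; len(relevant_items) plays the
    role of the 'count' column.
    """
    relevant = set(relevant_items)
    out = []
    for k in cutoffs:
        hits = sum(1 for item in ranked_items[:k] if item in relevant)
        out.append((k, float(hits) / k, float(hits) / len(relevant)))
    return out


# Hypothetical ids: 6 held-out movies, hits at ranks 1 and 5 of the top 10
held_out = ['m1', 'm2', 'm3', 'm4', 'm5', 'm6']
ranked = ['m1', 'x1', 'x2', 'x3', 'm2', 'x4', 'x5', 'x6', 'x7', 'x8']
for k, p, r in precision_recall_at_k(ranked, held_out, range(1, 11)):
    print('%d %.4f %.4f' % (k, p, r))
```

Recall can only rise or stay flat as the cutoff grows, while precision drops whenever a new position is a miss. That is the trade-off visible across the cutoff rows of both models.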

Factorization machines

In [ ]:

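# Train a factorization model directly; recommender.create() in this GraphLab version
# picks a model automatically and does not take method= or SGD options.
fm = graphlab.recommender.factorization_recommender.create(
    train.head(10000),      # first 10,000 training ratings, to keep the run short
    user_id='user_id',
    item_id='movie_id',
    target='rating',
    item_data=item_sf,      # per-movie side features, used as extra model features
    sgd_step_size=0.09,     # SGD learning rate
    max_iterations=10)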
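GraphLab describes its factorization recommender as covering both matrix factorization and factorization machines when side data is supplied, but its internals are not shown here. The sketch below is the generic second-order factorization machine score (Rendle's formulation), not GraphLab's implementation. All names and values in it are hypothetical. `fm_predict` uses the standard O(k·n) rewrite of the pairwise term, and `fm_predict_naive` sums over every feature pair so the two can be checked against each other:

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector.

    x: feature values (e.g. one-hot user, one-hot item, side features).
    w0: global bias; w: per-feature linear weights;
    V: per-feature latent vectors, len(x) rows of k floats each.
    Pairwise term via the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2]
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


def fm_predict_naive(x, w0, w, V):
    """Same score computed directly over all feature pairs, for checking."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical model: 2 users + 2 movies (one-hot) + 1 side feature, k = 2
w0 = 3.5
w = [0.1, -0.2, 0.3, 0.05, 0.2]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.5], [0.2, -0.3]]
x_plain = [1, 0, 0, 1, 0]    # user 0 rates movie 1, no side feature
x_side = [1, 0, 0, 1, 0.5]   # same pair plus a side feature at 0.5
print(fm_predict(x_plain, w0, w, V), fm_predict(x_side, w0, w, V))
```

With only one-hot user and item features, the pairwise term collapses to a single user-item dot product. The model is then matrix factorization with biases. Side features such as those in `item_data` add extra entries to `x` and therefore extra interaction terms.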