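import graphlab
# Render GraphLab Canvas output inline in the notebook.
graphlab.canvas.set_target("ipynb")
# Load the ratings plus the user and item tables saved as SFrames.
rating_sf = graphlab.SFrame('ratings')
users = graphlab.SFrame('users')
items = graphlab.SFrame('items')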
rating_sf
user_id | movie_id | rating | timestamp |
---|---|---|---|
1 | 1193 | 5 | 978300760 |
1 | 661 | 3 | 978302109 |
1 | 914 | 3 | 978301968 |
1 | 3408 | 4 | 978300275 |
1 | 2355 | 5 | 978824291 |
1 | 1197 | 3 | 978302268 |
1 | 1287 | 5 | 978302039 |
1 | 2804 | 5 | 978300719 |
1 | 594 | 4 | 978302268 |
1 | 919 | 4 | 978301368 |
users
user_id | gender | age | occupation | zip-code |
---|---|---|---|---|
1 | F | 1 | 10 | 48067 |
2 | M | 56 | 16 | 70072 |
3 | M | 25 | 15 | 55117 |
4 | M | 45 | 7 | 02460 |
5 | M | 25 | 20 | 55455 |
6 | F | 50 | 9 | 55117 |
7 | M | 35 | 1 | 06810 |
8 | M | 25 | 12 | 11413 |
9 | M | 25 | 17 | 61614 |
10 | F | 35 | 1 | 95370 |
items
movie_id | title | genre |
---|---|---|
1 | Toy Story (1995) | Animation|Children's|Comedy ... |
2 | Jumanji (1995) | Adventure|Children's|Fantasy ... |
3 | Grumpier Old Men (1995) | Comedy|Romance |
4 | Waiting to Exhale (1995) | Comedy|Drama |
5 | Father of the Bride Part II (1995) ... | Comedy |
6 | Heat (1995) | Action|Crime|Thriller |
7 | Sabrina (1995) | Comedy|Romance |
8 | Tom and Huck (1995) | Adventure|Children's |
9 | Sudden Death (1995) | Action |
10 | GoldenEye (1995) | Action|Adventure|Thriller |
dir(graphlab.recommender)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'create', 'factorization_recommender', 'item_content_recommender', 'item_similarity_recommender', 'popularity_recommender', 'ranking_factorization_recommender', 'util']
# Toy item table: each item has two 2-d feature vectors.
item_data = graphlab.SFrame({"my_item_id": list(range(4)),
                             "data_1": [[1, 0], [1, 0], [0, 1], [0.5, 0.5]],
                             "data_2": [[0, 1], [1, 0], [0, 1], [0.5, 0.5]]})
item_data
data_1 | data_2 | my_item_id |
---|---|---|
[1.0, 0.0] | [0.0, 1.0] | 0 |
[1.0, 0.0] | [1.0, 0.0] | 1 |
[0.0, 1.0] | [0.0, 1.0] | 2 |
[0.5, 0.5] | [0.5, 0.5] | 3 |
m = graphlab.recommender.item_content_recommender.create(item_data, "my_item_id")
m.recommend_from_interactions([0])
WARNING: The ItemContentRecommender model is still in beta.
WARNING: This feature transformer is still in beta, and some interpretation rules may change in the future.
('Applying transform:\n', Class : AutoVectorizer

Model Fields
------------
Features : ['data_1', 'data_2']
Excluded Features : ['my_item_id']

Column Type Interpretation Transforms Output Type
------ ----- -------------- ---------- -----------
data_1 array vector None array
data_2 array vector None array
)
Recsys training: model = item_content_recommender
Defaulting to brute force instead of ball tree because there are multiple distance components.
Starting brute force nearest neighbors model training.
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1 | 4 | 25 | 270us |
| Done | | 100 | 345us |
+--------------+---------+-------------+--------------+
Preparing data set.
Data has 0 observations with 0 users and 4 items.
Data prepared in: 0.013882s
Loading user-provided nearest items.
Generating candidate set for working with new users.
Finished training in 0.000942s
my_item_id | score | rank |
---|---|---|
3 | 0.707106769085 | 1 |
1 | 0.5 | 2 |
2 | 0.5 | 3 |
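The three scores above are consistent with averaging, over the two feature columns, the cosine similarity between the query item (item 0) and each other item. The sketch below reproduces those numbers in plain Python. It is an inference from this toy output, not a description of GraphLab's internals, and the helper names `cosine` and `content_score` are made up for illustration.

```python
import math

# Toy item features copied from item_data: item id -> (data_1, data_2).
items = {
    0: ([1.0, 0.0], [0.0, 1.0]),
    1: ([1.0, 0.0], [1.0, 0.0]),
    2: ([0.0, 1.0], [0.0, 1.0]),
    3: ([0.5, 0.5], [0.5, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def content_score(query_id, other_id):
    """Mean per-column cosine similarity (assumed scoring rule)."""
    sims = [cosine(q, o) for q, o in zip(items[query_id], items[other_id])]
    return sum(sims) / len(sims)

# Score every item except the query itself, then rank by descending score.
scores = {i: content_score(0, i) for i in items if i != 0}
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
print(ranked)
```

Item 3 points halfway between both of item 0's feature vectors, so it scores about 0.707 in each column; items 1 and 2 each match item 0 exactly in one column and not at all in the other, giving 0.5.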
train, test = graphlab.recommender.util.random_split_by_user(
    rating_sf, 'user_id', 'movie_id')
train[train['rating'] > 4]
user_id | movie_id | rating | timestamp |
---|---|---|---|
1 | 1193 | 5 | 978300760 |
1 | 2355 | 5 | 978824291 |
1 | 1287 | 5 | 978302039 |
1 | 2804 | 5 | 978300719 |
1 | 595 | 5 | 978824268 |
1 | 1035 | 5 | 978301753 |
1 | 3105 | 5 | 978301713 |
1 | 1270 | 5 | 978300055 |
1 | 527 | 5 | 978824195 |
1 | 48 | 5 | 978824351 |
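`random_split_by_user` holds out part of the interactions of a sample of users, rather than splitting rows uniformly at random. A minimal sketch of that idea in plain Python follows. The function name `split_by_user`, its parameters, and the toy data are hypothetical, and the real utility's sampling details may differ.

```python
import random
from collections import defaultdict

def split_by_user(rows, test_users=2, test_fraction=0.5, seed=0):
    """Hold out a share of the interactions of a few sampled users.

    rows: list of (user_id, item_id) pairs. Returns (train, test).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in rows:
        by_user[user].append(item)

    # Only a sample of users contributes rows to the test set.
    chosen = set(rng.sample(sorted(by_user), min(test_users, len(by_user))))

    train, test = [], []
    for user, user_items in by_user.items():
        for item in user_items:
            held_out = user in chosen and rng.random() < test_fraction
            (test if held_out else train).append((user, item))
    return train, test

# Four toy users with ten interactions each.
rows = [(u, i) for u in range(1, 5) for i in range(10)]
train_rows, test_rows = split_by_user(rows)
print(len(train_rows), len(test_rows))
```

Splitting this way keeps every test user present in the training data, so a model can still be asked for recommendations for them.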
from graphlab import item_similarity_recommender
# Train on 5-star ratings only, treating each one as an implicit "like".
itemcf = item_similarity_recommender.create(
    train[train['rating'] > 4], 'user_id', 'movie_id')
Recsys training: model = item_similarity
Warning: Ignoring columns rating, timestamp;
    To use one of these as a target column, set target = <column_name>
    and use a method that allows the use of a target.
Preparing data set.
Data has 218844 observations with 6011 users and 3228 items.
Data prepared in: 0.197671s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 801us | 49.75 |
| 4.16ms | 100 |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 26.554ms | 0 | 0 |
| 116.515ms | 100 | 3228 |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 0.153497s
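An item-similarity recommender of this kind scores candidate items by how similar they are to the items a user has already interacted with. The sketch below assumes Jaccard similarity over the sets of users who liked each item, a common choice when no target column is used. The toy data and the names `jaccard` and `recommend` are illustrative only and are not GraphLab's implementation.

```python
from collections import defaultdict

# Toy (user_id, movie_id) pairs standing in for the filtered training data.
likes = [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (2, 'C'), (3, 'B'), (3, 'C')]

users_by_item = defaultdict(set)
items_by_user = defaultdict(set)
for user, item in likes:
    users_by_item[item].add(user)
    items_by_user[user].add(item)

def jaccard(a, b):
    """Shared users divided by all users who liked either item."""
    ua, ub = users_by_item[a], users_by_item[b]
    union = ua | ub
    return len(ua & ub) / len(union) if union else 0.0

def recommend(user, k=5):
    """Rank unseen items by mean similarity to the user's own items."""
    seen = items_by_user[user]
    scores = {
        cand: sum(jaccard(cand, s) for s in seen) / len(seen)
        for cand in list(users_by_item) if cand not in seen
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

print(recommend(1))
```

For user 1, the only unseen item is `C`, which shares one of three users with `A` and two of three with `B`, so its mean similarity is 0.5.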
# Baseline: recommend the most frequently 5-star-rated movies.
pop = graphlab.popularity_recommender.create(
    train[train['rating'] > 4], 'user_id', 'movie_id')
Recsys training: model = popularity
Warning: Ignoring columns rating, timestamp;
    To use one of these as a target column, set target = <column_name>
    and use a method that allows the use of a target.
Preparing data set.
Data has 218844 observations with 6011 users and 3228 items.
Data prepared in: 0.21765s
218844 observations to process; with 3228 unique items.
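With no target column, a popularity baseline amounts to counting interactions per item and recommending the most-counted items a user has not seen yet. A small sketch under that assumption, with made-up data and a hypothetical helper `recommend_popular`:

```python
from collections import Counter

# Toy (user_id, movie_id) pairs; each pair is one 5-star rating.
likes = [(1, 'A'), (2, 'A'), (3, 'A'), (1, 'B'), (2, 'B'), (3, 'C')]

# Popularity score: how many training interactions each item received.
popularity = Counter(item for _, item in likes)

def recommend_popular(user, k=2):
    """Top-k most popular items that the user has not interacted with."""
    seen = {item for u, item in likes if u == user}
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]

print(recommend_popular(3))
```

Because the ranking is the same for everyone apart from the seen-item filter, this model is a useful floor against which the personalized models can be compared.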
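# Let GraphLab pick a model; given a target column it chooses
# ranking_factorization_recommender here (see the log below).
m = graphlab.recommender.create(
    train, 'user_id', 'movie_id', 'rating')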
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 967831 observations with 6040 users and 3702 items.
Data prepared in: 0.95541s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | 1.67058 |
| 4 | 0.130208 | 1.79647 |
| 5 | 0.0651042 | 1.99236 |
| 6 | 0.0325521 | 1.92629 |
+---------+-------------------+------------------------------------------+
| Final | 0.260417 | 1.67058 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 116us | 2.44604 | 1.11694 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 1.49s | DIVERGED | DIVERGED | 0.260417 |
| RESET | 1.96s | 2.44602 | 1.11693 | |
| 1 | 3.31s | DIVERGED | DIVERGED | 0.130208 |
| RESET | 3.86s | 2.44605 | 1.11694 | |
| 1 | 4.86s | 2.04087 | 1.12074 | 0.0651042 |
| 2 | 5.88s | 1.76673 | 1.02765 | 0.0651042 |
| 3 | 7.11s | 1.6966 | 1.00407 | 0.0651042 |
| 4 | 8.14s | 1.66303 | 0.99589 | 0.0651042 |
| 5 | 9.09s | 1.63916 | 0.989335 | 0.0651042 |
| 6 | 10.25s | 1.61909 | 0.983417 | 0.0651042 |
| 7 | 11.32s | 1.60004 | 0.978229 | 0.0651042 |
| 8 | 12.54s | 1.5846 | 0.973418 | 0.0651042 |
| 9 | 13.78s | 1.57167 | 0.970329 | 0.0651042 |
| 10 | 14.95s | 1.55995 | 0.966602 | 0.0651042 |
| 11 | 16.13s | 1.54959 | 0.963766 | 0.0651042 |
| 12 | 17.46s | 1.5412 | 0.961447 | 0.0651042 |
| 13 | 18.65s | 1.53128 | 0.959077 | 0.0651042 |
| 14 | 19.67s | 1.52404 | 0.957169 | 0.0651042 |
| 15 | 20.71s | 1.51732 | 0.955351 | 0.0651042 |
| 16 | 21.72s | 1.51131 | 0.953791 | 0.0651042 |
| 17 | 22.78s | 1.50507 | 0.952055 | 0.0651042 |
| 18 | 23.81s | 1.49942 | 0.950269 | 0.0651042 |
| 19 | 25.14s | 1.4943 | 0.949683 | 0.0651042 |
| 20 | 26.37s | 1.49061 | 0.948171 | 0.0651042 |
| 21 | 27.54s | 1.48516 | 0.947189 | 0.0651042 |
| 22 | 28.66s | 1.48046 | 0.945821 | 0.0651042 |
| 23 | 29.69s | 1.47685 | 0.944816 | 0.0651042 |
| 24 | 30.72s | 1.47361 | 0.944047 | 0.0651042 |
| 25 | 31.74s | 1.46908 | 0.943049 | 0.0651042 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.50048
Final training RMSE: 0.937071
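The "Training RMSE" figures in the log are root-mean-square errors between predicted and observed ratings on the training data. As a reminder of what that number measures, here is a minimal sketch; the predictions are invented and the function `rmse` is not part of GraphLab.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predictions against four observed ratings.
error = rmse([4.5, 3.0, 2.5, 5.0], [5, 3, 3, 4])
print(error)
```

A training RMSE near 0.94 therefore means predictions on the training set are off by a little under one star in the root-mean-square sense.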
m
Class : RankingFactorizationRecommender

Schema
------
User ID : user_id
Item ID : movie_id
Target : rating
Additional observation features : 1
Number of user side features : 0
Number of item side features : 0

Statistics
----------
Number of observations : 965508
Number of users : 6040
Number of items : 3706

Training summary
----------------
Training time : 36.9965

Model Parameters
----------------
Model class : RankingFactorizationRecommender
num_factors : 32
binary_target : 0
side_data_factorization : 1
solver : auto
nmf : 0
max_iterations : 25

Regularization Settings
-----------------------
regularization : 0.0
regularization_type : normal
linear_regularization : 0.0
ranking_regularization : 0.25
unobserved_rating_value : -1.79769313486e+308
num_sampled_negative_examples : 4
ials_confidence_scaling_type : auto
ials_confidence_scaling_factor : 1

Optimization Settings
---------------------
init_random_sigma : 0.01
sgd_convergence_interval : 4
sgd_convergence_threshold : 0.0
sgd_max_trial_iterations : 5
sgd_sampling_block_size : 131072
sgd_step_adjustment_interval : 4
sgd_step_size : 0.0
sgd_trial_sample_minimum_size : 10000
sgd_trial_sample_proportion : 0.125
step_size_decrease_rate : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting : 0.9
num_tempering_iterations : 4
tempering_regularization_start_value: 0.0
track_exact_loss : 0
m['coefficients']
{'intercept': 3.5821495005738013,
 'movie_id': Columns:
    movie_id int
    linear_terms float
    factors array
Rows: 3706
Data:
+----------+------------------+-------------------------------+
| movie_id | linear_terms | factors |
+----------+------------------+-------------------------------+
| 1193 | 1.06781125069 | [-0.119829073548, -0.02245... |
| 661 | -0.0261590108275 | [-0.727257788181, 0.016146... |
| 914 | 0.324085891247 | [-0.859803378582, 0.056376... |
| 3408 | 0.565778970718 | [0.334619760513, -0.014206... |
| 2355 | 0.648248255253 | [-0.248598009348, 0.103843... |
| 1197 | 1.12024652958 | [-0.100379563868, 0.085359... |
| 1287 | 0.345532894135 | [-0.247123196721, 0.024613... |
| 2804 | 0.894821941853 | [-0.272583067417, 0.046351... |
| 594 | 0.311594575644 | [-0.974369823933, 0.054282... |
| 919 | 0.97704321146 | [-0.598346889019, 0.085630... |
+----------+------------------+-------------------------------+
[3706 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'side_data': Columns:
    feature str
    index str
    linear_terms float
    factors array
Rows: 1
Data:
+-----------+-------+-----------------+-------------------------------+
| feature | index | linear_terms | factors |
+-----------+-------+-----------------+-------------------------------+
| timestamp | 0 | -0.116745471954 | [-0.564183712006, 1.267165... |
+-----------+-------+-----------------+-------------------------------+
[1 rows x 4 columns],
 'user_id': Columns:
    user_id int
    linear_terms float
    factors array
Rows: 6040
Data:
+---------+------------------+-------------------------------+
| user_id | linear_terms | factors |
+---------+------------------+-------------------------------+
| 1 | -0.027785371989 | [-0.0942558199167, 0.00739... |
| 2 | -0.0234720371664 | [0.015922004357, -0.033992... |
| 3 | -0.0345229320228 | [0.176564618945, -0.050576... |
| 4 | -0.0198582224548 | [-0.0773911848664, -0.0500... |
| 5 | -0.0562275871634 | [-0.0598151274025, -0.0059... |
| 6 | -0.0401206016541 | [0.0565584115684, 0.030123... |
| 7 | -0.0433877147734 | [0.205288589001, -0.060017... |
| 8 | -0.0184100158513 | [0.169030055404, -0.043373... |
| 9 | -0.0512112490833 | [0.163330376148, -0.060946... |
| 10 | -0.0407416447997 | [-0.420519113541, 0.110337... |
+---------+------------------+-------------------------------+
[6040 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
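In a factorization model of this shape, the coefficients above are usually combined into a predicted rating as a global intercept, plus the user's and the item's linear terms, plus the dot product of their factor vectors. The sketch below uses the intercept and linear terms printed for user 1 and movie 1193. The factor vectors are truncated in the output, so the two-element vectors here are hypothetical placeholders, and the side-feature (timestamp) terms are ignored.

```python
def predict_rating(intercept, user_linear, item_linear, user_factors, item_factors):
    """intercept + w_user + w_item + dot(user_factors, item_factors)."""
    interaction = sum(u * v for u, v in zip(user_factors, item_factors))
    return intercept + user_linear + item_linear + interaction

intercept = 3.5821495005738013   # m['coefficients']['intercept']
user_linear = -0.027785371989    # linear_terms for user_id 1
item_linear = 1.06781125069      # linear_terms for movie_id 1193

# Hypothetical 2-d factors; the trained model uses num_factors = 32.
user_factors = [-0.09, 0.01]
item_factors = [-0.12, -0.02]

pred = predict_rating(intercept, user_linear, item_linear,
                      user_factors, item_factors)
print(pred)
```

The large positive linear term for movie 1193 means it is predicted above the global mean for most users, before any user-specific interaction is taken into account.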
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [pop, itemcf, m],
                                         user_sample=0.5,
                                         metric='precision_recall')
compare_models: using 466 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.111587982833 | 0.0210169663883 |
| 2 | 0.105150214592 | 0.0425562055746 |
| 3 | 0.102288984263 | 0.0544524525898 |
| 4 | 0.0992489270386 | 0.0720982969229 |
| 5 | 0.0909871244635 | 0.0813264276131 |
| 6 | 0.0894134477825 | 0.0945639457791 |
| 7 | 0.0843041079093 | 0.100918470675 |
| 8 | 0.0815450643777 | 0.109968226951 |
| 9 | 0.0810681926562 | 0.126194673165 |
| 10 | 0.0791845493562 | 0.136877932933 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.214592274678 | 0.0357070732147 |
| 2 | 0.188841201717 | 0.0624344732159 |
| 3 | 0.171673819742 | 0.0839114919565 |
| 4 | 0.157725321888 | 0.102658606754 |
| 5 | 0.148497854077 | 0.119670145788 |
| 6 | 0.141273247496 | 0.134922725517 |
| 7 | 0.1330472103 | 0.14491465424 |
| 8 | 0.12660944206 | 0.161011620372 |
| 9 | 0.120886981402 | 0.175966528519 |
| 10 | 0.113733905579 | 0.182583941999 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.255364806867 | 0.0440953445436 |
| 2 | 0.224248927039 | 0.0737550631862 |
| 3 | 0.206008583691 | 0.0966051323736 |
| 4 | 0.188841201717 | 0.114173476478 |
| 5 | 0.178540772532 | 0.135443733289 |
| 6 | 0.168097281831 | 0.15412416307 |
| 7 | 0.161863887186 | 0.172695193849 |
| 8 | 0.152628755365 | 0.183265481294 |
| 9 | 0.145207439199 | 0.195052041281 |
| 10 | 0.139270386266 | 0.207348413937 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
[{'precision_recall_by_user': Columns:
    user_id int
    cutoff int
    precision float
    recall float
    count int
Rows: 8388
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 11 | 1 | 0.0 | 0.0 | 6 |
| 11 | 2 | 0.0 | 0.0 | 6 |
| 11 | 3 | 0.0 | 0.0 | 6 |
| 11 | 4 | 0.0 | 0.0 | 6 |
| 11 | 5 | 0.0 | 0.0 | 6 |
| 11 | 6 | 0.0 | 0.0 | 6 |
| 11 | 7 | 0.0 | 0.0 | 6 |
| 11 | 8 | 0.125 | 0.166666666667 | 6 |
| 11 | 9 | 0.111111111111 | 0.166666666667 | 6 |
| 11 | 10 | 0.1 | 0.166666666667 | 6 |
+---------+--------+----------------+----------------+-------+
[8388 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'precision_recall_overall': Columns:
    cutoff int
    precision float
    recall float
Rows: 18
Data:
+--------+-----------------+-----------------+
| cutoff | precision | recall |
+--------+-----------------+-----------------+
| 1 | 0.111587982833 | 0.0210169663883 |
| 2 | 0.105150214592 | 0.0425562055746 |
| 3 | 0.102288984263 | 0.0544524525898 |
| 4 | 0.0992489270386 | 0.0720982969229 |
| 5 | 0.0909871244635 | 0.0813264276131 |
| 6 | 0.0894134477825 | 0.0945639457791 |
| 7 | 0.0843041079093 | 0.100918470675 |
| 8 | 0.0815450643777 | 0.109968226951 |
| 9 | 0.0810681926562 | 0.126194673165 |
| 10 | 0.0791845493562 | 0.136877932933 |
+--------+-----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
    user_id int
    cutoff int
    precision float
    recall float
    count int
Rows: 8388
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 11 | 1 | 0.0 | 0.0 | 6 |
| 11 | 2 | 0.5 | 0.166666666667 | 6 |
| 11 | 3 | 0.333333333333 | 0.166666666667 | 6 |
| 11 | 4 | 0.5 | 0.333333333333 | 6 |
| 11 | 5 | 0.4 | 0.333333333333 | 6 |
| 11 | 6 | 0.333333333333 | 0.333333333333 | 6 |
| 11 | 7 | 0.285714285714 | 0.333333333333 | 6 |
| 11 | 8 | 0.25 | 0.333333333333 | 6 |
| 11 | 9 | 0.222222222222 | 0.333333333333 | 6 |
| 11 | 10 | 0.2 | 0.333333333333 | 6 |
+---------+--------+----------------+----------------+-------+
[8388 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'precision_recall_overall': Columns:
    cutoff int
    precision float
    recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.214592274678 | 0.0357070732147 |
| 2 | 0.188841201717 | 0.0624344732159 |
| 3 | 0.171673819742 | 0.0839114919565 |
| 4 | 0.157725321888 | 0.102658606754 |
| 5 | 0.148497854077 | 0.119670145788 |
| 6 | 0.141273247496 | 0.134922725517 |
| 7 | 0.1330472103 | 0.14491465424 |
| 8 | 0.12660944206 | 0.161011620372 |
| 9 | 0.120886981402 | 0.175966528519 |
| 10 | 0.113733905579 | 0.182583941999 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
    user_id int
    cutoff int
    precision float
    recall float
    count int
Rows: 8388
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 11 | 1 | 0.0 | 0.0 | 6 |
| 11 | 2 | 0.5 | 0.166666666667 | 6 |
| 11 | 3 | 0.333333333333 | 0.166666666667 | 6 |
| 11 | 4 | 0.25 | 0.166666666667 | 6 |
| 11 | 5 | 0.2 | 0.166666666667 | 6 |
| 11 | 6 | 0.333333333333 | 0.333333333333 | 6 |
| 11 | 7 | 0.285714285714 | 0.333333333333 | 6 |
| 11 | 8 | 0.25 | 0.333333333333 | 6 |
| 11 | 9 | 0.222222222222 | 0.333333333333 | 6 |
| 11 | 10 | 0.2 | 0.333333333333 | 6 |
+---------+--------+----------------+----------------+-------+
[8388 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'precision_recall_overall': Columns:
    cutoff int
    precision float
    recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.255364806867 | 0.0440953445436 |
| 2 | 0.224248927039 | 0.0737550631862 |
| 3 | 0.206008583691 | 0.0966051323736 |
| 4 | 0.188841201717 | 0.114173476478 |
| 5 | 0.178540772532 | 0.135443733289 |
| 6 | 0.168097281831 | 0.15412416307 |
| 7 | 0.161863887186 | 0.172695193849 |
| 8 | 0.152628755365 | 0.183265481294 |
| 9 | 0.145207439199 | 0.195052041281 |
| 10 | 0.139270386266 | 0.207348413937 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
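The per-user rows above follow the standard definitions of precision and recall at a cutoff k: the number of hits among the top k recommendations, divided by k and by the number of relevant held-out items respectively. The sketch below reproduces the shape of user 11's first table (6 relevant movies, first hit at position 8) with invented item names; `precision_recall_at_k` is an illustrative helper, not a GraphLab function.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical user with 6 held-out 5-star movies; only the 8th pick is a hit.
relevant = {'m1', 'm2', 'm3', 'm4', 'm5', 'm6'}
recommended = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'm1', 'x8', 'x9']

for k in (7, 8, 10):
    print(k, precision_recall_at_k(recommended, relevant, k))
```

Precision falls as k grows past the last hit while recall can only stay flat or rise, which is the pattern visible in every table above.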
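# Same model family, but with an explicit target rating of 3 for
# unobserved (user, movie) pairs in the ranking term.
m_rank = graphlab.recommender.ranking_factorization_recommender.create(
    train, 'user_id', 'movie_id', 'rating',
    unobserved_rating_value=3)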
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 967831 observations with 6040 users and 3702 items.
Data prepared in: 0.983562s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| unobserved_rating_value | Ranking Target Rating for Unobserved Interacti...| 3 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | 0.521984 |
| 4 | 0.130208 | 0.573774 |
| 5 | 0.0651042 | 0.990046 |
| 6 | 0.0325521 | 0.945016 |
+---------+-------------------+------------------------------------------+
| Final | 0.260417 | 0.521984 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 110us | 1.33206 | 1.11694 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 1.02s | 1.40375 | 1.09634 | 0.260417 |
| 2 | 1.93s | 0.936617 | 0.900854 | 0.260417 |
| 3 | 2.91s | 0.824838 | 0.846264 | 0.260417 |
| 4 | 3.87s | 0.810082 | 0.837036 | 0.260417 |
| 5 | 4.77s | 0.77387 | 0.817288 | 0.260417 |
| 6 | 5.58s | 0.75637 | 0.807332 | 0.260417 |
| 10 | 9.20s | 0.697997 | 0.774232 | 0.260417 |
| 11 | 10.03s | 0.680322 | 0.764175 | 0.260417 |
| 15 | 13.62s | 0.659587 | 0.751889 | 0.260417 |
| 20 | 18.44s | 0.640783 | 0.740592 | 0.260417 |
| 25 | 23.24s | 0.624992 | 0.731253 | 0.260417 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 0.628853
Final training RMSE: 0.723727
results = graphlab.recommender.util.compare_models(
    test[test['rating'] > 4],
    [pop, itemcf, m, m_rank],
    user_sample=0.5,
    metric='precision_recall')
compare_models: using 466 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.087982832618 | 0.0140248361318 |
| 2 | 0.0965665236052 | 0.0292522655743 |
| 3 | 0.0937052932761 | 0.0365127169855 |
| 4 | 0.0895922746781 | 0.046837188255 |
| 5 | 0.0862660944206 | 0.0596693591506 |
| 6 | 0.0808297567954 | 0.0685112959099 |
| 7 | 0.0787860208461 | 0.0764441666663 |
| 8 | 0.0769849785408 | 0.0846504186142 |
| 9 | 0.0762994754411 | 0.0972750087562 |
| 10 | 0.0744635193133 | 0.107658443171 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.206008583691 | 0.0348964593846 |
| 2 | 0.192060085837 | 0.0615229961222 |
| 3 | 0.176680972818 | 0.0815227915717 |
| 4 | 0.157188841202 | 0.0943852374705 |
| 5 | 0.148497854077 | 0.117658845156 |
| 6 | 0.138054363376 | 0.127392485241 |
| 7 | 0.133966891478 | 0.139672970091 |
| 8 | 0.12821888412 | 0.157576812504 |
| 9 | 0.12327134001 | 0.173337160645 |
| 10 | 0.115879828326 | 0.179741133016 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.225321888412 | 0.0341165787993 |
| 2 | 0.203862660944 | 0.0592605021931 |
| 3 | 0.191702432046 | 0.0799255771478 |
| 4 | 0.17864806867 | 0.0976891531755 |
| 5 | 0.170386266094 | 0.117871623275 |
| 6 | 0.157010014306 | 0.134454543693 |
| 7 | 0.152667075414 | 0.149987723075 |
| 8 | 0.145922746781 | 0.160054923279 |
| 9 | 0.137577491655 | 0.16716187937 |
| 10 | 0.134120171674 | 0.183519021385 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.145922746781 | 0.0155536026315 |
| 2 | 0.151287553648 | 0.032545498401 |
| 3 | 0.138769670959 | 0.0480767788701 |
| 4 | 0.128755364807 | 0.0602766563169 |
| 5 | 0.124034334764 | 0.0725574969378 |
| 6 | 0.122675250358 | 0.0837751352493 |
| 7 | 0.115879828326 | 0.0916484679064 |
| 8 | 0.113733905579 | 0.104413957145 |
| 9 | 0.111111111111 | 0.116317958073 |
| 10 | 0.107296137339 | 0.125368902557 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
results[3]['precision_recall_overall']
cutoff | precision | recall |
---|---|---|
1 | 0.145922746781 | 0.0155536026315 |
2 | 0.151287553648 | 0.032545498401 |
3 | 0.138769670959 | 0.0480767788701 |
4 | 0.128755364807 | 0.0602766563169 |
5 | 0.124034334764 | 0.0725574969378 |
6 | 0.122675250358 | 0.0837751352493 |
7 | 0.115879828326 | 0.0916484679064 |
8 | 0.113733905579 | 0.104413957145 |
9 | 0.111111111111 | 0.116317958073 |
10 | 0.107296137339 | 0.125368902557 |
user_sf = graphlab.SFrame('users')
item_sf = graphlab.SFrame('items')
user_sf
user_id | gender | age | occupation | zip-code |
---|---|---|---|---|
1 | F | 1 | 10 | 48067 |
2 | M | 56 | 16 | 70072 |
3 | M | 25 | 15 | 55117 |
4 | M | 45 | 7 | 02460 |
5 | M | 25 | 20 | 55455 |
6 | F | 50 | 9 | 55117 |
7 | M | 35 | 1 | 06810 |
8 | M | 25 | 12 | 11413 |
9 | M | 25 | 17 | 61614 |
10 | F | 35 | 1 | 95370 |
item_sf
movie_id | title | genre |
---|---|---|
1 | Toy Story (1995) | Animation|Children's|Comedy ... |
2 | Jumanji (1995) | Adventure|Children's|Fantasy ... |
3 | Grumpier Old Men (1995) | Comedy|Romance |
4 | Waiting to Exhale (1995) | Comedy|Drama |
5 | Father of the Bride Part II (1995) ... | Comedy |
6 | Heat (1995) | Action|Crime|Thriller |
7 | Sabrina (1995) | Comedy|Romance |
8 | Tom and Huck (1995) | Adventure|Children's |
9 | Sudden Death (1995) | Action |
10 | GoldenEye (1995) | Action|Adventure|Thriller |
# Train three variants that add user features, item features, or both
# as side data to the rating model.
m_user = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf)
m_item = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     item_data=item_sf)
m_both = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf, item_data=item_sf)
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 967831 observations with 6040 users and 3702 items.
Data prepared in: 0.937442s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 7.14286 | Not Viable |
| 1 | 1.78571 | Not Viable |
| 2 | 0.446429 | 1.3375 |
| 3 | 0.223214 | 1.094 |
| 4 | 0.111607 | 1.12747 |
| 5 | 0.0558036 | 1.25133 |
| 6 | 0.0279018 | 1.80473 |
+---------+-------------------+------------------------------------------+
| Final | 0.223214 | 1.094 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 69us | 2.44586 | 1.11695 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 1.85s | 3.2566 | 1.52186 | 0.223214 |
| 2 | 4.01s | 1.82009 | 1.07645 | 0.223214 |
| 3 | 5.99s | 1.52197 | 0.962525 | 0.223214 |
| 4 | 7.99s | 1.46419 | 0.943667 | 0.223214 |
| 5 | 10.05s | 1.42411 | 0.929737 | 0.223214 |
| 6 | 11.86s | 1.4125 | 0.927836 | 0.223214 |
| 7 | 13.77s | 1.45768 | 0.950511 | 0.223214 |
| 8 | 15.73s | 1.41854 | 0.933561 | 0.223214 |
| 9 | 17.54s | 1.3976 | 0.925889 | 0.223214 |
| 10 | 19.33s | 1.39159 | 0.925112 | 0.223214 |
| 11 | 21.10s | 1.37437 | 0.918518 | 0.223214 |
| 12 | 22.88s | 1.36352 | 0.9152 | 0.223214 |
| 13 | 24.67s | 1.35502 | 0.912162 | 0.223214 |
| 14 | 26.57s | 1.35012 | 0.910909 | 0.223214 |
| 15 | 28.45s | 1.3471 | 0.910619 | 0.223214 |
| 16 | 30.48s | 1.33695 | 0.906491 | 0.223214 |
| 17 | 32.37s | 1.32908 | 0.903518 | 0.223214 |
| 18 | 34.39s | 1.3272 | 0.903303 | 0.223214 |
| 19 | 36.87s | 1.32117 | 0.901367 | 0.223214 |
| 20 | 39.11s | 1.32256 | 0.902126 | 0.223214 |
| 21 | 41.12s | 1.31449 | 0.898852 | 0.223214 |
| 22 | 43.09s | 1.30774 | 0.896466 | 0.223214 |
| 23 | 45.10s | 1.30063 | 0.894055 | 0.223214 |
| 24 | 47.15s | 1.29969 | 0.89392 | 0.223214 |
| 25 | 49.25s | 1.2995 | 0.894295 | 0.223214 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.3239
Final training RMSE: 0.879756
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 967831 observations with 6040 users and 3883 items.
Data prepared in: 1.00141s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 10 | Not Viable |
| 1 | 2.5 | Not Viable |
| 2 | 0.625 | Not Viable |
| 3 | 0.15625 | 1.06978 |
| 4 | 0.078125 | 1.54845 |
| 5 | 0.0390625 | 1.81084 |
| 6 | 0.0195312 | 1.78482 |
+---------+-------------------+------------------------------------------+
| Final | 0.15625 | 1.06978 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 97us | 2.44627 | 1.11695 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 1.46s | 1.85534 | 1.1103 | 0.15625 |
| 2 | 2.94s | 1.45672 | 0.949586 | 0.15625 |
| 3 | 4.38s | 1.35111 | 0.91098 | 0.15625 |
| 4 | 5.79s | 1.32376 | 0.904016 | 0.15625 |
| 5 | 7.23s | 1.299 | 0.895309 | 0.15625 |
| 6 | 8.78s | 1.27552 | 0.887471 | 0.15625 |
| 7 | 10.21s | 1.24705 | 0.875703 | 0.15625 |
| 8 | 11.58s | 1.23298 | 0.870401 | 0.15625 |
| 9 | 12.96s | 1.2188 | 0.865168 | 0.15625 |
| 10 | 14.36s | 1.20627 | 0.860272 | 0.15625 |
| 11 | 15.76s | 1.19564 | 0.856134 | 0.15625 |
| 12 | 17.13s | 1.19297 | 0.855644 | 0.15625 |
| 13 | 18.50s | 1.18891 | 0.854091 | 0.15625 |
| 14 | 19.88s | 1.18114 | 0.850945 | 0.15625 |
| 15 | 21.42s | 1.17493 | 0.848849 | 0.15625 |
| 16 | 22.85s | 1.16891 | 0.846376 | 0.15625 |
| 17 | 24.21s | 1.16579 | 0.844708 | 0.15625 |
| 18 | 25.58s | 1.16022 | 0.842751 | 0.15625 |
| 19 | 26.95s | 1.15579 | 0.840918 | 0.15625 |
| 20 | 28.33s | 1.15072 | 0.838805 | 0.15625 |
| 21 | 29.72s | 1.14833 | 0.83774 | 0.15625 |
| 22 | 31.04s | 1.1448 | 0.83681 | 0.15625 |
| 23 | 32.40s | 1.14035 | 0.834712 | 0.15625 |
| 24 | 33.83s | 1.13886 | 0.834142 | 0.15625 |
| 25 | 35.16s | 1.13705 | 0.833517 | 0.15625 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.15511
Final training RMSE: 0.815472
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 967831 observations with 6040 users and 3883 items.
Data prepared in: 0.956958s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120978 / 967831 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 5.55556 | Not Viable |
| 1 | 1.38889 | Not Viable |
| 2 | 0.347222 | 1.31801 |
| 3 | 0.173611 | 1.05607 |
| 4 | 0.0868056 | 1.03567 |
| 5 | 0.0434028 | 1.19629 |
| 6 | 0.0217014 | 1.4938 |
| 7 | 0.0108507 | 1.70777 |
+---------+-------------------+------------------------------------------+
| Final | 0.0868056 | 1.03567 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 69us | 2.44654 | 1.11696 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 2.16s | 3.93959 | 1.66858 | 0.0868056 |
| 2 | 4.30s | 1.70792 | 1.02613 | 0.0868056 |
| 3 | 6.40s | 1.62228 | 0.996023 | 0.0868056 |
| 4 | 8.52s | 1.58266 | 0.983384 | 0.0868056 |
| 5 | 10.65s | 1.55723 | 0.976074 | 0.0868056 |
| 6 | 12.92s | 1.53232 | 0.968397 | 0.0868056 |
| 7 | 15.47s | 1.51778 | 0.964143 | 0.0868056 |
| 8 | 17.94s | 1.49833 | 0.957913 | 0.0868056 |
| 9 | 20.20s | 1.48279 | 0.953325 | 0.0868056 |
| 10 | 22.55s | 1.47074 | 0.949673 | 0.0868056 |
| 11 | 24.85s | 1.458 | 0.94579 | 0.0868056 |
| 12 | 27.23s | 1.44677 | 0.942229 | 0.0868056 |
| 13 | 29.39s | 1.43698 | 0.93923 | 0.0868056 |
| 14 | 31.72s | 1.42799 | 0.936534 | 0.0868056 |
| 15 | 33.82s | 1.41811 | 0.933126 | 0.0868056 |
| 16 | 35.97s | 1.40955 | 0.930789 | 0.0868056 |
| 17 | 38.11s | 1.4023 | 0.928189 | 0.0868056 |
| 18 | 40.22s | 1.39292 | 0.925387 | 0.0868056 |
| 19 | 42.35s | 1.38626 | 0.923178 | 0.0868056 |
| 20 | 44.47s | 1.37981 | 0.921024 | 0.0868056 |
| 21 | 46.64s | 1.37151 | 0.918555 | 0.0868056 |
| 22 | 48.75s | 1.3648 | 0.916354 | 0.0868056 |
| 23 | 50.88s | 1.35774 | 0.914191 | 0.0868056 |
| 24 | 52.97s | 1.35168 | 0.912197 | 0.0868056 |
| 25 | 55.13s | 1.34552 | 0.909992 | 0.0868056 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.37294
Final training RMSE: 0.901067
m_both
Class : RankingFactorizationRecommender
Schema
------
User ID : user_id
Item ID : movie_id
Target : rating
Additional observation features : 1
User side features : ['user_id', 'gender', 'age', 'occupation', 'zip-code']
Item side features : ['movie_id', 'title', 'genre']
Statistics
----------
Number of observations : 967831
Number of users : 6040
Number of items : 3883
Training summary
----------------
Training time : 66.8385
Model Parameters
----------------
Model class : RankingFactorizationRecommender
num_factors : 32
binary_target : 0
side_data_factorization : 1
solver : auto
nmf : 0
max_iterations : 25
Regularization Settings
-----------------------
regularization : 0.0
regularization_type : normal
linear_regularization : 0.0
ranking_regularization : 0.25
unobserved_rating_value : -1.79769313486e+308
num_sampled_negative_examples : 4
ials_confidence_scaling_type : auto
ials_confidence_scaling_factor : 1
Optimization Settings
---------------------
init_random_sigma : 0.01
sgd_convergence_interval : 4
sgd_convergence_threshold : 0.0
sgd_max_trial_iterations : 5
sgd_sampling_block_size : 131072
sgd_step_adjustment_interval : 4
sgd_step_size : 0.0
sgd_trial_sample_minimum_size : 10000
sgd_trial_sample_proportion : 0.125
step_size_decrease_rate : 0.75
additional_iterations_if_unhealthy : 5
adagrad_momentum_weighting : 0.9
num_tempering_iterations : 4
tempering_regularization_start_value : 0.0
track_exact_loss : 0
results = graphlab.recommender.util.compare_models(test,
[m, m_user, m_item, m_both],
user_sample=0.5)
compare_models: using 500 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.316 | 0.0128883210584 |
| 2 | 0.29 | 0.0233054227061 |
| 3 | 0.268666666667 | 0.0321496282686 |
| 4 | 0.253 | 0.0402442767867 |
| 5 | 0.244 | 0.0490785986855 |
| 6 | 0.232666666667 | 0.055136703757 |
| 7 | 0.225142857143 | 0.0629177137344 |
| 8 | 0.2175 | 0.0678235427485 |
| 9 | 0.211777777778 | 0.0739998255263 |
| 10 | 0.2088 | 0.0813878993056 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 0.9912235912964674)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4202 | 2 | 0.243189194194 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 1453 | 1 | 2.09470669064 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 2650 | 1 | 0.00523588563751 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3456 | 1 | 3.93082402092 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.322 | 0.0128391995335 |
| 2 | 0.29 | 0.0209245256901 |
| 3 | 0.266666666667 | 0.0286429069371 |
| 4 | 0.26 | 0.0399834485443 |
| 5 | 0.2528 | 0.0480304103201 |
| 6 | 0.243333333333 | 0.0538587372884 |
| 7 | 0.241142857143 | 0.0627955536032 |
| 8 | 0.23175 | 0.0688002166508 |
| 9 | 0.229555555556 | 0.0775822871683 |
| 10 | 0.2228 | 0.0844365974444 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 1.0467144960152537)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 5155 | 2 | 0.158692179486 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 1453 | 1 | 2.27238116673 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+-----------------+
| movie_id | count | rmse |
+----------+-------+-----------------+
| 3580 | 1 | 0.0213236444021 |
+----------+-------+-----------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 1849 | 1 | 5.23411866524 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.308 | 0.0135741713295 |
| 2 | 0.295 | 0.0257617271978 |
| 3 | 0.285333333333 | 0.0385055149317 |
| 4 | 0.278 | 0.0493160364699 |
| 5 | 0.266 | 0.0573723578479 |
| 6 | 0.256666666667 | 0.0633560170682 |
| 7 | 0.252571428571 | 0.0717548188532 |
| 8 | 0.2445 | 0.0786034149361 |
| 9 | 0.240666666667 | 0.0863443798709 |
| 10 | 0.2354 | 0.0925965363328 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 1.0571870174756082)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 2644 | 2 | 0.319982658691 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 1453 | 1 | 2.43030813721 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+-------------------+
| movie_id | count | rmse |
+----------+-------+-------------------+
| 1472 | 1 | 0.000398884009287 |
+----------+-------+-------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 398 | 1 | 8.48029736682 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.296 | 0.0108108277359 |
| 2 | 0.292 | 0.0223053284436 |
| 3 | 0.278666666667 | 0.0325446458663 |
| 4 | 0.2615 | 0.0404275263081 |
| 5 | 0.254 | 0.0482578874811 |
| 6 | 0.244 | 0.0549556020468 |
| 7 | 0.234857142857 | 0.0617971007626 |
| 8 | 0.23225 | 0.0712002266095 |
| 9 | 0.225777777778 | 0.0760029463955 |
| 10 | 0.222 | 0.0825916856647 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 0.9864932997213498)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 2356 | 3 | 0.169179486062 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 5243 | 4 | 2.11510097987 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 565 | 1 | 0.00157158077202 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3950 | 1 | 3.44213095993 |
+----------+-------+---------------+
[1 rows x 3 columns]
[r['rmse_overall'] for r in results]
[0.9912235912964674, 1.0467144960152537, 1.0571870174756082, 0.9864932997213498]
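The bare list of RMSE values is easier to read when each value is paired with the model it came from. A minimal sketch, assuming the values come back in the same order as the model list passed to `compare_models` (`m`, `m_user`, `m_item`, `m_both`); the numbers are hard-coded from the output above so the cell runs without GraphLab:

```python
# Overall test RMSE values copied from the output above. The pairing with
# model names assumes the order matches the list given to compare_models.
model_names = ['m', 'm_user', 'm_item', 'm_both']
rmse_overall = [0.9912235912964674, 1.0467144960152537,
                1.0571870174756082, 0.9864932997213498]

rmse_by_model = dict(zip(model_names, rmse_overall))

# Lower RMSE is better, so the best model has the smallest value.
best_model = min(rmse_by_model, key=rmse_by_model.get)

# Print the models from best to worst.
for name in sorted(rmse_by_model, key=rmse_by_model.get):
    print('%-7s %.4f' % (name, rmse_by_model[name]))
```

Under that assumption, `m_both` has the lowest RMSE on this run, with `m` close behind. The evaluation used a 500-user sample, so gaps this small may not be meaningful.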
results[3]['rmse_by_item']
movie_id | count | rmse |
---|---|---|
2871 | 18 | 1.08553449768 |
2043 | 3 | 0.28091888274 |
2464 | 1 | 0.600349059629 |
232 | 5 | 1.99505541159 |
3880 | 2 | 1.81932941568 |
2238 | 2 | 0.488967829715 |
3719 | 4 | 1.49264680177 |
431 | 5 | 0.934952171152 |
2661 | 2 | 0.782811477717 |
3811 | 7 | 0.938744314991 |
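The `rmse_by_item` table groups test-set prediction errors by `movie_id`. A rough sketch of that grouping in plain Python, assuming each record is a hypothetical `(movie_id, actual_rating, predicted_rating)` triple. The helper name and the numbers are made up for illustration, and this is not GraphLab's implementation:

```python
from collections import defaultdict
from math import sqrt

def rmse_by_item(records):
    """Return {item_id: (count, rmse)} for (item_id, actual, predicted) triples."""
    squared_errors = defaultdict(list)
    for item_id, actual, predicted in records:
        squared_errors[item_id].append((actual - predicted) ** 2)
    # RMSE is the square root of the mean squared error within each group.
    return {item_id: (len(errs), sqrt(sum(errs) / float(len(errs))))
            for item_id, errs in squared_errors.items()}

# Made-up ratings and predictions, for illustration only.
example_records = [(10, 5, 4.0), (10, 3, 4.0), (20, 4, 4.5)]
per_item = rmse_by_item(example_records)
print(per_item)
```

With a count of 1 the RMSE is just the absolute error of a single prediction. That is worth keeping in mind when reading per-item values for rarely rated movies.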
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
[m_rank, m_both],
user_sample=0.2,
metric='precision_recall')
compare_models: using 186 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.102150537634 | 0.0103584423342 |
| 2 | 0.134408602151 | 0.0325592569006 |
| 3 | 0.123655913978 | 0.0469152266463 |
| 4 | 0.103494623656 | 0.0536329312206 |
| 5 | 0.0978494623656 | 0.0668332524048 |
| 6 | 0.0905017921147 | 0.0706819521975 |
| 7 | 0.0867895545315 | 0.0798389988545 |
| 8 | 0.0880376344086 | 0.0929486544744 |
| 9 | 0.089605734767 | 0.108265520447 |
| 10 | 0.0876344086022 | 0.117956009872 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.225806451613 | 0.0368621545266 |
| 2 | 0.185483870968 | 0.0681228986668 |
| 3 | 0.186379928315 | 0.100618452607 |
| 4 | 0.173387096774 | 0.118549158313 |
| 5 | 0.154838709677 | 0.131116798495 |
| 6 | 0.141577060932 | 0.139856304912 |
| 7 | 0.137480798771 | 0.156850192296 |
| 8 | 0.134408602151 | 0.172714099339 |
| 9 | 0.127837514934 | 0.180102516583 |
| 10 | 0.125268817204 | 0.191717395836 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
[{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3348
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 56 | 1 | 1.0 | 0.166666666667 | 6 |
| 56 | 2 | 0.5 | 0.166666666667 | 6 |
| 56 | 3 | 0.333333333333 | 0.166666666667 | 6 |
| 56 | 4 | 0.25 | 0.166666666667 | 6 |
| 56 | 5 | 0.4 | 0.333333333333 | 6 |
| 56 | 6 | 0.333333333333 | 0.333333333333 | 6 |
| 56 | 7 | 0.285714285714 | 0.333333333333 | 6 |
| 56 | 8 | 0.25 | 0.333333333333 | 6 |
| 56 | 9 | 0.222222222222 | 0.333333333333 | 6 |
| 56 | 10 | 0.2 | 0.333333333333 | 6 |
+---------+--------+----------------+----------------+-------+
[3348 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+-----------------+-----------------+
| cutoff | precision | recall |
+--------+-----------------+-----------------+
| 1 | 0.102150537634 | 0.0103584423342 |
| 2 | 0.134408602151 | 0.0325592569006 |
| 3 | 0.123655913978 | 0.0469152266463 |
| 4 | 0.103494623656 | 0.0536329312206 |
| 5 | 0.0978494623656 | 0.0668332524048 |
| 6 | 0.0905017921147 | 0.0706819521975 |
| 7 | 0.0867895545315 | 0.0798389988545 |
| 8 | 0.0880376344086 | 0.0929486544744 |
| 9 | 0.089605734767 | 0.108265520447 |
| 10 | 0.0876344086022 | 0.117956009872 |
+--------+-----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3348
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 56 | 1 | 0.0 | 0.0 | 6 |
| 56 | 2 | 0.0 | 0.0 | 6 |
| 56 | 3 | 0.333333333333 | 0.166666666667 | 6 |
| 56 | 4 | 0.5 | 0.333333333333 | 6 |
| 56 | 5 | 0.6 | 0.5 | 6 |
| 56 | 6 | 0.5 | 0.5 | 6 |
| 56 | 7 | 0.428571428571 | 0.5 | 6 |
| 56 | 8 | 0.375 | 0.5 | 6 |
| 56 | 9 | 0.333333333333 | 0.5 | 6 |
| 56 | 10 | 0.3 | 0.5 | 6 |
+---------+--------+----------------+----------------+-------+
[3348 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.225806451613 | 0.0368621545266 |
| 2 | 0.185483870968 | 0.0681228986668 |
| 3 | 0.186379928315 | 0.100618452607 |
| 4 | 0.173387096774 | 0.118549158313 |
| 5 | 0.154838709677 | 0.131116798495 |
| 6 | 0.141577060932 | 0.139856304912 |
| 7 | 0.137480798771 | 0.156850192296 |
| 8 | 0.134408602151 | 0.172714099339 |
| 9 | 0.127837514934 | 0.180102516583 |
| 10 | 0.125268817204 | 0.191717395836 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
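The per-user rows above follow the usual definition of precision and recall at a cutoff: of the top `k` recommended items, how many are in the user's held-out set, divided by `k` (precision) or by the size of the held-out set (recall). A minimal sketch in plain Python, assuming that definition. The helper name and item labels are hypothetical, chosen so that a user with 6 held-out items gets hits at ranks 1 and 5, which gives the same values as user 56 under the first model:

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision and recall of the top-k recommendations for one user."""
    relevant = set(relevant_items)
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Hypothetical data: 6 held-out items, recommender hits at ranks 1 and 5.
relevant_items = ['a', 'b', 'c', 'd', 'e', 'f']
ranked_items = ['a', 'x1', 'x2', 'x3', 'b', 'x4', 'x5', 'x6', 'x7', 'x8']

for k in (1, 5, 10):
    p, r = precision_recall_at_k(ranked_items, relevant_items, k)
    print('cutoff %2d  precision %.3f  recall %.3f' % (k, p, r))
```

Recall can only grow with the cutoff while precision tends to fall. Both columns of the summary tables show that pattern.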
fm = graphlab.recommender.create(train.head(10000), 'user_id', 'movie_id', 'rating',
method='factorization_model',
item_data=item_sf,
sgd_step_size=0.09,
max_iterations=10)