使用GraphLab进行电影推荐¶

In [1]:

import graphlab
graphlab.canvas.set_target("ipynb")
# set canvas to show sframes and sgraphs in ipython notebook
# import matplotlib.pyplot as plt
# %matplotlib inline

In [7]:

# download data from: http://files.grouplens.org/datasets/movielens/ml-1m.zip

In [2]:

data = graphlab.SFrame.read_csv('/Users/datalab/bigdata/cjc/ml-1m/ratings.dat', delimiter='\n', 
                                header=False)['X1'].apply(lambda x: x.split('::')).unpack()
for col in data.column_names():
    data[col] = data[col].astype(int)
data.rename({'X.0': 'user_id', 'X.1': 'movie_id', 'X.2': 'rating', 'X.3': 'timestamp'})
data.save('ratings')

users = graphlab.SFrame.read_csv('/Users/datalab/bigdata/cjc/ml-1m/users.dat', delimiter='\n', 
                                 header=False)['X1'].apply(lambda x: x.split('::')).unpack()
users.rename({'X.0': 'user_id', 'X.1': 'gender', 'X.2': 'age', 'X.3': 'occupation', 'X.4': 'zip-code'})
users['user_id'] = users['user_id'].astype(int)
users.save('users')

items = graphlab.SFrame.read_csv('/Users/datalab/bigdata/cjc/ml-1m/movies.dat', delimiter='\n', 
                                 header=False)['X1'].apply(lambda x: x.split('::')).unpack()
items.rename({'X.0': 'movie_id', 'X.1': 'title', 'X.2': 'genre'})
items['movie_id'] = items['movie_id'].astype(int)
items.save('items')

[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1525501840.log

This non-commercial license of GraphLab Create for academic use is assigned to wangchengjun@nju.edu.cn and will expire on March 14, 2019.

Finished parsing file /Users/datalab/bigdata/cjc/ml-1m/ratings.dat

Parsing completed. Parsed 100 lines in 0.43242 secs.

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------

Finished parsing file /Users/datalab/bigdata/cjc/ml-1m/ratings.dat

Parsing completed. Parsed 1000209 lines in 0.560575 secs.

Finished parsing file /Users/datalab/bigdata/cjc/ml-1m/users.dat

------------------------------------------------------

Parsing completed. Parsed 100 lines in 0.037251 secs.

Inferred types from first 100 line(s) of file as 
column_type_hints=[str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------

Finished parsing file /Users/datalab/bigdata/cjc/ml-1m/users.dat

Parsing completed. Parsed 6040 lines in 0.015785 secs.

Finished parsing file /Users/datalab/bigdata/cjc/ml-1m/movies.dat

Parsing completed. Parsed 100 lines in 0.033283 secs.

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------

Finished parsing file /Users/datalab/bigdata/cjc/ml-1m/movies.dat

Parsing completed. Parsed 3883 lines in 0.016475 secs.

In [3]:

data

Out[3]:

user_id	movie_id	rating	timestamp
1	1193	5	978300760
1	661	3	978302109
1	914	3	978301968
1	3408	4	978300275
1	2355	5	978824291
1	1197	3	978302268
1	1287	5	978302039
1	2804	5	978300719
1	594	4	978302268
1	919	4	978301368

[1000209 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [4]:

items

Out[4]:

movie_id	title	genre
1	Toy Story (1995)	Animation\|Children's\|Come dy ...
2	Jumanji (1995)	Adventure\|Children's\|Fant asy ...
3	Grumpier Old Men (1995)	Comedy\|Romance
4	Waiting to Exhale (1995)	Comedy\|Drama
5	Father of the Bride Part II (1995) ...	Comedy
6	Heat (1995)	Action\|Crime\|Thriller
7	Sabrina (1995)	Comedy\|Romance
8	Tom and Huck (1995)	Adventure\|Children's
9	Sudden Death (1995)	Action
10	GoldenEye (1995)	Action\|Adventure\|Thriller

[3883 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [6]:

users

Out[6]:

user_id	gender	age	occupation	zip-code
1	F	1	10	48067
2	M	56	16	70072
3	M	25	15	55117
4	M	45	7	02460
5	M	25	20	55455
6	F	50	9	55117
7	M	35	1	06810
8	M	25	12	11413
9	M	25	17	61614
10	F	35	1	95370

[6040 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [7]:

data = data.join(items, on='movie_id')

In [8]:

data

Out[8]:

user_id	movie_id	rating	timestamp	title	genre
1	1193	5	978300760	One Flew Over the Cuckoo's Nest (1975) ...	Drama
1	661	3	978302109	James and the Giant Peach (1996) ...	Animation\|Children's\|Musi cal ...
1	914	3	978301968	My Fair Lady (1964)	Musical\|Romance
1	3408	4	978300275	Erin Brockovich (2000)	Drama
1	2355	5	978824291	Bug's Life, A (1998)	Animation\|Children's\|Come dy ...
1	1197	3	978302268	Princess Bride, The (1987) ...	Action\|Adventure\|Comedy\|R omance ...
1	1287	5	978302039	Ben-Hur (1959)	Action\|Adventure\|Drama
1	2804	5	978300719	Christmas Story, A (1983)	Comedy\|Drama
1	594	4	978302268	Snow White and the Seven Dwarfs (1937) ...	Animation\|Children's\|Musi cal ...
1	919	4	978301368	Wizard of Oz, The (1939)	Adventure\|Children's\|Dram a\|Musical ...

[1000209 rows x 6 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [9]:

train_set, test_set = data.random_split(0.95, seed=1)

In [10]:

m = graphlab.recommender.create(train_set, 'user_id', 'movie_id', 'rating')

Recsys training: model = ranking_factorization_recommender

Preparing data set.

    Data has 949852 observations with 6040 users and 3701 items.

    Data prepared in: 1.29085s

Training ranking_factorization_recommender for recommendations.

+--------------------------------+--------------------------------------------------+----------+

| Parameter                      | Description                                      | Value    |

+--------------------------------+--------------------------------------------------+----------+

| num_factors                    | Factor Dimension                                 | 32       |

| regularization                 | L2 Regularization on Factors                     | 1e-09    |

| solver                         | Solver used for training                         | adagrad  |

| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |

| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |

| max_iterations                 | Maximum Number of Iterations                     | 25       |

+--------------------------------+--------------------------------------------------+----------+

  Optimizing model using SGD; tuning step size.

  Using 118731 / 949852 points for tuning the step size.

+---------+-------------------+------------------------------------------+

| Attempt | Initial Step Size | Estimated Objective Value                |

+---------+-------------------+------------------------------------------+

| 0       | 10                | Not Viable                               |

| 1       | 2.5               | Not Viable                               |

| 2       | 0.625             | Not Viable                               |

| 3       | 0.15625           | Not Viable                               |

| 4       | 0.0390625         | 0.682141                                 |

| 5       | 0.0195312         | 1.16279                                  |

| 6       | 0.00976562        | 1.50932                                  |

+---------+-------------------+------------------------------------------+

| Final   | 0.0390625         | 0.682141                                 |

+---------+-------------------+------------------------------------------+

Starting Optimization.

+---------+--------------+-------------------+-----------------------+-------------+

| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |

+---------+--------------+-------------------+-----------------------+-------------+

| Initial | 74us         | 2.44717           | 1.11719               |             |

+---------+--------------+-------------------+-----------------------+-------------+

| 1       | 1.16s        | 1.31004           | 0.991321              | 0.0390625   |

| 2       | 2.24s        | 0.983868          | 0.906376              | 0.0390625   |

| 3       | 3.32s        | 0.87771           | 0.869764              | 0.0390625   |

| 4       | 4.41s        | 0.826156          | 0.852127              | 0.0390625   |

| 5       | 5.54s        | 0.793964          | 0.840965              | 0.0390625   |

| 6       | 6.60s        | 0.770374          | 0.832034              | 0.0390625   |

| 7       | 7.72s        | 0.751784          | 0.825083              | 0.0390625   |

| 8       | 8.76s        | 0.737281          | 0.819622              | 0.0390625   |

| 9       | 9.89s        | 0.725296          | 0.814701              | 0.0390625   |

| 10      | 10.93s       | 0.714175          | 0.810177              | 0.0390625   |

| 11      | 12.08s       | 0.70574           | 0.806509              | 0.0390625   |

| 12      | 13.11s       | 0.697265          | 0.80316               | 0.0390625   |

| 13      | 14.34s       | 0.689758          | 0.799906              | 0.0390625   |

| 14      | 15.37s       | 0.683825          | 0.797263              | 0.0390625   |

| 15      | 16.49s       | 0.678044          | 0.794862              | 0.0390625   |

| 16      | 17.68s       | 0.67252           | 0.792279              | 0.0390625   |

| 17      | 18.77s       | 0.667249          | 0.790149              | 0.0390625   |

| 18      | 19.82s       | 0.662575          | 0.788064              | 0.0390625   |

| 19      | 20.83s       | 0.658308          | 0.786007              | 0.0390625   |

| 20      | 21.87s       | 0.654573          | 0.78429               | 0.0390625   |

| 21      | 22.89s       | 0.650925          | 0.782561              | 0.0390625   |

| 22      | 24.06s       | 0.647671          | 0.781114              | 0.0390625   |

| 23      | 25.31s       | 0.644385          | 0.779579              | 0.0390625   |

| 24      | 26.59s       | 0.641323          | 0.778085              | 0.0390625   |

| 25      | 27.79s       | 0.638167          | 0.776486              | 0.0390625   |

+---------+--------------+-------------------+-----------------------+-------------+

Optimization Complete: Maximum number of passes through the data reached.

Computing final objective value and training RMSE.

       Final objective value: 0.625427

       Final training RMSE: 0.768104

In [11]:

Out[11]:

Class                            : RankingFactorizationRecommender

Schema
------
User ID                          : user_id
Item ID                          : movie_id
Target                           : rating
Additional observation features  : 3
User side features               : []
Item side features               : []

Statistics
----------
Number of observations           : 949852
Number of users                  : 6040
Number of items                  : 3701

Training summary
----------------
Training time                    : 33.7974

Model Parameters
----------------
Model class                      : RankingFactorizationRecommender
num_factors                      : 32
binary_target                    : 0
side_data_factorization          : 1
solver                           : auto
nmf                              : 0
max_iterations                   : 25

Regularization Settings
-----------------------
regularization                   : 0.0
regularization_type              : normal
linear_regularization            : 0.0
ranking_regularization           : 0.25
unobserved_rating_value          : -1.79769313486e+308
num_sampled_negative_examples    : 4
ials_confidence_scaling_type     : auto
ials_confidence_scaling_factor   : 1

Optimization Settings
---------------------
init_random_sigma                : 0.01
sgd_convergence_interval         : 4
sgd_convergence_threshold        : 0.0
sgd_max_trial_iterations         : 5
sgd_sampling_block_size          : 131072
sgd_step_adjustment_interval     : 4
sgd_step_size                    : 0.0
sgd_trial_sample_minimum_size    : 10000
sgd_trial_sample_proportion      : 0.125
step_size_decrease_rate          : 0.75
additional_iterations_if_unhealthy : 5
adagrad_momentum_weighting       : 0.9
num_tempering_iterations         : 4
tempering_regularization_start_value : 0.0
track_exact_loss                 : 0

In [12]:

m2 = graphlab.item_similarity_recommender.create(train_set, 
                                                 'user_id', 'movie_id', 'rating',
                                 similarity_type='pearson')

Recsys training: model = item_similarity

Warning: Ignoring columns timestamp, title, genre;

    To use these columns in scoring predictions, use a model that allows the use of additional features.

Preparing data set.

    Data has 949852 observations with 6040 users and 3701 items.

    Data prepared in: 0.690296s

Training model from provided data.

Gathering per-item and per-user statistics.

+--------------------------------+------------+

| Elapsed Time (Item Statistics) | % Complete |

+--------------------------------+------------+

| 4.296ms                        | 16.5       |

| 55.516ms                       | 100        |

+--------------------------------+------------+

Setting up lookup tables.

Processing data in one pass using dense lookup tables.

+-------------------------------------+------------------+-----------------+

| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |

+-------------------------------------+------------------+-----------------+

| 85.885ms                            | 0                | 0               |

| 1.09s                               | 35               | 1300            |

| 2.09s                               | 63.5             | 2350            |

| 3.09s                               | 99.5             | 3689            |

| 3.17s                               | 100              | 3701            |

+-------------------------------------+------------------+-----------------+

Finalizing lookup tables.

Generating candidate set for working with new users.

Finished training in 3.21611s

In [13]:

m2

Out[13]:

Class                            : ItemSimilarityRecommender

Schema
------
User ID                          : user_id
Item ID                          : movie_id
Target                           : rating
Additional observation features  : 0
User side features               : []
Item side features               : []

Statistics
----------
Number of observations           : 949852
Number of users                  : 6040
Number of items                  : 3701

Training summary
----------------
Training time                    : 3.2161

Model Parameters
----------------
Model class                      : ItemSimilarityRecommender
threshold                        : 0.001
similarity_type                  : pearson
training_method                  : auto

Other Settings
--------------
degree_approximation_threshold   : 4096
sparse_density_estimation_sample_size : 4096
max_data_passes                  : 4096
target_memory_usage              : 8589934592
seed_item_set_size               : 50
nearest_neighbors_interaction_proportion_threshold : 0.05
max_item_neighborhood_size       : 64

In [14]:

result = graphlab.recommender.util.compare_models(test_set, 
                                                  [m, m2],
                                            user_sample=.5, skip_set=train_set)

compare_models: using 2811 users to estimate model performance
PROGRESS: Evaluate model M0

recommendations finished on 1000/2811 queries. users per second: 7788.34

recommendations finished on 2000/2811 queries. users per second: 8209.17

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    | 0.0647456421202 | 0.0055003280382 |
|   2    | 0.0585200996087 | 0.0113353035287 |
|   3    | 0.0524131388592 | 0.0159704348141 |
|   4    | 0.0489149768766 | 0.0202725808447 |
|   5    | 0.0452508004269 | 0.0240583282714 |
|   6    | 0.0432230522946 | 0.0281284957357 |
|   7    | 0.0413680947299 | 0.0315008447446 |
|   8    |  0.03962113127  | 0.0342475240871 |
|   9    | 0.0383414364204 | 0.0373489084669 |
|   10   | 0.0367484880825 | 0.0397919699688 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 0.908462581230655)

Per User RMSE (best)
+---------+-------+------------------+
| user_id | count |       rmse       |
+---------+-------+------------------+
|   1614  |   1   | 0.00313141188165 |
+---------+-------+------------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   4936  |   1   | 4.31876092511 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count |       rmse       |
+----------+-------+------------------+
|   3138   |   1   | 0.00103085925747 |
+----------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   1455   |   1   | 4.06725567438 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M1

recommendations finished on 1000/2811 queries. users per second: 21419.2

recommendations finished on 2000/2811 queries. users per second: 22703.3

Precision and recall summary statistics by cutoff
+--------+-------------------+-------------------+
| cutoff |   mean_precision  |    mean_recall    |
+--------+-------------------+-------------------+
|   1    |        0.0        |        0.0        |
|   2    |        0.0        |        0.0        |
|   3    |        0.0        |        0.0        |
|   4    | 8.89363215937e-05 | 5.08207551964e-05 |
|   5    |  7.1149057275e-05 | 5.08207551964e-05 |
|   6    | 5.92908810625e-05 | 5.08207551964e-05 |
|   7    | 5.08207551964e-05 | 5.08207551964e-05 |
|   8    | 4.44681607969e-05 | 5.08207551964e-05 |
|   9    | 3.95272540417e-05 | 5.08207551964e-05 |
|   10   | 3.55745286375e-05 | 5.08207551964e-05 |
+--------+-------------------+-------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 0.9853701447799256)

Per User RMSE (best)
+---------+-------+------------------+
| user_id | count |       rmse       |
+---------+-------+------------------+
|   5821  |   1   | 0.00493980551612 |
+---------+-------+------------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   5214  |   2   | 3.28453141022 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------+
| movie_id | count | rmse |
+----------+-------+------+
|   977    |   1   | 0.0  |
+----------+-------+------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+------+
| movie_id | count | rmse |
+----------+-------+------+
|   572    |   1   | 4.0  |
+----------+-------+------+
[1 rows x 3 columns]

Getting similar items¶

In [15]:

m.get_similar_items([1287])  # movie_id is Ben-Hur

Out[15]:

movie_id	similar	score	rank
1287	940	0.62611323595	1
1287	1291	0.578746318817	2
1287	2370	0.560160756111	3
1287	3805	0.524255812168	4
1287	2905	0.523368954659	5
1287	2324	0.519006967545	6
1287	2804	0.510964155197	7
1287	1214	0.496943891048	8
1287	919	0.491189420223	9
1287	1198	0.490789085627	10

[10 rows x 4 columns]

In [19]:

help(m.get_similar_items)

Help on method get_similar_items in module graphlab.toolkits.recommender.util:

get_similar_items(self, items=None, k=10, verbose=False) method of graphlab.toolkits.recommender.ranking_factorization_recommender.RankingFactorizationRecommender instance
    Get the k most similar items for each item in items.
    
    Each type of recommender has its own model for the similarity
    between items. For example, the item_similarity_recommender will
    return the most similar items according to the user-chosen
    similarity; the factorization_recommender will return the
    nearest items based on the cosine similarity between latent item
    factors.
    
    Parameters
    ----------
    items : SArray or list; optional
        An :class:`~graphlab.SArray` or list of item ids for which to get
        similar items. If 'None', then return the `k` most similar items for
        all items in the training set.
    
    k : int, optional
        The number of similar items for each item.
    
    verbose : bool, optional
        Progress printing is shown.
    
    Returns
    -------
    out : SFrame
        A SFrame with the top ranked similar items for each item. The
        columns `item`, 'similar', 'score' and 'rank', where
        `item` matches the item column name specified at training time.
        The 'rank' is between 1 and `k` and 'score' gives the similarity
        score of that item. The value of the score depends on the method
        used for computing item similarities.
    
    Examples
    --------
    
    >>> sf = graphlab.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
                              'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"]})
    >>> m = graphlab.item_similarity_recommender.create(sf)
    >>> nn = m.get_similar_items()

'score' gives the similarity score of that item

In [16]:

m.get_similar_items([1287]).join(items, on={'similar': 'movie_id'}).sort('rank')

Out[16]:

movie_id	similar	score	rank	title	genre
1287	940	0.62611323595	1	Adventures of Robin Hood, The (1938) ...	Action\|Adventure
1287	1291	0.578746318817	2	Indiana Jones and the Last Crusade (1989) ...	Action\|Adventure
1287	2370	0.560160756111	3	Emerald Forest, The (1985) ...	Action\|Adventure\|Drama
1287	3805	0.524255812168	4	Knightriders (1981)	Action\|Adventure\|Drama
1287	2905	0.523368954659	5	Sanjuro (1962)	Action\|Adventure
1287	2324	0.519006967545	6	Life Is Beautiful (La Vita � bella) (1997) ...	Comedy\|Drama
1287	2804	0.510964155197	7	Christmas Story, A (1983)	Comedy\|Drama
1287	1214	0.496943891048	8	Alien (1979)	Action\|Horror\|Sci- Fi\|Thriller ...
1287	919	0.491189420223	9	Wizard of Oz, The (1939)	Adventure\|Children's\|Dram a\|Musical ...
1287	1198	0.490789085627	10	Raiders of the Lost Ark (1981) ...	Action\|Adventure

[10 rows x 6 columns]

Making recommendations¶

In [20]:

recs = m.recommend()

recommendations finished on 1000/6040 queries. users per second: 7665.01

recommendations finished on 2000/6040 queries. users per second: 7859.16

recommendations finished on 3000/6040 queries. users per second: 7905.28

recommendations finished on 4000/6040 queries. users per second: 8137.29

recommendations finished on 5000/6040 queries. users per second: 8219.12

recommendations finished on 6000/6040 queries. users per second: 7955.95

In [21]:

recs

Out[21]:

user_id	movie_id	score	rank
1	34	3.9719229951	1
1	590	3.92724250028	2
1	1198	3.91728875557	3
1	1282	3.91183034102	4
1	1682	3.90261084358	5
1	356	3.89610960106	6
1	1408	3.88136133323	7
1	1210	3.87818231205	8
1	912	3.87265440965	9
1	1393	3.87153578973	10

[60400 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [24]:

data[data['user_id'] == 4]

Out[24]:

user_id	movie_id	rating	timestamp	title	genre
4	3468	5	978294008	Hustler, The (1961)	Drama
4	1210	3	978293924	Star Wars: Episode VI - Return of the Jedi (1 ...	Action\|Adventure\|Romance \|Sci-Fi\|War ...
4	2951	4	978294282	Fistful of Dollars, A (1964) ...	Action\|Western
4	1214	4	978294260	Alien (1979)	Action\|Horror\|Sci- Fi\|Thriller ...
4	1036	4	978294282	Die Hard (1988)	Action\|Thriller
4	260	5	978294199	Star Wars: Episode IV - A New Hope (1977) ...	Action\|Adventure\|Fantasy \|Sci-Fi ...
4	2028	5	978294230	Saving Private Ryan (1998) ...	Action\|Drama\|War
4	480	4	978294008	Jurassic Park (1993)	Action\|Adventure\|Sci-Fi
4	1196	2	978294199	Star Wars: Episode V - The Empire Strikes Back ...	Action\|Adventure\|Drama \|Sci-Fi\|War ...
4	1198	5	978294199	Raiders of the Lost Ark (1981) ...	Action\|Adventure

[? rows x 6 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.

In [25]:

m.recommend(users=[4], k=20).join(items, on='movie_id')

Out[25]:

user_id	movie_id	score	rank	title	genre
4	527	4.10534794144	7	Schindler's List (1993)	Drama\|War
4	541	4.07768926285	9	Blade Runner (1982)	Film-Noir\|Sci-Fi
4	745	4.02218672428	17	Close Shave, A (1995)	Animation\|Comedy\|Thriller
4	750	4.31101408861	1	Dr. Strangelove or: How I Learned to Stop Worrying ...	Sci-Fi\|War
4	924	4.14404563866	5	2001: A Space Odyssey (1968) ...	Drama\|Mystery\|Sci- Fi\|Thriller ...
4	1073	4.0288983833	15	Willy Wonka and the Chocolate Factory (1971) ...	Adventure\|Children's\|Come dy\|Fantasy ...
4	1084	4.01916591338	19	Bonnie and Clyde (1967)	Crime\|Drama
4	1094	4.07409318052	10	Crying Game, The (1992)	Drama\|Romance\|War
4	1183	4.01506134383	20	English Patient, The (1996) ...	Drama\|Romance\|War
4	1206	4.08579082442	8	Clockwork Orange, A (1971) ...	Sci-Fi

[20 rows x 6 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [26]:

m.recommend?

Recommendations for new users¶

In [51]:

recent_data = graphlab.SFrame()
recent_data['movie_id'] = [30, 1000, 900, 883, 251, 200, 199, 180, 120, 991, 1212] 
recent_data['user_id'] = 99999
recent_data['rating'] = [2, 1, 3, 4, 0, 0, 1, 1, 1, 2, 3]
recent_data

Out[51]:

movie_id	user_id	rating
30	99999	2
1000	99999	1
900	99999	3
883	99999	4
251	99999	0
200	99999	0
199	99999	1
180	99999	1
120	99999	1
991	99999	2

[11 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [53]:

m2.recommend(users=[99999], new_observation_data=recent_data).join(items, on='movie_id').sort('rank')

Out[53]:

user_id	movie_id	score	rank	title	genre
99999	572	5.0	1	Foreign Student (1994)	Drama
99999	3881	5.0	2	Bittersweet Motel (2000)	Documentary
99999	1830	5.0	3	Follow the Bitch (1998)	Comedy
99999	989	5.0	4	Schlafes Bruder (Brother of Sleep) (1995) ...	Drama
99999	3172	5.0	5	Ulysses (Ulisse) (1954)	Adventure
99999	3233	5.0	6	Smashing Time (1967)	Comedy
99999	3382	5.0	7	Song of Freedom (1936)	Drama
99999	787	5.0	8	Gate of Heavenly Peace, The (1995) ...	Documentary
99999	3656	5.0	9	Lured (1947)	Crime
99999	3280	5.0	10	Baby, The (1973)	Horror

[10 rows x 6 columns]

Saving and loading models¶

In [ ]:

m.save('my_model')

In [ ]:

m_again = graphlab.load_model('my_model')

In [ ]:

m_again