import turicreate as tc
#train_file = 'http://s3.amazonaws.com/dato-datasets/millionsong/10000.txt'
train_file = '/Users/datalab/bigdata/cjc/millionsong/song_usage_10000.txt'
sf = tc.SFrame.read_csv(train_file, header=False, delimiter='\t', verbose=False)
sf = sf.rename({'X1':'user_id', 'X2':'music_id', 'X3':'rating'})
train_set, test_set = sf.random_split(0.8, seed=1)
popularity_model = tc.popularity_recommender.create(train_set,
                                                    user_id='user_id',
                                                    item_id='music_id',
                                                    target='rating')
Preparing data set.
Data has 1599753 observations with 76085 users and 10000 items.
Data prepared in: 0.907738s
1599753 observations to process; with 10000 unique items.
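The popularity model above is trained with `target='rating'`. As a rough mental model (an assumption about the general technique, not turicreate's internal code), a popularity recommender with a target column scores each item by the mean target value it received in training, so every user sees the same global ranking. The pure-Python sketch below uses hypothetical names (`popularity_scores`, the toy `train` triples) and no turicreate calls.

```python
from collections import defaultdict

def popularity_scores(observations):
    """Map each item to the mean target over its (user, item, target) rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, target in observations:
        totals[item] += target
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

# Hypothetical toy data in the same (user_id, music_id, rating) layout.
train = [
    ("u1", "songA", 5), ("u2", "songA", 3),
    ("u1", "songB", 1), ("u3", "songC", 2),
]
scores = popularity_scores(train)
# One global ranking, shared by every user.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'songA': 4.0, 'songB': 1.0, 'songC': 2.0}
print(ranking)  # ['songA', 'songC', 'songB']
```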
item_sim_model = tc.item_similarity_recommender.create(train_set,
                                                       user_id='user_id',
                                                       item_id='music_id',
                                                       target='rating',
                                                       similarity_type='cosine')
Preparing data set.
Data has 1599753 observations with 76085 users and 10000 items.
Data prepared in: 0.939059s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 1.878ms | 1.25 |
| 29.154ms | 100 |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 199.684ms | 0 | 0 |
| 957.28ms | 100 | 10000 |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 2.00431s
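With `similarity_type='cosine'`, the item-similarity model compares two items by the cosine of their user-interaction vectors. The sketch below computes item-item cosine similarity over sparse `{user: target}` vectors, then scores each item a user has not seen by averaging, over the items the user already has, similarity times that item's target. The scoring rule is a simplified assumption and turicreate's exact formula may differ; all function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def item_vectors(observations):
    """Map each item to a sparse {user: target} vector."""
    vectors = defaultdict(dict)
    for user, item, target in observations:
        vectors[item][user] = target
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_unseen(history, vectors):
    """Average of cosine(known item, candidate) * known target over the user's history."""
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in history:
            continue  # only score items the user does not already have
        total = sum(cosine(vectors[item], cand_vec) * target
                    for item, target in history.items())
        scores[candidate] = total / len(history)
    return scores

# Hypothetical toy triples: (user, item, target).
train = [("u1", "A", 1), ("u1", "B", 1), ("u2", "A", 1),
         ("u2", "C", 1), ("u3", "B", 1)]
vectors = item_vectors(train)
print(round(cosine(vectors["A"], vectors["B"]), 3))  # 0.5
print({k: round(v, 3) for k, v in score_unseen({"B": 1}, vectors).items()})  # {'A': 0.5, 'C': 0.0}
```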
factorization_machine_model = tc.recommender.factorization_recommender.create(
    train_set, user_id='user_id', item_id='music_id', target='rating')
Preparing data set.
Data has 1599753 observations with 76085 users and 10000 items.
Data prepared in: 0.930001s
Training factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 8 |
| regularization | L2 Regularization on Factors | 1e-08 |
| solver | Solver used for training | sgd |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-10 |
| max_iterations | Maximum Number of Iterations | 50 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 199969 / 1599753 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 25 | No Decrease (224.847 >= 36.2873) |
| 1 | 6.25 | No Decrease (211.831 >= 36.2873) |
| 2 | 1.5625 | No Decrease (184.589 >= 36.2873) |
| 3 | 0.390625 | No Decrease (83.9764 >= 36.2873) |
| 4 | 0.0976562 | 11.3523 |
| 5 | 0.0488281 | 7.5686 |
| 6 | 0.0244141 | 21.6581 |
+---------+-------------------+------------------------------------------+
| Final | 0.0488281 | 7.5686 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 99us | 43.795 | 6.61778 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 99.622ms | 43.5086 | 6.59571 | 0.0488281 |
| 2 | 191.248ms | 40.9101 | 6.39574 | 0.0290334 |
| 3 | 280.477ms | 37.8972 | 6.15571 | 0.0214205 |
| 4 | 378.603ms | 35.2936 | 5.94045 | 0.0172633 |
| 5 | 474.372ms | 32.7773 | 5.72471 | 0.014603 |
| 10 | 959.686ms | 24.5984 | 4.95903 | 0.008683 |
| 50 | 5.02s | 9.19885 | 3.0314 | 0.00154408 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 8.16243
Final training RMSE: 2.85534
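The factorization recommender learns a latent vector per user and per item (the log shows `num_factors` = 8) together with linear terms, and the log shows it being fit by SGD with L2 regularization. The sketch below fits the common form prediction = w0 + w_user[u] + w_item[i] + <U[u], V[i]> with plain SGD on toy data. It is an illustration under stated assumptions: the fixed global mean `w0`, the constant step size, and the names `train_mf`, `predict` and `training_rmse` are hypothetical, whereas turicreate tunes its step size automatically, as the step-size table above shows.

```python
import math
import random

def train_mf(observations, num_factors=2, epochs=200, step=0.05, reg=1e-4, seed=0):
    """Fit target ~ w0 + w_user[u] + w_item[i] + <U[u], V[i]> by plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in observations})
    items = sorted({i for _, i, _ in observations})
    w0 = sum(t for _, _, t in observations) / len(observations)  # fixed global mean
    w_user = {u: 0.0 for u in users}
    w_item = {i: 0.0 for i in items}
    U = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}
    V = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}
    model = (w0, w_user, w_item, U, V)
    data = list(observations)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, target in data:
            err = target - predict(model, u, i)
            # Gradient step on squared error with L2 penalties.
            w_user[u] += step * (err - reg * w_user[u])
            w_item[i] += step * (err - reg * w_item[i])
            for f in range(num_factors):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += step * (err * vf - reg * uf)
                V[i][f] += step * (err * uf - reg * vf)
    return model

def predict(model, user, item):
    """Global mean + user bias + item bias + factor dot product."""
    w0, w_user, w_item, U, V = model
    return w0 + w_user[user] + w_item[item] + sum(a * b for a, b in zip(U[user], V[item]))

def training_rmse(model, observations):
    """Root mean squared error of the model on (user, item, target) rows."""
    se = sum((t - predict(model, u, i)) ** 2 for u, i, t in observations)
    return math.sqrt(se / len(observations))

# Hypothetical toy ratings.
ratings = [("u1", "A", 5), ("u1", "B", 1), ("u2", "A", 4),
           ("u2", "C", 2), ("u3", "B", 1), ("u3", "C", 5)]
model = train_mf(ratings)
print(round(training_rmse(model, ratings), 3))
```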
len(train_set)
1599753
result = tc.recommender.util.compare_models(
    test_set,
    [popularity_model, item_sim_model, factorization_machine_model],
    user_sample=.5, skip_set=train_set)
compare_models: using 34354 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/34354 queries. users per second: 19077.4
recommendations finished on 2000/34354 queries. users per second: 21071.3
recommendations finished on 3000/34354 queries. users per second: 21793
recommendations finished on 4000/34354 queries. users per second: 22081.5
recommendations finished on 5000/34354 queries. users per second: 22392.1
recommendations finished on 6000/34354 queries. users per second: 22620.3
recommendations finished on 7000/34354 queries. users per second: 22719.2
recommendations finished on 8000/34354 queries. users per second: 22900.3
recommendations finished on 9000/34354 queries. users per second: 23067.3
recommendations finished on 10000/34354 queries. users per second: 22887.2
recommendations finished on 11000/34354 queries. users per second: 22713
recommendations finished on 12000/34354 queries. users per second: 22595.1
recommendations finished on 13000/34354 queries. users per second: 22631.4
recommendations finished on 14000/34354 queries. users per second: 22749.6
recommendations finished on 15000/34354 queries. users per second: 22609.7
recommendations finished on 16000/34354 queries. users per second: 22638.4
recommendations finished on 17000/34354 queries. users per second: 22764.3
recommendations finished on 18000/34354 queries. users per second: 22809.6
recommendations finished on 19000/34354 queries. users per second: 22919.8
recommendations finished on 20000/34354 queries. users per second: 22935.8
recommendations finished on 21000/34354 queries. users per second: 22884.6
recommendations finished on 22000/34354 queries. users per second: 22859.4
recommendations finished on 23000/34354 queries. users per second: 22748.8
recommendations finished on 24000/34354 queries. users per second: 22678.2
recommendations finished on 25000/34354 queries. users per second: 22568.9
recommendations finished on 26000/34354 queries. users per second: 22427.1
recommendations finished on 27000/34354 queries. users per second: 22358.6
recommendations finished on 28000/34354 queries. users per second: 22262.1
recommendations finished on 29000/34354 queries. users per second: 22079
recommendations finished on 30000/34354 queries. users per second: 22018.1
recommendations finished on 31000/34354 queries. users per second: 21629.5
recommendations finished on 32000/34354 queries. users per second: 21548.6
recommendations finished on 33000/34354 queries. users per second: 21540.2
recommendations finished on 34000/34354 queries. users per second: 21499.5
Precision and recall summary statistics by cutoff
+--------+------------------------+------------------------+
| cutoff | mean_recall | mean_precision |
+--------+------------------------+------------------------+
| 1 | 4.3383582709425655e-05 | 0.00032019561040927034 |
| 2 | 9.351370595195784e-05 | 0.0003493043022646556 |
| 3 | 0.00013332475867271565 | 0.00032989850769439354 |
| 4 | 0.00025157207340778813 | 0.0003638586481923528 |
| 5 | 0.0003743018484379922 | 0.00043663037783082004 |
| 6 | 0.00044921257061878193 | 0.00042207603190312675 |
| 7 | 0.0005172658393736786 | 0.00041583845507697374 |
| 8 | 0.000573334927644493 | 0.0004038830994935098 |
| 9 | 0.0008801308762376106 | 0.0005304250515870728 |
| 10 | 0.0009133251742113642 | 0.0005035803690982172 |
+--------+------------------------+------------------------+
[10 rows x 3 columns]
Overall RMSE: 6.339345574168611
Per User RMSE (best)
+-------------------------------+------+-------+
| user_id | rmse | count |
+-------------------------------+------+-------+
| 6d61c9b3678aa6c015ea9fd502... | 0.0 | 1 |
+-------------------------------+------+-------+
[1 rows x 3 columns]
Per User RMSE (worst)
+-------------------------------+-------------------+-------+
| user_id | rmse | count |
+-------------------------------+-------------------+-------+
| 38767872c514c1b43bab5c7b21... | 341.2071760874715 | 2 |
+-------------------------------+-------------------+-------+
[1 rows x 3 columns]
Per Item RMSE (best)
+--------------------+---------------------+-------+
| music_id | rmse | count |
+--------------------+---------------------+-------+
| SOXDPFW12A81C2319B | 0.07352941176470584 | 6 |
+--------------------+---------------------+-------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+--------------------+--------------------+-------+
| music_id | rmse | count |
+--------------------+--------------------+-------+
| SOLGIWB12A58A77A05 | 109.15045476689721 | 35 |
+--------------------+--------------------+-------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M1
recommendations finished on 1000/34354 queries. users per second: 18663.7
recommendations finished on 2000/34354 queries. users per second: 21804.3
recommendations finished on 3000/34354 queries. users per second: 22501.7
recommendations finished on 4000/34354 queries. users per second: 22946.8
recommendations finished on 5000/34354 queries. users per second: 23156.1
recommendations finished on 6000/34354 queries. users per second: 23361.5
recommendations finished on 7000/34354 queries. users per second: 23118
recommendations finished on 8000/34354 queries. users per second: 23054.1
recommendations finished on 9000/34354 queries. users per second: 22947
recommendations finished on 10000/34354 queries. users per second: 22859.2
recommendations finished on 11000/34354 queries. users per second: 22138.5
recommendations finished on 12000/34354 queries. users per second: 22029.6
recommendations finished on 13000/34354 queries. users per second: 22098.2
recommendations finished on 14000/34354 queries. users per second: 22204
recommendations finished on 15000/34354 queries. users per second: 22239.7
recommendations finished on 16000/34354 queries. users per second: 22323
recommendations finished on 17000/34354 queries. users per second: 22380.5
recommendations finished on 18000/34354 queries. users per second: 22421.7
recommendations finished on 19000/34354 queries. users per second: 22465.5
recommendations finished on 20000/34354 queries. users per second: 22521.1
recommendations finished on 21000/34354 queries. users per second: 22402.5
recommendations finished on 22000/34354 queries. users per second: 22338.5
recommendations finished on 23000/34354 queries. users per second: 21869.3
recommendations finished on 24000/34354 queries. users per second: 21574.6
recommendations finished on 25000/34354 queries. users per second: 21291.7
recommendations finished on 26000/34354 queries. users per second: 20909
recommendations finished on 27000/34354 queries. users per second: 20723.1
recommendations finished on 28000/34354 queries. users per second: 20666.2
recommendations finished on 29000/34354 queries. users per second: 20559.6
recommendations finished on 30000/34354 queries. users per second: 20340.9
recommendations finished on 31000/34354 queries. users per second: 20157.3
recommendations finished on 32000/34354 queries. users per second: 19851.3
recommendations finished on 33000/34354 queries. users per second: 19819
recommendations finished on 34000/34354 queries. users per second: 19781
Precision and recall summary statistics by cutoff
+--------+----------------------+----------------------+
| cutoff | mean_recall | mean_precision |
+--------+----------------------+----------------------+
| 1 | 0.014688842060795695 | 0.050445362985387564 |
| 2 | 0.03291712354999962 | 0.06248180706759065 |
| 3 | 0.05399142375515901 | 0.07436300479323134 |
| 4 | 0.06963924374087777 | 0.07609012050998376 |
| 5 | 0.08384375244888208 | 0.07552541188798938 |
| 6 | 0.0959100718474003 | 0.07364499039413211 |
| 7 | 0.10643883706242792 | 0.07136619566031029 |
| 8 | 0.11604275927620446 | 0.06902398556208861 |
| 9 | 0.1246532903037569 | 0.06660392126422279 |
| 10 | 0.13280279585995652 | 0.06440006986086082 |
+--------+----------------------+----------------------+
[10 rows x 3 columns]
Overall RMSE: 7.041096333660663
Per User RMSE (best)
+-------------------------------+----------------------+-------+
| user_id | rmse | count |
+-------------------------------+----------------------+-------+
| 714c577dfeaa6ffaf778286302... | 0.022276699542999268 | 1 |
+-------------------------------+----------------------+-------+
[1 rows x 3 columns]
Per User RMSE (worst)
+-------------------------------+--------------------+-------+
| user_id | rmse | count |
+-------------------------------+--------------------+-------+
| 38767872c514c1b43bab5c7b21... | 343.85311015265717 | 2 |
+-------------------------------+--------------------+-------+
[1 rows x 3 columns]
Per Item RMSE (best)
+--------------------+--------------------+-------+
| music_id | rmse | count |
+--------------------+--------------------+-------+
| SOBHIJM12AB018194F | 0.7032955339939242 | 6 |
+--------------------+--------------------+-------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+--------------------+--------------------+-------+
| music_id | rmse | count |
+--------------------+--------------------+-------+
| SOLGIWB12A58A77A05 | 109.83254923449184 | 35 |
+--------------------+--------------------+-------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M2
recommendations finished on 1000/34354 queries. users per second: 14762.3
recommendations finished on 2000/34354 queries. users per second: 16166.7
recommendations finished on 3000/34354 queries. users per second: 16291.6
recommendations finished on 4000/34354 queries. users per second: 16418.9
recommendations finished on 5000/34354 queries. users per second: 16427.5
recommendations finished on 6000/34354 queries. users per second: 16329.2
recommendations finished on 7000/34354 queries. users per second: 16405.7
recommendations finished on 8000/34354 queries. users per second: 16508
recommendations finished on 9000/34354 queries. users per second: 16428.6
recommendations finished on 10000/34354 queries. users per second: 16403.8
recommendations finished on 11000/34354 queries. users per second: 16274.9
recommendations finished on 12000/34354 queries. users per second: 16310.9
recommendations finished on 13000/34354 queries. users per second: 16293.5
recommendations finished on 14000/34354 queries. users per second: 16244.3
recommendations finished on 15000/34354 queries. users per second: 16212
recommendations finished on 16000/34354 queries. users per second: 16201.7
recommendations finished on 17000/34354 queries. users per second: 16228.6
recommendations finished on 18000/34354 queries. users per second: 16249.9
recommendations finished on 19000/34354 queries. users per second: 16250.9
recommendations finished on 20000/34354 queries. users per second: 16246
recommendations finished on 21000/34354 queries. users per second: 16212.5
recommendations finished on 22000/34354 queries. users per second: 16218.8
recommendations finished on 23000/34354 queries. users per second: 16218.1
recommendations finished on 24000/34354 queries. users per second: 16230.1
recommendations finished on 25000/34354 queries. users per second: 16172.3
recommendations finished on 26000/34354 queries. users per second: 16175.4
recommendations finished on 27000/34354 queries. users per second: 16188.5
recommendations finished on 28000/34354 queries. users per second: 16199.1
recommendations finished on 29000/34354 queries. users per second: 16199.1
recommendations finished on 30000/34354 queries. users per second: 16217
recommendations finished on 31000/34354 queries. users per second: 16228.5
recommendations finished on 32000/34354 queries. users per second: 16176.8
recommendations finished on 33000/34354 queries. users per second: 16184.1
recommendations finished on 34000/34354 queries. users per second: 16170.4
Precision and recall summary statistics by cutoff
+--------+------------------------+------------------------+
| cutoff | mean_recall | mean_precision |
+--------+------------------------+------------------------+
| 1 | 7.0023278723497e-05 | 0.0004075216859754317 |
| 2 | 0.00015701647350617692 | 0.0004220760319031288 |
| 3 | 0.00025150526145133087 | 0.00041722458326056295 |
| 4 | 0.00034565751961335053 | 0.00043663037783081874 |
| 5 | 0.00046127750964837765 | 0.0004424521162018989 |
| 6 | 0.0006061722786786824 | 0.0004899963128990292 |
| 7 | 0.00070561257944388 | 0.0005197980688462193 |
| 8 | 0.0008535975421092137 | 0.0005457879722885229 |
| 9 | 0.0009773205690967743 | 0.0005433622479672466 |
| 10 | 0.0011201566127395915 | 0.0005647086219945271 |
+--------+------------------------+------------------------+
[10 rows x 3 columns]
Overall RMSE: 7.780049286375036
Per User RMSE (best)
+-------------------------------+------------------------+-------+
| user_id | rmse | count |
+-------------------------------+------------------------+-------+
| 80495441caacdc7e069b441047... | 5.1963096963536515e-05 | 1 |
+-------------------------------+------------------------+-------+
[1 rows x 3 columns]
Per User RMSE (worst)
+-------------------------------+--------------------+-------+
| user_id | rmse | count |
+-------------------------------+--------------------+-------+
| 38767872c514c1b43bab5c7b21... | 338.99132097019395 | 2 |
+-------------------------------+--------------------+-------+
[1 rows x 3 columns]
Per Item RMSE (best)
+--------------------+---------------------+-------+
| music_id | rmse | count |
+--------------------+---------------------+-------+
| SOCOZST12A67020452 | 0.06951240175355755 | 1 |
+--------------------+---------------------+-------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+--------------------+--------------------+-------+
| music_id | rmse | count |
+--------------------+--------------------+-------+
| SOVQSQZ12A8C13F960 | 113.92735214599614 | 17 |
+--------------------+--------------------+-------+
[1 rows x 3 columns]
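`compare_models` reports, for each model, mean precision and recall at cutoffs 1 to 10 plus RMSE on the held-out ratings. In this run the item-similarity model (M1) reaches a mean_precision of about 0.064 at cutoff 10, versus roughly 0.0005 for M0 and M2. The sketch below shows the standard definitions behind these columns: precision@k is the fraction of a user's top-k recommendations found in that user's held-out items, recall@k is the fraction of held-out items found in the top k, both averaged over users, and RMSE is the root of the mean squared difference between observed and predicted targets. Function names and toy data are hypothetical, and edge-case handling (for example users with fewer than k recommendations) may differ from turicreate's.

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommended items against a relevant set."""
    hits = len(set(recommended[:k]) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def mean_precision_recall(recs_by_user, relevant_by_user, k):
    """Average precision@k and recall@k over users who have held-out items."""
    pairs = [precision_recall_at_k(recs_by_user.get(user, []), relevant, k)
             for user, relevant in relevant_by_user.items()]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n

def rmse(observed_and_predicted):
    """Root mean squared error over (observed, predicted) pairs."""
    errors = [(y - y_hat) ** 2 for y, y_hat in observed_and_predicted]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical top-3 lists and held-out test items.
recs = {"u1": ["A", "B", "C"], "u2": ["D", "E", "F"]}
held_out = {"u1": {"A", "C"}, "u2": {"X"}}
print(mean_precision_recall(recs, held_out, k=2))  # (0.25, 0.25)
print(rmse([(3, 1), (1, 1)]))  # 1.4142135623730951, i.e. sqrt(2)
```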
K = 10
users = tc.SArray(sf['user_id'].unique().head(100))
recs = item_sim_model.recommend(users=users, k=K)
recs.head()
user_id | music_id | score | rank |
---|---|---|---|
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOXUQNR12AF72A69D6 | 3.022422651449839 | 1 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOUFAZA12AC3DFAB20 | 1.3368427753448486 | 2 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOSFSTC12A8C141219 | 1.091982126235962 | 3 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOVIWFP12A58A7D1BD | 1.045163869857788 | 4 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOBMTQD12AB01833D0 | 1.0294516881306965 | 5 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOCMNRG12AB0189D3F | 0.9756437937418619 | 6 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOXOHUM12A67ADC826 | 0.9506873289744059 | 7 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOWBFVW12A6D4F612B | 0.9092370669047037 | 8 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOXFYTY127E9433E7D | 0.8977278073628744 | 9 |
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ... | SOYBLYP12A58A79D32 | 0.8970928192138672 | 10 |
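`recommend(users=users, k=K)` returns, for each requested user, the K highest-scoring items as rows of `user_id`, `music_id`, `score` and `rank`; by default turicreate's `recommend` leaves out items the user already interacted with in training (`exclude_known=True`). The sketch below reproduces that row layout from a hypothetical per-user score dictionary using `heapq.nlargest`; the function name and data are illustrative assumptions, not the library's implementation.

```python
import heapq

def recommend_rows(scores_by_user, seen_by_user, k):
    """Return (user, item, score, rank) rows for each user's top-k unseen items."""
    rows = []
    for user, scores in scores_by_user.items():
        seen = seen_by_user.get(user, set())
        # Skip items the user already has, mirroring exclude_known behaviour.
        candidates = ((score, item) for item, score in scores.items() if item not in seen)
        top = heapq.nlargest(k, candidates)  # highest scores first
        for rank, (score, item) in enumerate(top, start=1):
            rows.append((user, item, score, rank))
    return rows

# Hypothetical scores; "B" is already in u1's history.
scores_by_user = {"u1": {"A": 0.9, "B": 3.0, "C": 1.3, "D": 2.0}}
seen_by_user = {"u1": {"B"}}
for row in recommend_rows(scores_by_user, seen_by_user, k=2):
    print(row)
# ('u1', 'D', 2.0, 1)
# ('u1', 'C', 1.3, 2)
```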