NVIDIA/DeepRecommender

Name: DeepRecommender

Owner: NVIDIA Corporation

Description: Deep learning for recommender systems

Created: 2017-09-08 20:48:44.0

Updated: 2018-04-03 00:41:33.0

Pushed: 2018-03-27 16:41:51.0

Size: 1382

Language: Python


README

Deep AutoEncoders for Collaborative Filtering

This is not an official NVIDIA product. It is a research project described in “Training Deep AutoEncoders for Collaborative Filtering” (https://arxiv.org/abs/1708.01715).

The model

The model is based on deep AutoEncoders.

[Figure: AutoEncoder]
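To make the structure concrete, here is a minimal pure-Python sketch of a forward pass through such a network. It is not the repository's PyTorch implementation. It assumes that the decoder mirrors the encoder (as the architectures in the Results table suggest), that SELU is applied between layers (matching `--non_linearity_type selu`), and that the output layer is left linear. Dropout is omitted, as it would be at inference time. `layer_sizes`, `init_layers`, `dense`, and `forward` are hypothetical helpers, and the random weights stand in for trained ones.

```python
import math
import random

# SELU constants (Klambauer et al., 2017)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(v):
    """Apply SELU element-wise to a list of floats."""
    return [SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0)) for x in v]


def layer_sizes(n_items, hidden_layers):
    """Mirror the encoder sizes to get the full n -> ... -> n stack.

    Hypothetical helper: layer_sizes(n, "512,512,1024")
    gives [n, 512, 512, 1024, 512, 512, n].
    """
    enc = [int(h) for h in hidden_layers.split(",")]
    return [n_items] + enc + enc[-2::-1] + [n_items]


def init_layers(sizes, seed=0):
    """Hypothetical random initialisation: one (weights, bias) pair per layer."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        std = 1.0 / math.sqrt(fan_in)
        w = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((w, b))
    return layers


def dense(x, w, b):
    """Fully connected layer: w has one row of fan_in weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def forward(x, layers):
    """Encoder then decoder; SELU after every layer except the (linear) output."""
    h = x
    for i, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if i < len(layers) - 1:
            h = selu(h)
    return h


# A user's sparse rating vector over 10 items (0 = not rated)
sizes = layer_sizes(10, "8,8,4")
reconstruction = forward([0.0, 5.0, 0.0, 3.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0], init_layers(sizes))
print(sizes, len(reconstruction))
```

The output has one entry per item, including the ones the user never rated; those dense predictions are what the model is used for.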

Requirements

Training using mixed precision with Tensor Cores

Getting Started

Run unit tests first

The code is intended to run on a GPU. The last test can take a minute or two.

python -m unittest test/data_layer_tests.py
python -m unittest test/test_model.py
Tutorial

Check out this tutorial by miguelgfierro.

Get the data

Note: run all these commands from within your DeepRecommender folder.

Netflix Prize

Data stats

| Dataset | Netflix 3 months | Netflix 6 months | Netflix 1 year | Netflix full |
| --- | --- | --- | --- | --- |
| Ratings train | 13,675,402 | 29,179,009 | 41,451,832 | 98,074,901 |
| Users train | 311,315 | 390,795 | 345,855 | 477,412 |
| Items train | 17,736 | 17,757 | 16,907 | 17,768 |
| Time range train | 2005-09-01 to 2005-11-30 | 2005-06-01 to 2005-11-30 | 2004-06-01 to 2005-05-31 | 1999-12-01 to 2005-11-30 |
| Ratings test | 2,082,559 | 2,175,535 | 3,888,684 | 2,250,481 |
| Users test | 160,906 | 169,541 | 197,951 | 173,482 |
| Items test | 17,261 | 17,290 | 16,506 | 17,305 |
| Time range test | 2005-12-01 to 2005-12-31 | 2005-12-01 to 2005-12-31 | 2005-06-01 to 2005-06-30 | 2005-12-01 to 2005-12-31 |
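The training counts above also show how sparse the user-item rating matrix is. The following sketch derives density (observed ratings divided by users × items) and average ratings per user from the train rows of the table; the helper name `summarize` is hypothetical.

```python
# Train-split figures copied from the data stats table:
# (ratings, users, items)
TRAIN_STATS = {
    "Netflix 3 months": (13_675_402, 311_315, 17_736),
    "Netflix 6 months": (29_179_009, 390_795, 17_757),
    "Netflix 1 year": (41_451_832, 345_855, 16_907),
    "Netflix full": (98_074_901, 477_412, 17_768),
}


def summarize(ratings, users, items):
    """Return (density of the user-item matrix, average ratings per user)."""
    density = ratings / (users * items)
    return density, ratings / users


for name, (r, u, i) in TRAIN_STATS.items():
    density, per_user = summarize(r, u, i)
    print(f"{name}: density {density:.2%}, {per_user:.1f} ratings per user")
```

For the full split this works out to a density of roughly 1.16% (about 205 ratings per user), so the vast majority of each user's input vector is missing entries.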

Train the model

In this example, the model is trained for 12 epochs. In the paper, we train for 102.

python run.py --gpu_ids 0 \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_VALID \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--batch_size 128 \
--logdir model_save \
--drop_prob 0.8 \
--optimizer momentum \
--lr 0.005 \
--weight_decay 0 \
--aug_step 1 \
--noise_prob 0 \
--num_epochs 12 \
--summary_frequency 1000
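To get a feel for the run length, the sketch below turns the command's `--batch_size`, `--num_epochs`, and `--summary_frequency` values into rough step counts. Several things here are assumptions, not confirmed by this README: that `Netflix/NF_TRAIN` is the Netflix full split (which the 512,512,1024 architecture suggests), that each user is one training example, that a final partial batch is kept, that `--summary_frequency` counts optimizer steps, and that any extra updates from `--aug_step` are ignored. The `schedule` helper is hypothetical.

```python
import math

# Values from the training command; USERS_TRAIN is the "Users train"
# count for Netflix full (assumed to be what Netflix/NF_TRAIN contains).
USERS_TRAIN = 477_412
BATCH_SIZE = 128          # --batch_size
NUM_EPOCHS = 12           # --num_epochs
SUMMARY_FREQUENCY = 1000  # --summary_frequency


def schedule(users, batch_size, epochs, summary_every):
    """Rough step counts, assuming one example per user and a kept final partial batch."""
    steps_per_epoch = math.ceil(users / batch_size)
    total_steps = steps_per_epoch * epochs
    summaries = total_steps // summary_every
    return steps_per_epoch, total_steps, summaries


print(schedule(USERS_TRAIN, BATCH_SIZE, NUM_EPOCHS, SUMMARY_FREQUENCY))
```

Under these assumptions this prints `(3730, 44760, 44)`: about 3,730 mini-batches per epoch and roughly 44 TensorBoard summaries over the 12-epoch run.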

Note that you can run TensorBoard in parallel:

tensorboard --logdir=model_save

Run inference on the test set

python infer.py \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_TEST \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--save_path model_save/model.epoch_11 \
--drop_prob 0.8 \
--predictions_path preds.txt

Compute test RMSE

python compute_RMSE.py --path_to_predictions=preds.txt

After 12 epochs, you should get an RMSE of around 0.927. Train longer to get below 0.92.
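For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it from prediction lines. The tab-separated `user, item, prediction, rating` layout is an assumption about `preds.txt`, not a documented format, and clipping predictions to the 1 to 5 star range is a design choice made here, not necessarily what `compute_RMSE.py` does. `rmse_from_lines` is a hypothetical helper.

```python
import math


def rmse_from_lines(lines, clip=(1.0, 5.0)):
    """Root-mean-square error over prediction lines.

    Hypothetical line format: user<TAB>item<TAB>prediction<TAB>rating.
    Predictions are clipped to the 1-5 star range before scoring.
    """
    lo, hi = clip
    sq_err, count = 0.0, 0
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        pred = min(max(float(parts[2]), lo), hi)
        rating = float(parts[3])
        sq_err += (pred - rating) ** 2
        count += 1
    if count == 0:
        raise ValueError("no prediction lines found")
    return math.sqrt(sq_err / count)


sample = ["u1\ti1\t3.5\t4", "u1\ti2\t5.6\t5", "u2\ti1\t2.0\t1"]
print(round(rmse_from_lines(sample), 4))
```

Because the function accepts any iterable of lines, an open file works directly, e.g. `with open("preds.txt") as f: print(rmse_from_lines(f))`.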

Results

It should be possible to achieve the following results, with iterative output re-feeding applied once during each training iteration.

(Exact numbers will vary due to randomization.)

| Dataset | RMSE | Model Architecture |
| --- | --- | --- |
| Netflix 3 months | 0.9373 | n,128,256,256,dp(0.65),256,128,n |
| Netflix 6 months | 0.9207 | n,256,256,512,dp(0.8),256,256,n |
| Netflix 1 year | 0.9225 | n,256,256,512,dp(0.8),256,256,n |
| Netflix full | 0.9099 | n,512,512,1024,dp(0.8),512,512,n |
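Output re-feeding, as described in the paper, scores the model's dense output f(x) against the sparse input x on observed ratings only, then treats f(x) as a fresh, fully dense example and scores f(f(x)) against it on every entry. The sketch below computes those two losses once, matching "once during each iteration". In real training each loss would drive its own gradient update; this sketch only computes them. The mapping to `--aug_step 1` is an assumption, and `masked_mse`, `mse`, `refeeding_losses`, and `toy_model` are hypothetical names.

```python
def masked_mse(target, pred):
    """MSE over observed entries only (target != 0 marks a known rating)."""
    pairs = [(t, p) for t, p in zip(target, pred) if t != 0]
    if not pairs:
        return 0.0
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def mse(target, pred):
    """Plain MSE over every entry (used once the target is dense)."""
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)


def refeeding_losses(model, x):
    """Losses for one iteration with a single re-feeding pass.

    1. y = f(x) is scored only where the sparse input x has ratings.
    2. y is treated as a new dense example: f(y) is scored against y everywhere.
    """
    y = model(x)
    first = masked_mse(x, y)
    y_const = list(y)  # a copy stands in for detaching y from the graph
    second = mse(y_const, model(y_const))
    return first, second


def toy_model(v):
    # Hypothetical stand-in for the trained autoencoder f
    return [0.5 * t + 1.0 for t in v]


first, second = refeeding_losses(toy_model, [4.0, 0.0, 2.0])
print(first, second)
```

Only the two rated items (4.0 and 2.0) count toward the first loss, while all three entries of the dense output count toward the second.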


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.