twitter/torch-distlearn

Name: torch-distlearn

Owner: Twitter, Inc.

Description: A set of distributed learning algorithms for Torch

Created: 2016-01-13 23:09:03

Updated: 2017-12-16 13:00:17

Pushed: 2016-02-17 20:20:24

Homepage:

Size: 20

Language: Lua

README

DistLearn

Some common distributed learning algorithms built in Torch with the help of the ipc library.
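
Both examples below assume a tree object connecting all of the participating processes, built with the ipc library's Tree module. The sketch below follows the general shape of the torch-ipc examples, but treat the node index, node count, host, port, and the exact Tree arguments as assumptions to adapt to your own cluster and ipc version; the MNIST example is the authoritative reference.

-- Hedged sketch: wiring up an AllReduce tree with torch-ipc.
-- nodeIndex, numNodes, host, and port are placeholders for your setup.
local ipc = require 'libipc'
local Tree = require 'ipc.Tree'

local nodeIndex, numNodes = 1, 4      -- this process's rank and the total node count
local host, port = '127.0.0.1', 8080  -- where node 1 listens

local client, server
if nodeIndex == 1 then
   -- node 1 acts as the root and waits for the other nodes to connect
   server = ipc.server(host, port)
   server:clients(numNodes - 1, function(client) end)
else
   client = ipc.client(host, port)
end

-- binary tree (fan-out 2) over all the nodes; assumed constructor arguments,
-- see the torch-ipc documentation for the exact signature
local tree = Tree(nodeIndex, numNodes, 2, server, client, host, port + nodeIndex)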

AllReduceSGD

Spreads the computation of gradients for a mini-batch of items across N processes. Uses AllReduce to quickly sum the gradients and distribute the total back out to every process.

local allReduceSGD = require 'distlearn.AllReduceSGD'(tree)
-- Make sure all the nodes start with the same parameter values
allReduceSGD.synchronizeParameters(params)
for _ = 1,epochs do
   for _ = 1,steps do
      -- Compute your gradients as normal
      local grads = computeYourGrads(...)
      -- Sum and normalize them
      allReduceSGD.sumAndNormalizeGradients(grads)
      -- Do your SGD as normal
      SGD(params, grads)
   end
   -- Before validating we should make sure all nodes have
   -- the exact same parameter values
   allReduceSGD.synchronizeParameters(params)
   -- Validate...
end
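
computeYourGrads and SGD above are placeholders for your own training code. As a minimal sketch, assuming an nn model, a criterion, and a learning rate that you have defined elsewhere (none of these come from DistLearn), they might look like:

-- Hypothetical placeholders for the snippet above, using standard Torch nn calls.
-- `model` and `criterion` are whatever network and loss you built elsewhere.
local params, grads = model:getParameters()

local function computeYourGrads(input, target)
   model:zeroGradParameters()
   local output = model:forward(input)
   criterion:forward(output, target)
   model:backward(input, criterion:backward(output, target))
   return grads
end

local function SGD(params, grads)
   -- plain gradient descent step with an assumed learning rate of 0.01
   params:add(-0.01, grads)
end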

When used in combination with Dataset, you can quickly parallelize the processing of large datasets without a ton of effort. See the MNIST example for a complete working setup.
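
The Dataset API itself is not covered here, but the underlying idea is simple partitioning: each node only walks its own shard of the examples while AllReduceSGD keeps the models in sync. A rough, Dataset-independent illustration (shardIndices, nodeIndex, and numNodes are hypothetical names):

-- Rough illustration of per-node data sharding (not the Dataset API):
-- node `nodeIndex` out of `numNodes` touches every numNodes-th example.
local function shardIndices(numExamples, nodeIndex, numNodes)
   local indices = { }
   for i = nodeIndex, numExamples, numNodes do
      table.insert(indices, i)
   end
   return indices
end

-- e.g. with 4 nodes, node 2 processes examples 2, 6, 10, ...
local myIndices = shardIndices(60000, 2, 4)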

AllReduceEA

We also have an AllReduce-based implementation of the Elastic Averaging algorithm as described in Deep learning with Elastic Averaging SGD. It's just as easy to add to your training script; there are only two required parameters, tau and alpha. Tau is how many steps to run before averaging the nodes, and alpha is the weight used during the averaging step. You can read more about our implementation of AllReduceEA.

-- Use a tau of 10 and an alpha of 0.2
local allReduceEA = require 'distlearn.AllReduceEA'(tree, 10, 0.2)
-- Make sure all the nodes start with the same parameter values
allReduceEA.synchronizeParameters(params)
for _ = 1,epochs do
   for _ = 1,steps do
      -- Compute your gradients as normal
      local grads = computeYourGrads(...)
      -- Do your SGD as normal
      SGD(params, grads)
      -- Average the params
      allReduceEA.averageParameters(params)
   end
   -- Make sure the centers haven't drifted too far due to
   -- floating point precision error build-up
   allReduceEA.synchronizeCenter(params)
   -- Validate...
end
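
For reference, the elastic averaging update itself is conceptually a pull of each node toward a shared center variable and of the center toward the nodes. The sketch below is only an illustration of that update for a single node; it is not the library's internal AllReduce implementation:

-- Conceptual single-node view of one elastic averaging step.
-- `params` is this node's flattened parameters, `center` the shared center
-- variable, and `alpha` the averaging weight (0.2 in the example above).
local function elasticAverageStep(params, center, alpha)
   -- how far this node has drifted from the center
   local diff = params:clone():add(-1, center)
   -- pull the node toward the center: params = params - alpha * diff
   params:add(-alpha, diff)
   -- pull the center toward the node: center = center + alpha * diff
   -- (with N nodes, the center accumulates the AllReduce-summed diffs)
   center:add(alpha, diff)
end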

See a complete working example of EA and MNIST.

License

Licensed under the Apache License, Version 2.0. See the LICENSE file.

