NVIDIA/torch-cunn

Name: torch-cunn

Owner: NVIDIA Corporation

Description: (none)

Forked from: torch/cunn

Created: 2016-08-15 23:07:12

Updated: 2016-08-15 23:07:27

Pushed: 2017-01-19 01:02:20

Homepage: (none)

Size: 1273

Language: Cuda


README

CUDA backend for the Neural Network Package

This package provides a CUDA implementation for many of the modules in the base neural network package, nn.

To use

Simply convert your network model to CUDA by calling :cuda():

local model = nn.Sequential()
model:add(nn.Linear(2,2))
model:add(nn.LogSoftMax())

model:cuda()  -- convert model to CUDA
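To confirm the conversion, you can inspect the type of the module parameters; after `model:cuda()` they should be `CudaTensor`s. A quick sanity check, assuming the `model` built above:

-- check the first layer's weights: they now live on the GPU
print(torch.type(model:get(1).weight))  -- 'torch.CudaTensor'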

… and similarly for your tensors:

local input = torch.Tensor(32,2):uniform()
input = input:cuda()
local output = model:forward(input)

… or create them directly as CudaTensors:

local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)
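Putting the pieces together, a minimal end-to-end sketch might look like the following; it assumes `nn`, `cutorch` and `cunn` are installed, and reuses the layer and batch sizes from the snippets above:

require 'nn'
require 'cunn'  -- pulls in cutorch as well

-- build a small model and move it to the GPU
local model = nn.Sequential()
model:add(nn.Linear(2,2))
model:add(nn.LogSoftMax())
model:cuda()

-- create a batch directly on the GPU and run a forward pass
local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)
print(output:size())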
To run unit-tests

luajit -l cunn -e 'cunn.test()'

GPU Training Concepts

Performance

Allocating GPU memory is relatively expensive, so avoid creating new `CudaTensor`s inside your training loop. For example, do not do this:

require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  local b = torch.add(a, 1)
end

… this will allocate one thousand new `CudaTensor`s, one for each call to `torch.add(a, 1)`.

Use instead this form:

require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
local b = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  b:add(a, 1)
end

In this form, `b` is allocated only once, before the loop. Then the `b:add(a, 1)` operation performs the
add inside the GPU kernel and stores the result into the original `b` `CudaTensor`. This
will run noticeably faster in general. It is also much less likely to eat up arbitrary amounts of memory,
and less likely to need frequent calls to `collectgarbage(); collectgarbage()`.
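The same principle applies when feeding CPU data to the GPU: rather than allocating a fresh GPU tensor every iteration, you can pre-allocate one GPU buffer and `:copy()` each mini-batch into it. A rough sketch (the buffer and batch sizes here are illustrative, not from the original examples):

require 'cutorch'

-- pre-allocate the GPU buffer once, outside the loop
local gpuBatch = torch.CudaTensor(32, 2)

for it = 1, 1000 do
  local cpuBatch = torch.FloatTensor(32, 2):uniform()  -- stand-in for real data
  gpuBatch:copy(cpuBatch)  -- reuses the same GPU memory every iteration
  -- ... do work on gpuBatch here ...
end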

Benchmarking

GPU operations typically continue running asynchronously after an instruction has been issued.
For example, if you do:

require 'cutorch'

local a = torch.CudaTensor(1000,1000):uniform()
a:add(1)

… the GPU kernel to add 1 will only be scheduled for launch by `a:add(1)`. It might not have completed yet, or
even have reached the GPU, at the time that the `a:add(1)` instruction has completed.
Therefore, for running wall-clock timings, you should call `cutorch.synchronize()` before each timecheck
point:

require 'cutorch'
require 'sys'

local a = torch.CudaTensor(1000,1000):uniform()
cutorch.synchronize()
start = sys.tic()
a:add(1)
cutorch.synchronize()
print(sys.toc())
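If you time GPU code often, it can be convenient to wrap this pattern in a small helper. The function below is a sketch (`timeGpu` is a hypothetical name, not part of cutorch) that synchronizes before and after the timed work:

require 'cutorch'
require 'sys'

-- hypothetical helper: times fn() with proper GPU synchronization
local function timeGpu(fn)
  cutorch.synchronize()  -- make sure earlier kernels have finished
  sys.tic()
  fn()
  cutorch.synchronize()  -- wait for the timed kernels to finish
  return sys.toc()
end

local a = torch.CudaTensor(1000,1000):uniform()
print(timeGpu(function() a:add(1) end))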

