NVIDIA/torch-cutorch

Name: torch-cutorch

Owner: NVIDIA Corporation

Description: A CUDA backend for Torch7

Forked from: torch/cutorch

Created: 2016-08-15 23:06:33

Updated: 2016-12-04 17:32:07

Pushed: 2017-01-20 03:21:56

Homepage:

Size: 1912

Language: Cuda


README

cutorch

NOTE on API changes and versioning

Cutorch provides a CUDA backend for torch7.

Cutorch provides the following:

torch.CudaTensor

This new tensor type behaves exactly like a torch.FloatTensor, but has a couple of extra functions of note:
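As a sketch of this drop-in behavior (assuming cutorch is installed and a CUDA-capable GPU is present; names are illustrative):

```lua
require 'cutorch'

-- A CudaTensor supports the familiar FloatTensor operations,
-- executed on the current GPU.
local t = torch.CudaTensor(4, 4):fill(1)
local total = t:sum()   -- reduction runs on the GPU
local f = t:float()     -- copy the result back to a CPU FloatTensor
```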

Other CUDA tensor types

Most other (besides float) CPU torch tensor types now have a cutorch equivalent, with similar names:

Note: these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. narrow, select, unfold, transpose).
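A hedged sketch of what those limited operations look like in practice (assuming a working cutorch install; variable names are illustrative):

```lua
require 'cutorch'

-- CUDA counterparts of CPU types carry mirrored names,
-- e.g. torch.LongTensor <-> torch.CudaLongTensor.
local cpu = torch.LongTensor(5):fill(7)
local gpu = torch.CudaLongTensor(5)

gpu:copy(cpu)            -- copying between CPU and CUDA types is supported
local back = gpu:long()  -- convert back to a CPU LongTensor

-- A few shaping ops (narrow, select, transpose, ...) are also available:
local part = gpu:narrow(1, 1, 3)
```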

CUDA memory allocation

Set the environment variable THC_CACHING_ALLOCATOR=1 to enable the caching CUDA memory allocator.

By default, cutorch calls cudaMalloc and cudaFree when CUDA tensors are allocated and freed. This is expensive because cudaFree synchronizes the CPU with the GPU. Setting THC_CACHING_ALLOCATOR=1 will cause cutorch to cache and re-use CUDA allocations to avoid synchronizations.

With the caching memory allocator, allocations and frees should logically be considered “usages” of the memory segment associated with streams, just like kernel launches. The programmer must insert the proper synchronization if memory segments are used from multiple streams.
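The cross-stream caveat above can be sketched with cutorch's stream functions (which the README itself warns are easy to misuse); this assumes the `cutorch.reserveStreams`/`setStream`/`streamWaitFor` API described in cutorch's documentation:

```lua
require 'cutorch'
cutorch.reserveStreams(2)

cutorch.setStream(1)
local t = torch.CudaTensor(1000):fill(1)  -- memory "used" on stream 1

-- Before touching t from stream 2, make stream 2 wait on stream 1,
-- so the cached segment is not reused while stream 1 may still be writing:
cutorch.streamWaitFor(2, {1})
cutorch.setStream(2)
t:add(1)

cutorch.setStream(0)  -- return to the default stream
```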

cutorch.* API

Low-level stream functions (don't use these as a user; it's easy to shoot yourself in the foot).

Common Examples

Transferring a FloatTensor src to the GPU:

local dest = src:cuda() -- dest is on the current GPU

Allocating a tensor on a given GPU. Allocate src on GPU 3:

cutorch.setDevice(3)
local src = torch.CudaTensor(100)

Copying a CUDA tensor from one GPU to another: given a tensor called src on GPU 1, to create its clone on GPU 2:

cutorch.setDevice(2)
local dest = src:clone()

OR

local dest
cutorch.withDevice(2, function() dest = src:clone() end)
API changes and Versioning

Version 1.0 can be installed via: luarocks install cutorch 1.0-0

Compared to version 1.0, master includes the following API changes:

| operators | 1.0 | master |
|---|---|---|
| lt, le, gt, ge, eq, ne return type | torch.CudaTensor | torch.CudaByteTensor |
| min, max (2nd return value) | torch.CudaTensor | torch.CudaLongTensor |
| maskedFill, maskedCopy (mask input) | torch.CudaTensor | torch.CudaByteTensor |
| topk, sort (2nd return value) | torch.CudaTensor | torch.CudaLongTensor |

Inconsistencies with CPU API

| operators | CPU | CUDA |
|---|---|---|

