LLNL/lbann

Name: lbann

Owner: Lawrence Livermore National Laboratory

Description: Livermore Big Artificial Neural Network Toolkit

Created: 2016-05-11 20:04:20.0

Updated: 2018-04-02 22:41:45.0

Pushed: 2018-04-02 22:41:42.0

Homepage: http://software.llnl.gov/lbann/

Size: 3765659

Language: C++

README

LBANN: Livermore Big Artificial Neural Network Toolkit

Building LBANN
LC Systems
  1. Clone this repo using git clone https://github.com/LLNL/lbann.git
  2. From anywhere in the lbann directory run the LC build script located in
    <LBANN_dir>/lbann/scripts/build_lbann_lc.sh
  3. This will build LBANN in a newly created build directory. The build script uses the CMake superbuild; information on the superbuild can be found in the superbuild directory.
  4. After the first use of this script, subsequent uses will recompile LBANN using the Makefile found in build/<compiler>.<cluster>.llnl.gov/lbann/build/.
  5. To reconfigure the build, add the --reconfigure flag. For example, to change the build from a release build to a debug build, add --debug and --reconfigure.
  6. To completely rebuild LBANN and its dependencies, add the --clean-build flag. Other useful configuration options can be viewed by running the script with the --help flag.
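
The invocations described in the steps above can be collected in a small wrapper. This is only a sketch: `lc_build` and `LBANN_DIR` are hypothetical names introduced here, with `LBANN_DIR` standing in for the `<LBANN_dir>` placeholder. The flags are the ones listed in the steps.

```shell
# Hypothetical helper: run the LC build script with whatever flags are given.
# LBANN_DIR stands in for the <LBANN_dir> placeholder used above
# (assumption: it falls back to the current directory when unset).
lc_build() {
    "${LBANN_DIR:-.}/lbann/scripts/build_lbann_lc.sh" "$@"
}

# Typical invocations (commented out so that sourcing this file builds nothing):
# lc_build                          # first build, or recompile an existing build
# lc_build --debug --reconfigure    # switch a release build to a debug build
# lc_build --clean-build            # rebuild LBANN and all of its dependencies
# lc_build --help                   # list the other configuration options
```

Passing the flags through with `"$@"` keeps the wrapper independent of the set of options the build script accepts.
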
OS X
  1. Clone this repo using git clone https://github.com/LLNL/lbann.git
  2. From anywhere in the lbann directory run the OS X build script located in
    <LBANN_dir>/lbann/scripts/build_lbann_osx.sh
  3. This will build LBANN in a newly created build directory.
Building LBANN with Spack [for Users]

spack install lbann

Building LBANN with Spack [for Developers]
Installing a compiler (if needed)

LBANN uses C++ features provided by newer compilers. If you do not have the necessary compiler, you can use spack to install one. For full details, see the spack documentation.

spack install gcc@7.1.0

The above command builds and installs a compiler and prints the install path as its final line. If the install succeeds, register the compiler with spack by passing the install path to the spack compiler add command (an alias of spack compiler find).

spack compiler add /path/to/compiler/install
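
The two steps above can be chained so the install path does not have to be copied by hand. The following is a minimal sketch, assuming (as stated above) that the final line printed by `spack install` is the install path; `install_and_register_compiler` is a hypothetical name introduced for illustration.

```shell
# Hypothetical helper: install a compiler with spack, then register it.
# Assumes the final line printed by `spack install` is the install path.
install_and_register_compiler() {
    spec="$1"
    # Keep only the last line of the install output.
    prefix="$(spack install "$spec" | tail -n 1)"
    # Guard against unexpected output before registering anything.
    if [ ! -d "$prefix" ]; then
        echo "could not determine install path for $spec" >&2
        return 1
    fi
    spack compiler add "$prefix"
}

# Usage (not run here):
# install_and_register_compiler gcc@7.1.0
```

The directory check is there because a failed install would otherwise hand an error message to `spack compiler add` as if it were a path.
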
Using spack setup

Here is an example of setting up the local build environment on an x86_64 HPC system:

cd lbann
mkdir spack_builds; cd spack_builds
../scripts/spack_receipes/build_lbann.sh -c gcc@7.1.0 -b openblas -m mvapich2
cd gcc-7.1.0_x86_64_mvapich2_openblas_rel/build
make -j all

Spack Setup

The build_lbann.sh script roughly does the following steps for this example:

spack setup lbann@local build_type=Release dtype=4 %gcc@7.1.0 ^elemental@master blas=openblas ^mvapich2
spack setup lbann@local %intel@18.0.0 ^mvapich2
mkdir -p gcc-7.1.0_x86_64_mvapich2_openblas_rel/build
cd gcc-7.1.0_x86_64_mvapich2_openblas_rel/build
../spconfig.py ../../..

By default, MVAPICH2 builds for PSM. For an ibverbs build of MVAPICH2, use the following:

../scripts/spack_receipes/build_lbann.sh -c gcc@7.1.0 -b openblas -m 'mvapich2 fabrics=mrail'
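
The build directory in the example, gcc-7.1.0_x86_64_mvapich2_openblas_rel, appears to be composed from the compiler, architecture, MPI library, BLAS library, and build type. The sketch below reproduces that pattern so the directory to cd into can be predicted. The pattern is inferred from this single example, `build_dir_name` is a hypothetical helper, and build_lbann.sh itself may derive the name differently (for instance when the MPI spec carries variants such as fabrics=mrail).

```shell
# Hypothetical helper: compose a build directory name following the pattern
# <compiler>_<arch>_<mpi>_<blas>_<type> seen in the example above.
# Assumption: the pattern is inferred from one example only.
build_dir_name() {
    compiler="$1"    # e.g. gcc@7.1.0
    arch="$2"        # e.g. x86_64
    mpi="$3"         # e.g. mvapich2
    blas="$4"        # e.g. openblas
    build_type="$5"  # e.g. rel
    # The spack version separator "@" becomes "-" in the directory name.
    compiler_tag="$(printf '%s' "$compiler" | tr '@' '-')"
    printf '%s_%s_%s_%s_%s\n' "$compiler_tag" "$arch" "$mpi" "$blas" "$build_type"
}

# Prints gcc-7.1.0_x86_64_mvapich2_openblas_rel
build_dir_name gcc@7.1.0 x86_64 mvapich2 openblas rel
```
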
LBANN Container Builds

We provide basic container definition files, and instructions for their use, in the containers subdirectory. We currently support Docker and Singularity.

CMake (Non LC or OSX Systems/Script alternative)
  1. Ensure the following dependencies are installed: CMake, MPI, Elemental, OpenCV, CUDA (optional), cuDNN (optional), Protocol Buffers (optional), Doxygen (optional).
    Note: LBANN also requires a C++ compiler with OpenMP support. The GCC 5.0 and Intel 16.0 C++ compilers are recommended.
  2. Clone this repo using git clone https://github.com/LLNL/lbann.git
  3. In the main LBANN directory create a build directory using mkdir build
  4. cd into this directory and run the following commands:
    cmake ../..
    make
    make install
    Note: It may be necessary to manually set CMake variables to control the build configuration.
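
Before running the manual build it can help to confirm that the command-line tools are on PATH. The sketch below is illustrative only: `missing_tools` and `manual_build` are hypothetical helpers, and `mpicxx` is an assumed name for the MPI compiler wrapper, which varies between MPI installations. Library dependencies such as Elemental and OpenCV are not checked.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical wrapper around the manual build steps above.
# Assumption: mpicxx is the MPI compiler wrapper on this system.
manual_build() {
    missing="$(missing_tools cmake make mpicxx)"
    if [ -n "$missing" ]; then
        echo "missing tools:" $missing >&2
        return 1
    fi
    mkdir -p build || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    # The source path ../.. is taken from the steps above; extra arguments
    # are passed through to cmake to set CMake variables.
    ( cd build && cmake ../.. "$@" && make && make install )
}

# Usage (not run here), from the main LBANN directory. CMAKE_BUILD_TYPE and
# CMAKE_INSTALL_PREFIX are standard CMake variables:
# manual_build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX="$HOME/lbann-install"
```
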
Verifying LBANN on LC
  1. Allocate compute resources using SLURM: salloc -N1 -t 60

  2. Run a test experiment for the MNIST data set; from the main lbann directory run the following command:

    srun -n12 build/gnu.catalyst.llnl.gov/install/bin/lbann \
    --model=model_zoo/models/lenet_mnist/model_lenet_mnist.prototext \
    --reader=model_zoo/data_readers/data_reader_mnist.prototext \
    --optimizer=model_zoo/optimizers/opt_adagrad.prototext \
    --num_epochs=5
    

    Note: srun -n12 build/gnu.catalyst.llnl.gov/install/bin/lbann assumes you are running on the LLNL Catalyst platform. If you are running on another platform, or have installed LBANN in a different directory, you will need to adjust this command.

    This should produce roughly the following final results on Catalyst:

    ----------------------------------------------------------------------------
    Epoch : stats formated [tr/v/te] iter/epoch = [844/94/157]
        global MB = [  64/  64/  64] global last MB = [  48  /  48  /  16  ]
         local MB = [  64/  64/  64]  local last MB = [  48+0/  48+0/  16+0]
    ----------------------------------------------------------------------------
    l 0 training epoch 4 objective function : 0.0471567
    l 0 training epoch 4 categorical accuracy : 99.6241%
    l 0 training epoch 4 run time : 7.64182s
    l 0 training epoch 4 mini-batch time statistics : 0.00901458s mean, 0.0212693s max, 0.0078979s min, 0.000458463s stdev
    l 0 validation objective function : 0.0670221
    l 0 validation categorical accuracy : 98.9%
    l 0 validation run time : 0.25341s
    l 0 validation mini-batch time statistics : 0.00269454s mean, 0.00285273s max, 0.0020936s min, 6.65695e-05s stdev
    l 0 test objective function : 0.0600125
    l 0 test categorical accuracy : 99.02%
    l 0 test run time : 0.421912s
    l 0 test mini-batch time statistics : 0.00268631s mean, 0.00278771s max, 0.00131827s min, 0.00011085s stdev
    

    Note: LBANN performance will vary from machine to machine. Results will also vary, but should not do so significantly.
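
Since results should vary only slightly between machines, the final test accuracy can be checked automatically. The sketch below assumes the output has been saved to a file and that its lines are formatted as in the sample above. `test_accuracy` and `accuracy_at_least` are hypothetical helpers, and the threshold is a value you choose.

```shell
# Hypothetical helper: read LBANN output on stdin and print the test
# categorical accuracy as a bare number (the trailing "%" is removed).
# If several matching lines are present, only the last one is reported.
test_accuracy() {
    awk '/test categorical accuracy/ { sub(/%$/, "", $NF); print $NF }' | tail -n 1
}

# Hypothetical check: succeed if the accuracy in log file $1 is at least $2.
accuracy_at_least() {
    acc="$(test_accuracy < "$1")"
    # Fail if no accuracy line was found in the log.
    [ -n "$acc" ] || return 1
    # awk handles the floating-point comparison.
    awk -v a="$acc" -v t="$2" 'BEGIN { exit !(a + 0 >= t + 0) }'
}

# Usage (not run here): lbann.log is a hypothetical file holding the saved
# output of the run, and 98 is an arbitrary example threshold.
# accuracy_at_least lbann.log 98 && echo "MNIST verification passed"
```
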

Running other models

There are various prototext models under the lbann/model_zoo/models/ directory: alexnet, autoencoder_mnist, lenet_mnist, etc. Each of these directories should have a script called runme.py. Run this script with no command line parameters to see complete usage. These scripts generate command lines similar to the one in the Verifying LBANN on LC section. They take two required arguments: --nodes=<int> and --tasks=<int>. The “tasks” option specifies the number of tasks per node; hence, the total number of tasks (cores) is nodes*tasks. The generated command lines are designed to be executed using srun on LC systems, so you may need to modify them, e.g., by substituting mpirun, depending on your specific system.

Note: some directories contain multiple models, e.g., as of this writing, the autoencoder_cifar10 directory contains both model_autoencoder_cifar10.prototext and model_conv_autoencoder_cifar10.prototext. In these cases there may be multiple python scripts, e.g., runme_conv.py.
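
The relationship between nodes, tasks per node, and the launcher can be sketched as follows. This only illustrates the arithmetic and the srun/mpirun substitution; `total_tasks` and `launcher_prefix` are hypothetical helpers, and the command lines actually generated by runme.py may use different options.

```shell
# Total number of tasks (cores) = nodes * tasks per node.
total_tasks() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: print a launcher prefix for the given node and
# tasks-per-node counts. Assumption: srun is given the per-node count,
# while mpirun is given the total task count via -np.
launcher_prefix() {
    launcher="$1"; nodes="$2"; tasks="$3"
    case "$launcher" in
        srun)   echo "srun --nodes=$nodes --ntasks-per-node=$tasks" ;;
        mpirun) echo "mpirun -np $(total_tasks "$nodes" "$tasks")" ;;
        *)      echo "unsupported launcher: $launcher" >&2; return 1 ;;
    esac
}

# Two nodes with twelve tasks each give 24 tasks in total.
launcher_prefix srun 2 12
launcher_prefix mpirun 2 12
```
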

