LLNL/TraceR

Name: TraceR

Owner: Lawrence Livermore National Laboratory

Description: Trace Replay and Network Simulation Framework

Created: 2016-02-24 21:36:20.0

Updated: 2017-11-20 17:22:12.0

Pushed: 2017-12-12 21:18:17.0

Homepage: null

Size: 3195

Language: C

README

TraceR v2.1

TraceR is a trace replay tool built upon the ROSS-based CODES simulation framework. TraceR can be used for predicting network performance and understanding network behavior by simulating messaging in High Performance Computing applications on interconnection networks.

Build

Pending pull request (with new features): https://xgitlab.cels.anl.gov/codes/codes/merge_requests/21

1) AMPI-based BigSim format: download and build Charm++.

git clone http://charm.cs.uiuc.edu/gerrit/charm

Follow instructions in the Charm++ manual.

Use “charm++” as the target for compiling TraceR, and “bgampi” as the target for building the AMPI used to collect traces. In both cases, pass “bigemulator” as a build option.

2) OTF2: download and build Score-P for OTF2 support.

Refer to the README.OTF2 file in this directory. For this trace format, TraceR simulates the following commonly used collectives using the algorithms implemented in MPICH: Barrier, Bcast, (All)Reduce, Alltoall(v), and Allgather. In contrast, for BigSim traces, how collectives are simulated depends on AMPI's implementation.

If using the BigSim format, uncomment SELECT_TRACE = -DTRACER_BIGSIM_TRACES=1; otherwise, leave SELECT_TRACE = -DTRACER_OTF_TRACES=1 uncommented (exactly one of the two must be active). Accordingly, either set CHARMPATH or ensure that otf2-config (found in the bin directory of the Score-P install) is in your PATH. Then:

cd tracer
make

Run
mpirun -np <p> ../traceR --sync=3 -- ../conf/<choose here> <tracer_config>

Format of tracer_config:

<global map file>
<num jobs>
<trace path for job0> <map file for job0> <number of ranks in job0> <iterations (use 1 if running in normal mode)>
<trace path for job1> <map file for job1> <number of ranks in job1> <iterations (use 1 if running in normal mode)>

If the global map file is not needed, use NA for it and for each “map file for job*” entry. For the format of the global and job map files, and for sample map-generation code, refer to the README inside utils.

More information on the workflow of TraceR and on network config files can be found in docs/UserWriteUp.txt and in CODES:codes/src/networks/model-net/doc

Example files for BigSim are in tracer/jacobi2d, and those for OTF2 are in tracer/stencil4d-otf. Sample run command:

mpirun -np 8 ../traceR --sync=3 --nkp=16 --extramem=100000 --max-opt-lookahead=1000000 --timer-frequency=1000 -- ../conf/tracer-torus.conf tracer_config

Parameters:
--sync: ROSS's PDES (parallel discrete event simulation) mode: 1 = sequential, 2 = conservative, 3 = optimistic
--extramem: number of messages in ROSS's extra message buffer; each message is ~500 bytes, and 100K should work for most cases
--max-opt-lookahead: leash on optimistic execution, in nanoseconds (1 microsecond is a good value)
--timer-frequency: frequency with which PE0 prints the current virtual time
--nkp: number of groups used for clustering LPs; recommended value for fewer rollbacks: (total LPs)/(#MPI ranks)

Please refer to README.OTF for instructions on generating OTF2-MPI trace files. BigSim-AMPI trace file generation instructions are available at http://charm.cs.illinois.edu/manuals/html/bigsim/manual-1p.html.

Reference

Any published work that utilizes this software should include the following reference:

Nikhil Jain, Abhinav Bhatele, Samuel T. White, Todd Gamblin, and Laxmikant
V. Kale. Evaluating HPC networks via simulation of parallel workloads. In
Proceedings of the ACM/IEEE International Conference for High Performance
Computing, Networking, Storage and Analysis, SC '16. IEEE Computer Society,
November 2016. LLNL-CONF-690662.

Release

Copyright (c) 2015, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory.

Written by:

Nikhil Jain <nikhil.jain@acm.org>
Bilge Acun <acun2@illinois.edu>
Abhinav Bhatele <bhatele@llnl.gov>

LLNL-CODE-740483. All rights reserved.

This file is part of TraceR. For details, see: https://github.com/LLNL/TraceR. Please also read the LICENSE file for the MIT License notice.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.