LLNL/PnMPI

Name: PnMPI

Owner: Lawrence Livermore National Laboratory

Description: Virtualization Layer for the MPI Profiling Interface

Created: 2010-10-17 04:54:06.0

Updated: 2018-02-19 18:41:01.0

Pushed: 2018-02-08 23:18:17.0

Homepage: https://computing.llnl.gov/code/pnmpi/

Size: 1365

Language: C


README

PnMPI Tool Infrastructure

by Martin Schulz, schulzm@llnl.gov, LLNL-CODE-402774

PnMPI is a dynamic MPI tool infrastructure that builds on top of the standardized PMPI interface. It allows the user to run multiple PMPI tools concurrently, activate PMPI tools without relinking by simply changing a configuration file, multiplex tools during a single run, and write cooperative PMPI tools.

The package contains two main components: the PnMPI core infrastructure and a set of tool modules that can exploit PnMPI's capabilities.

So far, this software has mainly been tested on Linux clusters with RHEL-based OS distributions as well as IBM's BG/P systems. Continuous integration tests run on Ubuntu 12.04 and 14.04, OS X El Capitan, and macOS Sierra. Some preliminary experiments have also included SGI Altix systems. Ports to other platforms should be straightforward, but this is not extensively tested. Please file an issue if you run into problems porting PnMPI or if you successfully deployed PnMPI on a new platform.

Many thanks to our contributors.

A) Building PnMPI

PnMPI uses CMake for its build system.

A1) Dependencies

PnMPI requires CMake and a working MPI installation. In addition, PnMPI uses git submodules for several CMake modules, wrap, and adept-utils. While the release source tarball includes all required submodules, git users need to check them out with the following command in the root of the cloned repository:

git submodule update --init --recursive
A2) Configure the project

In the simplest case, you can run this in the top-level directory of the PnMPI tree:

cmake -DCMAKE_INSTALL_PREFIX=/path/to/install/destination .
make
make install

This will configure, build, and install PnMPI to the destination specified. PnMPI supports parallel make with the -j parameter. E.g., for using eight build tasks, use:

cmake -DCMAKE_INSTALL_PREFIX=/path/to/install/destination .
make -j8
make install

On more complex machines, such as those with filesystems shared among multiple platforms, you will want to separate out your build directories for each platform. CMake makes this easy.

Create a new build directory named according to the platform you are using, cd into it, and run cmake there. For example:

cd <pnmpi>
mkdir x86_64
cd x86_64
cmake -DCMAKE_INSTALL_PREFIX=/path/to/install/destination ..

Here, <pnmpi> is the top-level directory in the PnMPI tree. Note that when you run CMake this way, you need to supply the path to the PnMPI source directory as the last parameter. Here, that's just .. as we are building in a subdirectory of the source directory. Once you run CMake, simply run make and make install as before:

make -j8
make install

The PnMPI build should auto-detect your MPI installation and determine library and header locations. If you want to build with a particular MPI that is NOT the one auto-detected by the build, you can supply your particular MPI compiler as a parameter:

cmake \
  -DCMAKE_INSTALL_PREFIX=/path/to/install/destination \
  -DMPI_C_COMPILER=/path/to/my/mpicc \
  ..

See the documentation in FindMPI.cmake for more details on MPI build configuration options.

If you have problems, you may want to build PnMPI in debug mode. You can do this by supplying an additional parameter to cmake, e.g.:

cmake \
  -DCMAKE_INSTALL_PREFIX=/path/to/install/destination \
  -DCMAKE_BUILD_TYPE=Debug \
  ..

The extra/build directory contains a few sample invocations of CMake that have been successfully used on LLNL systems.

A3) Configuring with/without Fortran

By default PnMPI is configured to work with C/C++ and Fortran codes. However, on systems where Fortran is not available, the system should auto-detect this and not build the Fortran libraries and test cases. It can also be manually turned off by adding

-DENABLE_FORTRAN=OFF

to the cmake configuration command.

The PnMPI distribution contains test cases for C and Fortran that allow you to test the correct linkage.

A3a) Optional configuration options

If you want to change the default build configuration, you can enable or disable individual features by passing additional flags (such as the -DENABLE_FORTRAN flag shown above) to the cmake configuration command.

A4) Configuring for cross-compiled environments

When configuring PnMPI in cross-compiled environments (such as Blue Gene/Q systems), it is necessary to provide a matching tool chain file. Many toolchain files are included in CMake, additional example files that allow the compilation on certain LC machines can be found in cmakemodules/Platform and cmakemodules/Toolchain.

For example, to configure PnMPI for a BG/Q machine using the GNU compiler suite, add the following to the cmake configuration command:

-DCMAKE_TOOLCHAIN_FILE=../cmakemodules/Toolchain/BlueGeneQ-gnu.cmake

You may need to modify the toolchain file for your system.

A5) Installed structure

Once you've installed, all your PnMPI files and executables should be in <CMAKE_INSTALL_PREFIX>, the path specified during configuration. Roughly, the install tree looks like this:

bin/
  pnmpi                PnMPI invocation tool
  pnmpi-patch          Library patching utility
lib/
  libpnmpi[f].[so,a]   PnMPI runtime libraries
  pnmpi-modules/       System-installed tool modules
  cmake/               Build files for external modules
include/
  pnmpi/               PnMPI header directory
    debug_io.h         PnMPI module debug print functions.
    hooks.h            PnMPI module hook definitions.
    service.h          PnMPI module service functions.
    ...
  pnmpi.h              PnMPI main header
  pnmpimod.h           PnMPI module support (legacy)
  pnmpi-config.h       CMake generated configuration file
share/
  cmake/               CMake files to support tool module builds

Test programs are not installed, but the tests/src folder of the build directory contains test programs built with PnMPI. See below for details on running these to test your PnMPI installation.

A6) Environment setup

You will need to set one environment variable to run PnMPI: PNMPI_LIB_PATH, which must point to a directory containing the PnMPI modules (the lib/pnmpi-modules directory of the install tree shown below).
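
For example (assuming the installation prefix used in the build examples above):

export PNMPI_LIB_PATH=/path/to/install/destination/lib/pnmpi-modules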

A6a) Using the PnMPI invocation tool

To run PnMPI in front of any application (that is dynamically linked to MPI), you may use the bin/pnmpi tool. It will set up the environment and preload PnMPI for you. For a list of supported arguments, invoke it with the --help flag.

:~$ pnmpi --help
Usage: pnmpi [OPTION...] utility [utility options]
P^nMPI -- Virtualization Layer for the MPI Profiling Interface

  -c, --config=FILE          Configuration file
  -q, -s, --quiet, --silent  Don't produce any output
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version
...


:~$ mpiexec -np 2 pnmpi -c pnmpi.conf a.out
  _____   _ __   __  __  _____   _____
 |  __ \ | '_ \ |  \/  ||  __ \ |_   _|
 | |__) || | | || \  / || |__) |  | |
 |  ___/ |_| |_|| |\/| ||  ___/   | |
 | |            | |  | || |      _| |_
 |_|            |_|  |_||_|     |_____|


 Application:
  MPI interface: C

 Global settings:
  Pcontrol: 5

 Loaded modules:
  Stack default:
    sample1 (Pcontrol: 1)
  Stack foo:
...

Note: The PnMPI invocation tool is not compatible with all platforms (e.g. BlueGene/Q), as it requires the execvp() function, which might not be supported.

A7) RPATH settings

By default, the build adds the paths of all dependency libraries to the rpath of the installed PnMPI library. This is the preferred behavior on LLNL systems, where many packages are installed and LD_LIBRARY_PATH usage can become confusing.

If you are installing on a system where you do NOT want dependent libraries added to your RPATH, e.g. if you expect all of PnMPI's dependencies to be found in system paths, then you can build without rpath additions using this option to cmake:

-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE

This will add only the PnMPI installation lib directory to the rpath of the PnMPI library.

B) Usage

PnMPI supports multiple ways to use it with an MPI application:

B1) Dynamic linking

If your application is dynamically linked against the MPI library, you may use PnMPI by simply preloading it via LD_PRELOAD (or DYLD_INSERT_LIBRARIES on macOS).

Instead of manually preloading, you may use the PnMPI invocation tool in bin/pnmpi.
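
For example, a manual preload on Linux might look like this (a sketch; the paths refer to the install tree from section A5, and your MPI launcher may require extra flags to export environment variables to the ranks):

export LD_PRELOAD=/path/to/install/destination/lib/libpnmpi.so
export PNMPI_LIB_PATH=/path/to/install/destination/lib/pnmpi-modules
mpiexec -np 2 ./a.out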

B2) Static linking

If your application is linked statically against the MPI library, you may link it against the static PnMPI library instead.

Note: By default, the linker will only link functions used by your code, so most of the MPI API functions would not get linked into your binary. PnMPI implements a helper function to force the linker to link all required functions into the binary. However, there might be complications if not all functions wrapped by the modules are used by the application. You should therefore tell the linker to link the whole PnMPI archive explicitly:

mpicc main.o -Wl,--whole-archive pnmpi.a -Wl,--no-whole-archive -o test

Note: The linker option --whole-archive is not available on macOS.

C) Modules

PnMPI supports two different kinds of tool modules: transparent modules and PnMPI-specific modules.

Among the former are modules that have been created independently of PnMPI and are just based on the PMPI interface. To use a transparent module with PnMPI, the user has to perform two steps:

  1. Build the tool as a shared module (a dlopen-able shared library).
  2. Patch the tool using the pnmpi-patch utility, which is included with the PnMPI distribution.

Usage:

pnmpi-patch <original tool (in)> <patched tool (out)>

e.g.:

pnmpi-patch my-module.so my-pnmpi-module.so

After that, copy the tool into one of the directories listed in $PNMPI_LIB_PATH so that PnMPI can pick it up. Note that all of this is handled automatically by the CMake build files included with PnMPI (see below for more information).
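
For example, a patched module can be installed manually like this (assuming $PNMPI_LIB_PATH names a single writable directory):

cp my-pnmpi-module.so $PNMPI_LIB_PATH/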

The second option is the use of PnMPI-specific modules: these modules also rely on the PMPI interface, but explicitly use some of the PnMPI features (i.e., they won't run outside of PnMPI). These modules include the PnMPI header files, which describe the interface that modules can use. In short, the interface offers the ability to register a module and then use a publish/subscribe interface to offer services to, and use services from, other modules.

Note: PnMPI-specific modules also have to be patched using the utility described above, unless they use the XMPI interface instead of PMPI.

C1) Built-in modules

This package includes a set of modules that can be used both to create other tools using their services and as templates for new modules.

The source for each module is stored in its own subdirectory of the src/modules/ directory.

Note: All modules must be compiled with the same MPI as PnMPI is built with. Modules should be linked only against the libraries they require, excluding PnMPI and MPI, as those symbols will be provided by PnMPI when the module is loaded.

D) Building your own modules with CMake

PnMPI installs CMake build files with its distribution to allow external projects to quickly build MPI tool modules. The build files allow external tools to use PnMPI, the pnmpi-patch utility, PnMPI's wrapper generator, and PnMPI's dependency libraries.

To create a new PnMPI module, simply create a new project that looks something like this:

my-project/
  CMakeLists.txt
  foo.c
  wrapper.w

Assume that wrapper.w is a wrapper generator input file that will generate another file called wrapper.c, which contains MPI interceptor functions for the tool library. foo.c is additional code needed for the tool library. CMakeLists.txt is the CMake build file.

Your CMakeLists.txt file should start with something like this:

project(my-module C)
cmake_minimum_required(VERSION 2.8.11.2)

find_package(PnMPI REQUIRED)
find_package(MPI REQUIRED)

add_wrapped_file(wrapper.c wrapper.w)
pnmpi_add_pmpi_module(foo foo.c wrapper.c)

install(TARGETS foo DESTINATION ${PnMPI_MODULES_DIR})

include_directories(
  ${MPI_INCLUDE_PATH}
  ${PnMPI_INCLUDE_DIR}
  ${CMAKE_CURRENT_SOURCE_DIR})

Once you've made your CMakeLists.txt file like this, you can build your PnMPI module like so:

cd my-module
mkdir $SYS_TYPE && cd $SYS_TYPE
cmake ..
make -j8
make -j8 install

This should find PnMPI on your system and build your module, assuming that you have your environment set up correctly.

D2) Limiting the threading level

If your module is not thread safe or supports only a limited threading level, it should limit the required threading level in its MPI_Init_thread wrapper:

int MPI_Init_thread(int *argc, char ***argv, int required, int *provided)
{
  if (required > MPI_THREAD_SINGLE)
    required = MPI_THREAD_SINGLE;

  return XMPI_Init_thread(argc, argv, required, provided);
}

D3) Module hooks

At different points during execution, hooks will be called in all loaded modules. These can be used to trigger some functionality at a given time. All hooks have the return type void and are defined in pnmpi/hooks.h, which should be included for type safety; the hooks defined there will be called in every loaded module that implements them.

Note: You can use PNMPI_Service_CallHook() to call custom hooks in your modules. Just pass a custom hook name as first parameter.
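As an illustration, a minimal module might register itself in a registration hook roughly like this (a sketch only; consult pnmpi/hooks.h and the man-pages for the authoritative hook names and service signatures):

#include <pnmpi/hooks.h>
#include <pnmpi/service.h>

/* Called by the PnMPI core while this module is being loaded, before
 * any MPI wrappers run. Registering a name lets other modules find
 * this module via the service interface. */
void PNMPI_RegistrationPoint(void)
{
  /* "my-module" is a hypothetical name used for illustration only. */
  PNMPI_Service_RegisterModule("my-module");
}
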

For a detailed description, see the Doxygen docs or man-pages for these functions.

D4) Module service functions

Modules may interact with the PnMPI core and other modules with the module service functions defined in pnmpi/service.h. For a detailed description about these functions, see the Doxygen docs for the service header or the man-pages.

D5) Debug message functions

Modules may print debug messages, warnings, and errors with the PnMPI API functions PNMPI_Debug, PNMPI_Warning, and PNMPI_Error defined in pnmpi/debug_io.h. PnMPI will add additional information such as the rank or line numbers to the printed messages.
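
For instance, a module might emit a warning like this (a sketch; printf-style formatting is assumed here, so check the man-pages for the exact signatures):

#include <pnmpi/debug_io.h>

static void check_argument(const char *arg)
{
  /* PnMPI prepends context such as the rank to the printed message. */
  PNMPI_Warning("Unknown argument '%s', falling back to defaults.\n", arg);
}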

For a detailed description, see the Doxygen docs or man-pages for these functions.

E) Debug Options

If PnMPI is built with ENABLE_DEBUG, it includes debug print functions that can be enabled dynamically. To control them, the environment variable PNMPI_DBGLEVEL can be set to any combination of PnMPI's debug levels:

NOTE: The first two levels should be enabled for single-rank executions only, as their output can't be limited to a single rank and thus will be printed on all ranks.

Additionally, the printouts can be restricted to a single node by setting the variable PNMPI_DBGNODE to an MPI rank.
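
For example (a sketch; the numeric value below is a hypothetical combination of debug levels and must be replaced by levels valid for your build):

export PNMPI_DBGLEVEL=3   # hypothetical combination of two debug levels
export PNMPI_DBGNODE=0    # limit the debug output to rank 0
mpiexec -np 2 pnmpi -c pnmpi.conf a.out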

E1) Using the PnMPI invocation tool

You may set the above options in the PnMPI invocation tool. Use the --debug option to enable a specific debug level and --debug-node to limit the debug output to a single rank.

F) Configuration and Demo codes

The PnMPI distribution includes test cases (in C and Fortran). They can be used to experiment with the basic PnMPI functionality and to test the system setup. The following describes the C version (the F77 version works similarly):

  1. Change into the tests/src directory.

  2. The program test-mpi.c, which only initializes and then finalizes MPI, was compiled into three binaries:

    • testbin-binary_mpi_c-preload (plain MPI code)
    • testbin-binary_mpi_c-dynamic (linked dynamically against PnMPI)
    • testbin-binary_mpi_c-static (linked statically against PnMPI)
  3. Executing the *-preload binary will not print any output, but the binaries linked against PnMPI will print the PnMPI header, indicating PnMPI is loaded before MPI.

  4. PnMPI is configured through a configuration file that lists all modules to be loaded by PnMPI as well as optional arguments. The name of this file can be specified by the environment variable PNMPI_CONF. If this variable is not set or the file specified cannot be found, PnMPI looks for a file called .pnmpi_conf in the current working directory, and if not found, in the user's home directory.

    A simple configuration file may look as follows:

    module sample1
    module sample2
    module sample3
    module sample4

    (plus some additional lines starting with #, which indicate comments)

    This configuration causes these four modules to be loaded in the specified order. PnMPI will look for the corresponding modules (.so shared library files) in PNMPI_LIB_PATH.

  5. Running the testbin-binary_mpi_sendrecv (a simple test sending messages between the ranks) will load all four modules in the specified order and intercept all MPI calls included in these modules:

    • sample1: send and receive
    • sample2: send
    • sample3: receive
    • sample4: send and receive

    The program output (for 2 nodes) will be:

      _____   _ __   __  __  _____   _____
     |  __ \ | '_ \ |  \/  ||  __ \ |_   _|
     | |__) || | | || \  / || |__) |  | |
     |  ___/ |_| |_|| |\/| ||  ___/   | |
     | |            | |  | || |      _| |_
     |_|            |_|  |_||_|     |_____|


     Application:
      MPI interface: C

     Global settings:
      Pcontrol: 5

     Loaded modules:
      Stack default:
        sample1 (Pcontrol: 1)
        sample2
        sample3
        sample4

    WRAPPER 1: Before recv
    WRAPPER 1: Before send
    WRAPPER 2: Before send
    WRAPPER 4: Before send
    WRAPPER 4: After send
    WRAPPER 2: After send
    WRAPPER 1: After send
    WRAPPER 1: Before recv
    WRAPPER 3: Before recv
    WRAPPER 4: Before recv
    WRAPPER 3: Before recv
    WRAPPER 4: Before recv
    WRAPPER 4: After recv
    WRAPPER 3: After recv
    WRAPPER 1: After recv
    GOT 1 from rank 1.
    WRAPPER 1: Before send
    WRAPPER 2: Before send
    WRAPPER 4: Before send
    WRAPPER 4: After send
    WRAPPER 4: After recv
    WRAPPER 3: After recv
    WRAPPER 1: After recv
    GOT 1 from rank 0.
    WRAPPER 2: After send
    WRAPPER 1: After send

    When running on BG/P systems, it is necessary to explicitly export some environment variables. Here is an example:

    mpirun -np 4 -exp_env LD_LIBRARY_PATH -exp_env PNMPI_LIB_PATH \
      -cwd $PWD testbin-binary_mpi_sendrecv
    

G) Using MPI_Pcontrol

The MPI standard defines MPI_Pcontrol, a routine that has no direct effect (it is implemented as a dummy call inside MPI) but that can be replaced through PMPI to accept additional information from MPI applications (e.g., to turn data collection on or off, or to mark main iterations). The information is used by a PMPI tool linked to the application. When used with PnMPI, the user must therefore decide which tool an MPI_Pcontrol call is directed to.

By default, PnMPI directs MPI_Pcontrol calls to the first module in the tool stack only. If this is not the desired effect, users can control which modules Pcontrols reach by adding pcontrol on or pcontrol off on a separate line following the corresponding module specification in the configuration file, as shown below. Note that PnMPI allows MPI_Pcontrol calls to be sent to multiple modules.
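
For example, the following sketch (using the configuration file format from section F) would redirect Pcontrol calls from the first module to sample2:

module sample1
pcontrol off
module sample2
pcontrol on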

In addition, the general behavior of Pcontrols can be specified with a global option at the beginning of the configuration file. This option is called globalpcontrol and can take one of several arguments, including pmpi and mixed (see the known issues below).

The PnMPI internal format for Pcontrol arguments is as follows:

int level  (same semantics as for MPI_Pcontrol itself)
int type = PNMPI_PCONTROL_SINGLE or PNMPI_PCONTROL_MULTIPLE
           (target one or more modules) |
           PNMPI_PCONTROL_VARG or PNMPI_PCONTROL_PTR
           (arguments as varargs or one pointer)
int mod = target module (if SINGLE)
int modnum = number of modules (if MULTIPLE)
int *mods = pointer to array of modules
int size = length of all variable arguments (if VARG)
void *buf = pointer to argument block (if PTR)
Known issues:

Forwarding the variable argument list as done in pmpi and mixed is only implemented in a highly experimental version and is disabled by default. To enable it, compile PnMPI with the flag EXPERIMENTAL_UNWIND and link PnMPI with the libunwind library. Note that this is not extensively tested and is not portable across platforms.
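
Such a build might be configured like this (a sketch; the flag name is taken from above, but the -D syntax and the ON value are assumptions):

cmake \
  -DCMAKE_INSTALL_PREFIX=/path/to/install/destination \
  -DEXPERIMENTAL_UNWIND=ON \
  ..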

H) References

More documentation on PnMPI can be found in the following two published articles:

I) Contact

For more information or in case of questions, please contact Martin Schulz or file an issue.

Copyright

Copyright © 2008-2018 Lawrence Livermore National Security, LLC.
Copyright © 2011-2016 ZIH, Technische Universitaet Dresden, Federal Republic of Germany
Copyright © 2013-2018 RWTH Aachen University, Federal Republic of Germany

All rights reserved - please read the information in the LICENSE file.

