LLNL/Kripke

Name: Kripke

Owner: Lawrence Livermore National Laboratory

Description: Kripke is a simple, scalable, 3D Sn deterministic particle transport code

Created: 2017-11-02 22:07:20.0

Updated: 2018-03-26 17:16:56.0

Pushed: 2018-03-26 17:24:27.0

Homepage: null

Size: 1213

Language: C++

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

KRIPKE

Version 1.2.0

Release Date 11/2/2017

Authors

License

See included file NOTICE.md

Overview

Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures effect the implementation and performance of Sn transport. A main goal of Kripke is investigating how different data-layouts affect instruction, thread and task level parallelism, and what the implications are on overall solver performance.

Kripkie supports storage of angular fluxes (Psi) using all six striding orders (or “nestings”) of Directions (D), Groups (G), and Zones (Z), and provides computational kernels specifically written for each of these nestings. Most Sn transport codes are designed around one of these nestings, which is an inflexibility that leads to software engineering compromises when porting to new architectures and programming paradigms.

Early research has found that the problem dimensions (zones, groups, directions, scattering order) and the scaling (number of threads and MPI tasks), can make a profound difference in the performance of each of these nestings. To our knowledge this is a capability unique to Kripke, and should provide key insight into how data-layout effects Sn solver performance. An asynchronous MPI-based parallel sweep algorithm is provided, which employs the concepts of Group Sets (GS) Zone Sets (ZS), and Direction Sets (DS), borrowed from the Texas A&M code PDT.

As we explore new architectures and programming paradigms with Kripke, we will be able to incorporate these findings and ideas into our larger codes. The main advantages of using Kripke for this exploration is that it's light-weight (ie. easily refactored and modified), and it gets us closer to the real question we want answered: “What is the best way to layout and implement an Sn code on a given architecture+programming-model?” instead of the more commonly asked question “What is the best way to map my existing Sn code to a given architecture+programming-model?“.

Mini App or Proxy App?

Kripke is a Mini-App since it has a very small code base consisting of 3539 lines of C++ code (using cloc v1.67).

Kripke is also a Proxy-App since it is a proxy for the LLNL transport code ARDRA.

Analysis

A major challenge of achieving high-performance in an Sn transport (or any physics) code is choosing a data-layout and a parallel decomposition that lends itself to the targeted architecture. Often the data-layout determines the most efficient nesting of loops in computational kernels, which then determines how well your inner-most-loop SIMDizes, how you add threading (pthreads, OpenMP, etc.), and the efficiency and design of your parallel algorithms. Therefore, each nesting produces different loop nesting orders, which provides substantially different performance characteristics. We want to explore how easily and efficiently these different nestings map to different architectures. In particular, we are interested in how we can achieve good parallel efficiency while also achieving efficient use of node resources (such as SIMD units, memory systems, and accelerators).

Parallel sweep algorithms can be explored with Kripke in multiple ways. The core MPI algorithm could be modified or rewritten to explore other approaches, domain overloading, or alternate programming models (such as Charm++). The effect of load-imbalance is an understudied aspect of Sn transport sweeps, and could easily be studied with Kripke by artificially adding more work (ie unknowns) to a subset of MPI tasks. Block-AMR could be added to Kripke, which would be a useful way to explore the cost-benefit analysis of adding AMR to an Sn code, and would be a way to further study load imbalances and AMR effects on sweeps.

The coupling of on-node sweep kernel, the parallel sweep algorithm, and the choices of decomposing the problem phase space into GS's, ZS's and DS's impact the performance of the overall sweep. The tradeoff between large and small “units of work” can be studied. Larger “units of work” provide more opportunity for on-node parallelism, while creating larger messages, less “sends”, and less efficient parallel sweeps. Smaller “units of work” make for less efficient on-node kernels, but more efficient parallel sweeps.

We can also study trading MPI tasks for threads, and the effects this has on our programming models and cache efficiency.

A simple timer infrastructure is provided that measure each compute kernels total time.

Physical Models

Kripke solves the Discrete Ordinance and Diamond Difference discretized steady-state linear Boltzmann equation.

    H * Psi = (LPlus * S * L) * Psi + Q

Where:

Kripke is hard-coded to setup and solve the 3D Kobayashi radiation benchmark, problem 3i. Since Kripke does not have reflecting boundary conditions, the full-space model is solved. Command line arguments allow the user to modify the total and scattering cross-sections. Since Kripke is a multi-group transport code and the Kobayashi problem is single-group, each energy group is setup to solve the same problem with no group-to-group coupling in the data.

The steady-state solution method uses the source-iteration technique, where each iteration is as follows:

  1. Phi = LTimes(Psi)
  2. PhiOut = Scattering(Phi)
  3. PhiOut = PhiOut + Source()
  4. Rhs = LPlusTimes(PhiOut)
  5. Psi = Sweep(Rhs, Psi) which is solving Psi=(Hinverse * Rhs) a.k.a “Inverting H”

Building and Running

Kripke comes with a simple CMake based build system.

Requirements
Quick Start

The easiest way to get Kripke running, is to directly invoke CMake and take whatever system defaults you have for compilers and let CMake find MPI for you.

Running Kripke

Environment Variabes

If Kripke is build with OpenMP support, then the environment variables OMP_NUM_THREADS is used to control the number of OpenMP threads. Kripke does not attempt to modify the OpenMP runtime in anyway, so other OMP_* environment variables should also work as well.

Command Line Options

Command line option help can also be viewed by running “./kripke –help”

Problem Size Options:
Physics Parameters:
On-Node Options:
Parallel Decomposition Options:
Solver Options:

Future Plans

Some ideas for future study:

Retirement

Retirement of this Mini-App should be considered when it is no longer a representative of state-of-the-art transport codes, or when it becomes too cumbersome to adapt to advanced architectures. Also, at the point of retirement it should be clear how to design its successor.

Links

Release

LLNL-CODE-658597


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.