jvm-profiling-tools/async-profiler

Name: async-profiler

Owner: jvm-profiling-tools

Description: Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

Created: 2016-04-23 01:27:15.0

Updated: 2018-01-14 18:02:43.0

Pushed: 2017-12-15 15:03:13.0

Homepage:

Size: 232

Language: C++

README

async-profiler

This project is a low-overhead sampling profiler for Java that does not suffer from the safepoint bias problem. It uses HotSpot-specific APIs to collect stack traces and to track memory allocations. The profiler works with OpenJDK, Oracle JDK and other Java runtimes based on the HotSpot JVM.

async-profiler can trace the following kinds of events:

CPU profiling

In this mode, the profiler collects stack trace samples that include Java methods, native calls, JVM code and kernel functions.

The general approach is to receive call stacks generated by perf_events and match them up with call stacks generated by AsyncGetCallTrace, producing an accurate profile of both Java and native code. Additionally, async-profiler provides a workaround to recover stack traces in some corner cases where AsyncGetCallTrace fails.

This approach has the following advantages compared to using perf_events directly with a Java agent that translates addresses to Java method names:

ALLOCATION profiling

Instead of detecting CPU-consuming code, the profiler can be configured to collect call sites where the largest amount of heap memory is allocated.

async-profiler does not use intrusive techniques such as bytecode instrumentation or expensive DTrace probes, which have a significant performance impact. It also does not affect Escape Analysis or prevent JIT optimizations like allocation elimination. Only actual heap allocations are measured.

The profiler features TLAB-driven sampling. It relies on HotSpot-specific callbacks to receive two kinds of notifications:

This means that not every allocation is counted: only about one allocation every N kB is sampled, where N is the average TLAB size. This makes heap sampling very cheap and suitable for production. On the other hand, the collected data may be incomplete, though in practice it will often reflect the top allocation sources.

Unlike Java Mission Control, which uses a similar approach, async-profiler does not require Java Flight Recorder or any other commercial JDK feature. It is based entirely on open-source technologies and works with OpenJDK.

The minimum supported JDK version is 7u40, the release in which the TLAB callbacks appeared.

The heap profiler requires HotSpot debug symbols. Oracle JDK already has them embedded in libjvm.so, but OpenJDK builds typically ship them in a separate package. For example, to install OpenJDK debug symbols on Debian / Ubuntu, run:

# apt-get install openjdk-8-dbg

Supported platforms

Note: on macOS, profiling is limited to Java code only, since native stack walking relies on the perf_events API, which is available only on Linux.

Building

Make sure the JAVA_HOME environment variable points to your JDK installation, and then run make. GCC is required. After building, the profiler agent binary will be in the build subdirectory. Additionally, a small application jattach that can load the agent into the target process will also be compiled to the build subdirectory.

Basic Usage

As of Linux 4.6, capturing kernel call stacks using perf_events from a non-root process requires setting two runtime variables. You can set them using sysctl or as follows:

# echo 1 > /proc/sys/kernel/perf_event_paranoid
# echo 0 > /proc/sys/kernel/kptr_restrict

To run the agent and pass commands to it, the helper script profiler.sh is provided. A typical workflow would be to launch your Java application, attach the agent and start profiling, exercise your performance scenario, and then stop profiling. The agent's output, including the profiling results, will be displayed in the Java application's standard output.

Example:

$ jps
... Jps
8983 Computey
$ ./profiler.sh start 8983
$ ./profiler.sh stop 8983

Alternatively, you may specify the -d (duration) argument to profile the application for a fixed period of time with a single command.

$ ./profiler.sh -d 30 8983

By default, the profiling frequency is 1000Hz (every 1ms of CPU time). Here is a sample of the output printed to the Java application's terminal:

--- Execution profile ---
Total:                   687
Unknown (native):        1 (0.15%)

Samples: 679 (98.84%)
    [ 0] Primes.isPrime
    [ 1] Primes.primesThread
    [ 2] Primes.access$000
    [ 3] Primes$1.run
    [ 4] java.lang.Thread.run

... a lot of output omitted for brevity ...

         679 (98.84%) Primes.isPrime
           4 (0.58%)  __do_softirq

... more output omitted ...

This indicates that the hottest method was Primes.isPrime, and the hottest call stack leading to it comes from Primes.primesThread.

Flame Graph visualization

async-profiler provides out-of-the-box Flame Graph support. Specify the -o svg argument to dump profiling results as an interactive SVG that can be viewed immediately in all mainstream browsers. The SVG output format is also chosen automatically if the target filename ends with .svg.

$ jps
... Jps
8983 Computey
$ ./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983


Profiler Options

The following is a complete list of the command-line options accepted by the profiler.sh script.

Restrictions/Limitations

Troubleshooting

"Could not start attach mechanism: No such file or directory" means that the profiler cannot establish communication with the target JVM through a UNIX domain socket.

For the profiler to be able to access the JVM, make sure that:

  1. You run the profiler as exactly the same user as the owner of the target JVM process.
  2. The /tmp directory of the Java process is physically the same directory as the /tmp of your shell.
  3. The JVM is not run with the -XX:+DisableAttachMechanism option.

[frame_buffer_overflow] in the output means there was not enough space to store all call traces. Consider increasing the frame buffer size with the -b option.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.