hammerlab/yarn-logs-helpers

Name: yarn-logs-helpers

Owner: Hammer Lab

Description: Scripts for parsing / making sense of yarn logs

Created: 2014-11-20 04:49:13.0

Updated: 2017-11-27 12:34:42.0

Pushed: 2016-08-22 17:29:55.0

Homepage: null

Size: 37

Language: Shell

README

yarn-logs-helpers

Scripts for parsing / making sense of yarn logs.

Contents

yarn-container-logs

The main script of note here is yarn-container-logs:

    $ yarn-container-logs 0018

Spark-specific parsing

A common use case is parsing logs from Spark apps running on YARN, for which yarn-container-logs has some specific functionality:

Stack Trace Parsing / Histogram

yarn-logs-stack-traces uses a stack-trace-parsing library on the output of yarn-logs. Example usage:

    $ yarn-logs-stack-traces 0018 -d  # -d means "show a histogram in descending order"
    stacks in total

    occurrences:
    org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:386)
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:383)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        ...
        at java.lang.Thread.run(Thread.java:744)

    occurrences:
    java.io.IOException: Failed to connect to demeter-csmaz11-16.demeter.hpc.mssm.edu/172.29.46.86:33263
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:141)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        ...
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
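
The histogram idea can be approximated with standard tools. The sketch below is not the stack-traces library that the script uses; it counts only the first line of each trace (an unindented line naming an exception or error class), and the function name `exception_histogram` is hypothetical.

```shell
# Hypothetical helper (not part of this repo): count how often each
# exception header line occurs in log text read from stdin.
exception_histogram() {
  # Keep only lines that look like "some.package.FooException: message",
  # which skips the indented "at ..." stack frames.
  grep -E '^[A-Za-z_$][A-Za-z0-9_.$]*(Exception|Error): ' |
    sort |
    uniq -c |
    sort -rn   # most frequent first
}
```

Assuming `yarn-logs` writes the application's logs to stdout, something like `yarn-logs 0018 | exception_histogram` would give a rough per-exception count. Unlike a real stack-trace parser, this treats two traces with the same first line but different frames as identical.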


Other Miscellaneous Scripts

This repo contains several other scripts that wrap YARN commands in calls to yarn-appid, which lets you refer to an application by the last four digits of its ID.
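
To illustrate what a last-4 lookup involves, here is a minimal sketch. It is an assumption about the general approach, not the repo's yarn-appid implementation, and the function name `match_appid_suffix` is hypothetical. It reads an application listing on stdin and prints the IDs that end with the given digits.

```shell
# Hypothetical sketch of last-4 lookup; the repo's yarn-appid may work differently.
# Reads application-listing text on stdin and prints every application ID
# whose final characters match the suffix given as the first argument.
match_appid_suffix() {
  # Application IDs look like application_<cluster timestamp>_<sequence number>.
  grep -oE 'application_[0-9]+_[0-9]+' |
    grep -E "${1}\$" |
    sort -u
}
```

With a cluster available, it could be fed from `yarn application -list -appStates ALL | match_appid_suffix 0018`. If more than one ID is printed, the suffix is ambiguous and a longer one is needed.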

Installing

Download this repository with:

    git clone --recursive https://github.com/hammerlab/yarn-logs-helpers.git

In your .bashrc (or equivalent), source .yarn-logs-helpers.sourceme:

    $ source /path/to/repo/.yarn-logs-helpers.sourceme

This will:

Env vars

Setting $YARN_LOGS_USER may allow yarn-container-logs to fetch logs from apps run by users other than you.

You can set it permanently in your .bashrc to a user that has permission to read all YARN users' logs, or set it on the command line for a single call:

    $ YARN_LOGS_USER=someone yarn-logs 1234
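
The difference between the two approaches is ordinary shell variable scoping. The sketch below uses `sh -c` as a stand-in for one of the helper scripts so that it runs without a cluster; the "fetching as" message is illustrative only and is not output from this repo.

```shell
unset YARN_LOGS_USER

# One-off: the assignment applies only to this single command.
YARN_LOGS_USER=someone sh -c 'echo "fetching as: ${YARN_LOGS_USER:-current user}"'
# prints "fetching as: someone"

# The variable is not set for later commands.
sh -c 'echo "fetching as: ${YARN_LOGS_USER:-current user}"'
# prints "fetching as: current user"

# Session-wide, which is what an export line in .bashrc would give you.
export YARN_LOGS_USER=someone
sh -c 'echo "fetching as: ${YARN_LOGS_USER:-current user}"'
# prints "fetching as: someone"
```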

You may also want to export YARN_HELPERS_DROP_HOST_SUFFIX_FROM (discussed above):

    # Pattern for abbreviating host names when creating per-host log directories.
    export YARN_HELPERS_DROP_HOST_SUFFIX_FROM=".rest.of.domain.name_"
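
As a rough illustration of what "dropping the host suffix from" a pattern could mean, the sketch below truncates a host name at the first occurrence of the pattern. This is an assumption based on the variable's name; the function `abbreviate_host` is hypothetical and the repo's own logic may differ.

```shell
# Hypothetical sketch: drop everything from the first occurrence of
# $YARN_HELPERS_DROP_HOST_SUFFIX_FROM onward. Not the repo's implementation.
abbreviate_host() {
  host="$1"
  pattern="${YARN_HELPERS_DROP_HOST_SUFFIX_FROM:-}"
  if [ -n "$pattern" ]; then
    # %% removes the longest suffix matching "<pattern>*"; quoting the
    # pattern makes it match literally rather than as a glob.
    short=${host%%"$pattern"*}
    printf '%s\n' "$short"
  else
    # No pattern configured: leave the host name untouched.
    printf '%s\n' "$host"
  fi
}
```

For example, with the pattern set to `.demeter` (an illustrative value), `abbreviate_host demeter-csmaz11-16.demeter.hpc.mssm.edu` prints `demeter-csmaz11-16`.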

stack-traces submodule

Finally, ryan-williams/stack-traces is included in this repository as a git submodule and is used by yarn-logs-stack-traces.

You'll need to git clone --recursive when you check out the project, or run git submodule init && git submodule update from within the stack-traces subdirectory, for it to work. git-scm.com has a good intro to using git submodules if you are not familiar.

With those done you should be all set!


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.