hammerlab/spark-json-relay

Name: spark-json-relay

Owner: Hammer Lab

Description: SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.

Created: 2015-06-09 17:32:19.0

Updated: 2017-10-18 07:03:40.0

Pushed: 2016-10-28 09:03:58.0

Homepage: null

Size: 280

Language: Scala

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

JsonRelay

JsonRelay is a SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.

It is designed to be used with slim, which consumes JsonRelay's emitted events and writes useful statistics about them to Mongo, from whence Spree serves up live-updating web pages.

Usage

With Spark >= 1.5.0 you can simply pass the following flags to your spark-shell and spark-submit commands:

--packages org.hammerlab:spark-json-relay:2.0.1
--conf spark.extraListeners=org.apache.spark.JsonRelay

If using earlier versions of Spark, you'll need to first download the JAR:

et https://repo1.maven.org/maven2/org/hammerlab/spark-json-relay/2.0.1/spark-json-relay-2.0.1.jar

Then, pass these flags to your spark-submit or spark-shell commands:

--driver-class-path spark-json-relay-2.0.1.jar
--conf spark.extraListeners=org.apache.spark.JsonRelay

That's it!

Two additional flags, --conf spark.slim.{host,port}, specify the location JsonRelay will attempt to connect and send events to.

Implementation

JsonRelay just piggybacks on Spark's JsonProtocol for JSON serialization, with two differences:

  1. It adds an appId field to all events; this allows downstream consumers to process events from multiple Spark applications simultaneously / more easily over time.
  2. It rolls its own serialization of SparkListenerExecutorMetricsUpdate events, which is omitted from Spark's JsonProtocol in Spark prior to 1.5.0 (cf. SPARK-9036).
Questions?

Please file an issue if you have any questions about or problems using JsonRelay!


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.