jaegertracing/spark-dependencies

Name: spark-dependencies

Owner: Jaeger - Distributed Tracing System

Description: Spark job for dependency links

Created: 2017-09-11 08:25:15.0

Updated: 2018-05-22 10:50:42.0

Pushed: 2018-03-22 08:01:42.0

Homepage: http://jaegertracing.io/

Size: 118

Language: Java

README

Jaeger Spark dependencies

This is a Spark job that collects spans from storage, analyzes links between services, and stores them for later presentation in the UI. Note that it is needed for production deployments; the all-in-one distribution does not need this job.

This job parses all traces on a given day, based on UTC. By default, it processes the current day, but other days can be explicitly specified.

This repository is based on zipkin-dependencies.

Quick-start

The Spark job can be run as a Docker container or as a Java executable:

Docker:

$ docker run --env STORAGE=cassandra --env CASSANDRA_CONTACT_POINTS=host1,host2 jaegertracing/spark-dependencies

Use --env JAVA_OPTS=-Djavax.net.ssl. to set trust store and other Java properties.

As jar file:

$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar
Usage

By default, this job parses all traces since midnight UTC. You can parse traces for a different day by passing an argument in YYYY-mm-dd format, like 2016-07-16, or by setting the `DATE` environment variable.

# e.g. to run the job to process yesterday's traces on OS X
$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar `date -uv-1d +%F`
# or on Linux
$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar `date -u -d '1 day ago' +%F`
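The two platform-specific commands above can be folded into one helper. The following is a minimal sketch, assuming a shell where either the GNU (`-d`) or the BSD/macOS (`-v`) form of `date` is available; the function name `yesterday_utc` and the variable `DATE_ARG` are hypothetical and not part of the project.

```shell
# Print yesterday's date (UTC) as YYYY-mm-dd.
# Tries the GNU form first and falls back to the BSD/macOS form.
yesterday_utc() {
  date -u -d '1 day ago' +%F 2>/dev/null || date -u -v-1d +%F
}

# Hypothetical variable holding the day to process.
DATE_ARG="$(yesterday_utc)"
echo "Processing traces for ${DATE_ARG}"

# The job itself would then be started as shown above, for example:
# STORAGE=cassandra java -jar jaeger-spark-dependencies.jar "${DATE_ARG}"
```

Computing the date once and passing it explicitly keeps the run reproducible if the job is started close to midnight UTC.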
Configuration

jaeger-spark-dependencies applies configuration parameters through environment variables.

The following variables are common to all storage layers:

* `SPARK_MASTER`: Spark master to submit the job to. Defaults to `local[*]`.
* `DATE`: Date in YYYY-mm-dd format. Denotes a day for which dependency links will be created.
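As an illustration of the common variables, the sketch below exports both and checks the shape of `DATE` before the job would be started. The helper `is_valid_date` is hypothetical and only checks the format; it does not verify that the day exists in the calendar, and the job may apply its own validation.

```shell
# Return success only if $1 looks like YYYY-mm-dd (format check only).
is_valid_date() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

export SPARK_MASTER='local[*]'   # the documented default, set explicitly
export DATE='2016-07-16'         # example day from the Usage section

if is_valid_date "${DATE}"; then
  echo "DATE=${DATE} SPARK_MASTER=${SPARK_MASTER}"
else
  echo "DATE must be in YYYY-mm-dd format: ${DATE}" >&2
fi
```

Quoting `local[*]` matters here: unquoted, the shell may try to expand the brackets as a glob pattern.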
Cassandra

Cassandra is used when STORAGE=cassandra.

* `CASSANDRA_KEYSPACE`: The keyspace to use. Defaults to "jaeger_v1_dc1".
* `CASSANDRA_CONTACT_POINTS`: Comma-separated list of hosts or IP addresses that are part of the Cassandra cluster. Defaults to localhost.
* `CASSANDRA_LOCAL_DC`: The local DC to connect to (other nodes will be ignored).
* `CASSANDRA_USERNAME` and `CASSANDRA_PASSWORD`: Cassandra authentication. The job throws an exception on startup if authentication fails.
* `CASSANDRA_USE_SSL`: Set to true to connect over SSL. Requires the `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword` Java properties. Defaults to false.

Example usage:

$ STORAGE=cassandra CASSANDRA_CONTACT_POINTS=localhost:9042 java -jar jaeger-spark-dependencies.jar
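For an SSL-enabled cluster, the two Java properties named under `CASSANDRA_USE_SSL` can be passed to the JVM as `-D` options. This is a sketch only: the trust store path and password are placeholders, the variable names `TRUSTSTORE_PATH`, `TRUSTSTORE_PASSWORD` and `SSL_OPTS` are hypothetical, and the command is printed rather than executed so it can be reviewed first.

```shell
# Placeholder trust store location and password; replace with your own.
TRUSTSTORE_PATH='/path/to/truststore.jks'
TRUSTSTORE_PASSWORD='changeit'

# JVM system properties listed in the CASSANDRA_USE_SSL description.
SSL_OPTS="-Djavax.net.ssl.trustStore=${TRUSTSTORE_PATH} -Djavax.net.ssl.trustStorePassword=${TRUSTSTORE_PASSWORD}"

# Print the command instead of running it.
echo "STORAGE=cassandra CASSANDRA_USE_SSL=true CASSANDRA_CONTACT_POINTS=host1,host2 java ${SSL_OPTS} -jar jaeger-spark-dependencies.jar"
```

When running the Docker image instead, the same properties would go into `--env JAVA_OPTS=...`, and the trust store file would need to be readable inside the container.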
Elasticsearch

Elasticsearch is used when STORAGE=elasticsearch.

* `ES_NODES`: A comma-separated list of Elasticsearch hosts advertising HTTP. Defaults to
              localhost. Add a port if a host is not listening on port 9200. Only one of these hosts
              needs to be available to fetch the remaining nodes in the cluster. It is
              recommended to set this to all the master nodes of the cluster. Use URL format for
              SSL, for example "https://yourhost:8888".
* `ES_NODES_WAN_ONLY`: Set to true to only use the values set in `ES_NODES`, for example if your
                       Elasticsearch cluster is in Docker. Defaults to false.
* `ES_USERNAME` and `ES_PASSWORD`: Elasticsearch basic authentication. Use when X-Pack security
                                   (formerly Shield) is in place. By default no username or
                                   password is provided to Elasticsearch.

Example usage:

$ STORAGE=elasticsearch ES_NODES=http://localhost:9200 java -jar jaeger-spark-dependencies.jar
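The same settings can be passed to the Docker image with `--env`, in the style of the Quick-start command. The sketch below combines the variables from the list above; the host, username and password are placeholders, and the helper `es_docker_cmd` is hypothetical. It prints the command rather than running it.

```shell
# Placeholder host and credentials; replace with your own.
ES_NODES='https://yourhost:8888'
ES_USERNAME='elastic'
ES_PASSWORD='changeme'

# Print the docker command for an Elasticsearch-backed run.
es_docker_cmd() {
  echo docker run \
    --env STORAGE=elasticsearch \
    --env ES_NODES="${ES_NODES}" \
    --env ES_NODES_WAN_ONLY=true \
    --env ES_USERNAME="${ES_USERNAME}" \
    --env ES_PASSWORD="${ES_PASSWORD}" \
    jaegertracing/spark-dependencies
}

es_docker_cmd
```

`ES_NODES_WAN_ONLY=true` is included here on the assumption that the cluster is only reachable through the listed address; drop it to keep the default of false.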
Building locally

To build the job locally and run tests:

./mvnw clean install # if it fails, add SPARK_LOCAL_IP=127.0.0.1
docker build -t jaegertracing/spark-dependencies:latest .
License

Apache 2.0 License.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.