spotify/semantic-metrics

Name: semantic-metrics

Owner: Spotify

Description: Capturing meaningful metrics in your Java application

Created: 2016-02-23 15:58:00

Updated: 2018-05-15 14:53:00

Pushed: 2018-05-15 14:52:58

Homepage:

Size: 189

Language: Java

README

semantic-metrics

This project contains modifications to the dropwizard metrics project.

The primary addition is a replacement for MetricRegistry that allows metric names to carry tags through MetricId.

Usage

Several interfaces and classes from this package have to be used in order for MetricId to work.

You will find these types in com.spotify.metrics.core.

Care must be taken not to use the upstream MetricRegistry, because it does not support MetricId. To make this easier, all of the replacement classes follow the Semantic* naming convention.

As a result, pre-existing plugins for codahale metrics will not work.
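
To make the shift concrete, here is a minimal sketch of how the replacement types fit together; it only assumes the com.spotify.metrics.core types mentioned above, and the tag names are illustrative rather than mandated by the library.

import com.spotify.metrics.core.MetricId;
import com.spotify.metrics.core.SemanticMetricRegistry;

// SemanticMetricRegistry replaces the upstream MetricRegistry and accepts MetricId keys
final SemanticMetricRegistry registry = new SemanticMetricRegistry();

// metric names are built from tags instead of dot-separated strings
final MetricId id = MetricId.build().tagged("what", "incoming-requests", "unit", "request");

// meters, counters, histograms and timers are then obtained as usual
registry.meter(id).mark();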

Installation

Add a dependency in Maven:

<dependency>
  <groupId>com.spotify.metrics</groupId>
  <artifactId>semantic-metrics-core</artifactId>
  <version>${semantic-metrics.version}</version>
</dependency>

Provided Plugins

This project provides a set of plugins.

See and run examples.

Considerations

`MetricId` is expensive to create and modify

If you find yourself in a situation where you create many instances of this class (e.g. when reporting metrics), make use of MetricIdCache.

The following is an example of integrating with Guava.

GuavaCache.java

public final class GuavaCache<T> implements MetricIdCache.Cache<T> {
    final Cache<T, MetricId> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(6, TimeUnit.HOURS)
        .build();

    private final MetricIdCache.Loader<T> loader;

    public GuavaCache(Loader<T> loader) {
        this.loader = loader;
    }

    @Override
    public MetricId get(final MetricId base, final T key) throws ExecutionException {
        return cache.get(key, new Callable<MetricId>() {
            @Override
            public MetricId call() throws Exception {
                return loader.load(base, key);
            }
        });
    }

    @Override
    public void invalidate(T key) {
        cache.invalidate(key);
    }

    @Override
    public void invalidateAll() {
        cache.invalidateAll();
    }

    public static MetricIdCache.Any setup() {
        return MetricIdCache.builder().cacheBuilder(new MetricIdCache.CacheBuilder() {
            @Override
            public <T> MetricIdCache.Cache<T> build(final Loader<T> loader) {
                return new GuavaCache<T>(loader);
            }
        });
    }
}

MyApplicationStatistics.java

public class MyApplicationStatistics {
    private final MetricIdCache.Typed<String> endpoint = GuavaCache.setup()
        .loader(new MetricIdCache.Loader<String>() {
            @Override
            public MetricId load(MetricId base, String endpoint) {
                return base.tagged("endpoint", endpoint);
            }
        });

    private final MetricIdCache<String> requests = endpoint
        .metricId(MetricId.build().tagged("what", "endpoint-requests", "unit", "request"))
        .build();

    private final MetricIdCache<String> errors = endpoint
        .metricId(MetricId.build().tagged("what", "endpoint-errors", "unit", "error"))
        .build();

    private final SemanticMetricRegistry registry;

    public MyApplicationStatistics(SemanticMetricRegistry registry) {
        this.registry = registry;
    }

    public void reportRequest(String endpoint) {
        registry.meter(requests.get(endpoint)).mark();
    }

    public void reportError(String endpoint) {
        registry.meter(errors.get(endpoint)).mark();
    }
}
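
A hypothetical caller could then wire this up as follows (the endpoint value is just an example):

final SemanticMetricRegistry registry = new SemanticMetricRegistry();
final MyApplicationStatistics stats = new MyApplicationStatistics(registry);

// the tagged MetricId for "/v1/list" is built once by the loader and afterwards served from the cache
stats.reportRequest("/v1/list");
stats.reportError("/v1/list");
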
Don't assume that semantic-metrics will be around forever

Avoid performing deep integration of semantic-metrics into your library or application: doing so would prevent you, and third parties, from integrating your code with different metric collectors.

As an alternative, you should build a tree of interfaces that your application uses to report metrics (e.g. my-service-statistics), and use these to build an implementation backed by semantic-metrics (my-service-semantic-statistics).

This pattern greatly simplifies integrating your application with more than one metric collector, or ditching semantic-metrics when it becomes superseded by something better.

At configuration time, your application can decide which implementation to use by simply providing an instance of the statistics API that suits its requirements.

Example

Build an interface describing all the things that your application reports.

public interface MyApplicationStatistics {
    /**
     * Report that a single request has been received by the application.
     */
    void reportRequest();
}

Provide a semantic-metrics implementation.

public class SemanticMyApplicationStatistics implements MyApplicationStatistics {
    private final SemanticMetricRegistry registry;

    private final Meter request;

    public SemanticMyApplicationStatistics(SemanticMetricRegistry registry) {
        this.registry = registry;
        this.request = registry.meter(MetricId.build().tagged(
            "what", "requests", "unit", "request"));
    }

    @Override
    public void reportRequest() {
        request.mark();
    }
}

Now a user of your framework/application can do something like the following to bootstrap your application.

public class Entry {
    public static void main(String[] argv) {
        final SemanticMetricRegistry registry = new SemanticMetricRegistry();
        final MyApplicationStatistics statistics = new SemanticMyApplicationStatistics(registry);
        /* your application */
        final MyApplication app = MyApplication.builder().statistics(statistics).build();

        final FastForwardReporter reporter = FastForwardReporter.forRegistry(registry).build();

        reporter.start();
        app.start();

        app.join();
        System.exit(0);
    }
}

Metric Types

There are different metric types that can be used depending on what we want to measure, e.g., queue length or request time.

Gauge

A gauge is an instantaneous measurement of a value, for example the number of pending jobs in a queue.

registry.register(metric.tagged("what", "job-queue-length"), new Gauge<Integer>() {
    @Override
    public Integer getValue() {
        // fetch the queue length the way you like
        final int queueLength = 10;
        // obviously this is gonna keep reporting 10, but you know ;)
        return queueLength;
    }
});

In addition to the tags that are specified (e.g., “what” in this example), FfwdReporter adds the following tags to each Gauge data point:

| tag         | values | comment |
|-------------|--------|---------|
| metric_type | gauge  |         |

Counter

A counter is just a gauge for an AtomicLong instance. You can increment or decrement its value.

For example, suppose we want a more efficient way of measuring the number of pending jobs in a queue.

final Counter counter = registry.counter(metric.tagged("what", "job-count"));
// somewhere in your code where you are adding new jobs to the queue, you increment the counter as well
counter.inc();
// somewhere in your code where the job is going to be removed from the queue, you decrement the counter
counter.dec();

In addition to the tags that are specified (e.g., “what” in this example), FfwdReporter adds the following tags to each Counter data point:

| tag         | values  | comment |
|-------------|---------|---------|
| metric_type | counter |         |

Meter

A meter measures the rate of events over time (e.g., “requests per second”). In addition to the mean rate, meters also track 1-, 5-, and 15-minute moving averages.

For example, we have an endpoint and want to measure how frequently it receives requests.

Meter meter = registry.meter(metric.tagged("what", "incoming-requests").tagged("endpoint", "/v1/list"));
// now a request comes and it's time to mark the meter
meter.mark();

In addition to the tags that are specified (e.g., “what” and “endpoint” in this example), FfwdReporter adds the following tags to each Meter data point:

| tag         | values   | comment |
|-------------|----------|---------|
| metric_type | meter    |         |
| unit        | <unit>/s | the value originally specified as the "unit" attribute during declaration, with "/s" appended. If missing, the value will be set to "n/s". For example, if you originally specify .tagged("unit", "request") on a Meter, FfwdReporter emits Meter data points with "unit":"request/s". |
| stat        | 1m, 5m   | 1m means the size of the time bucket of the calculated moving average of this data point is 1 minute; 5m means 5 minutes. |

Histogram

A histogram measures the statistical distribution of values in a stream of data. In addition to minimum, maximum, mean, etc., it also measures median, 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles.

For example, this histogram will measure the size of responses in bytes.

Histogram histogram = registry.histogram(metric.tagged("what", "response-size").tagged("endpoint", "/v1/content"));
// fetch the size of the response
final long responseSize = getResponseSize(response);
histogram.update(responseSize);

In addition to the tags that are specified (e.g., “what” and “endpoint” in this example), FfwdReporter adds the following tags to each Histogram data point:

| tag         | values                                   | comment |
|-------------|------------------------------------------|---------|
| metric_type | histogram                                |         |
| stat        | min, max, mean, median, stddev, p75, p99 | min: the lowest value in the snapshot; max: the highest value in the snapshot; mean: the arithmetic mean of the values in the snapshot; median: the median value in the distribution; stddev: the standard deviation of the values in the snapshot; p75: the value at the 75th percentile in the distribution; p99: the value at the 99th percentile in the distribution |

Note that added custom percentiles will show up in the stat tag.

Timer

A timer measures both the rate that a particular piece of code is called and the distribution of its duration.

For example, we want to measure the rate and handling duration of incoming requests.

Timer timer = registry.timer(metric.tagged("what", "incoming-request-time").tagged("endpoint", "/v1/get_stuff"));
// do this before starting to do the thing. This creates a measurement context object that you can pass around.
final Context context = timer.time();
doStuff();
// tell the context that it's done. This will register the duration and count one occurrence.
context.stop();

In addition to the tags that are specified (e.g., “what” and “endpoint” in this example), FfwdReporter adds the following tags to each Timer data point:

| tag         | values | comment |
|-------------|--------|---------|
| metric_type | timer  |         |
| unit        | ns     |         |

NOTE: A Timer is really just a combination of a Histogram and a Meter, so apart from the tags above, the tags of both Histogram and Meter will be included.

Why Semantic Metrics?

When dealing with thousands of similar timeseries over thousands of hosts, classification becomes a big issue.

Classical systems organize metric names as strings, containing a lot of information about the metric in question.

You will often see things like `webserver.host.example.com.df.used./`.

The same metric expressed as a set of tags could look like this:

le": "webserver", "host": "host.example.com", "what": "disk-used",
untpoint": "/"}
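
For reference, the same tagged name could be constructed with MetricId roughly like this (the tag keys mirror the JSON above; "role" is part of the example, not a required tag):

MetricId metric = MetricId.build()
    .tagged("role", "webserver")
    .tagged("host", "host.example.com")
    .tagged("what", "disk-used")
    .tagged("mountpoint", "/");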

Classifying metrics this way at the source greatly simplifies any metrics pipeline. When the data is transported with a stable serialization method (like JSON), it does not matter if we add additional tags or change the order in which the timeseries happens to be designated.

We can also easily index these timeseries by their tags using a system like Elasticsearch and ask it interesting questions about which timeseries are available.

If used with a metrics backend that supports efficient aggregation and filtering across tags, you gain a flexible and intuitive pipeline that is powerful and agnostic about what it sends, all the way from the service being monitored to your metrics GUI.

Contributing

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

  1. Fork semantic-metrics from GitHub and clone your fork.
  2. Hack.
  3. Push the branch back to GitHub.
  4. Send a pull request to our upstream repo.
