racker/statsd

Name: statsd

Owner: racker

Description: Simple daemon for easy stats aggregation

Created: 2012-09-05 20:42:44

Updated: 2013-12-20 15:54:46

Pushed: 2013-11-22 01:30:59

Homepage:

Size: 595

Language: JavaScript


README

StatsD

A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services (e.g., Graphite).

We (Etsy) blogged about how it works and why we created it.

Concepts
Counting
gorets:1|c

This is a simple counter. Add 1 to the “gorets” bucket. The count accumulates in memory until the flush interval (config.flushInterval) elapses.
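
As a sketch, here is how a Node.js client might send that counter over UDP (assuming statsd is listening on localhost:8125, the default port):

var dgram = require('dgram');

var sock = dgram.createSocket('udp4');
var msg = Buffer.from('gorets:1|c');

// UDP is fire-and-forget: the client never blocks and delivery is not guaranteed.
sock.send(msg, 0, msg.length, 8125, 'localhost', function () {
  sock.close();
});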

Timing
glork:320|ms

The glork took 320ms to complete this time. StatsD computes the 90th percentile, the average (mean), and the lower and upper bounds for the flush interval. The percentile threshold can be tweaked with config.percentThreshold.

The percentile threshold can be a single value or a list of values, and the following stats will be generated for each threshold:

stats.timers.$KEY.mean_$PCT
stats.timers.$KEY.upper_$PCT

Where $KEY is the stats key you specify when sending to statsd, and $PCT is the percentile threshold.
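
For instance, a sketch of a config with two thresholds (the values here are illustrative):

{
  percentThreshold: [90, 99]
}

would yield, for a timer key glork:

stats.timers.glork.mean_90
stats.timers.glork.upper_90
stats.timers.glork.mean_99
stats.timers.glork.upper_99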

Sampling
gorets:1|c|@0.1

Tells StatsD that this counter is being sampled: the client sends it only 1/10th of the time, and StatsD scales the flushed count back up accordingly.
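
A sketch of the client side of sampling (the helper name is hypothetical):

var dgram = require('dgram');
var sock = dgram.createSocket('udp4');
var SAMPLE_RATE = 0.1;

// Send only ~10% of events; the |@0.1 suffix tells StatsD to scale
// the flushed count back up by a factor of 1/0.1 = 10.
function incrementSampled(bucket) {
  if (Math.random() < SAMPLE_RATE) {
    var msg = Buffer.from(bucket + ':1|c|@' + SAMPLE_RATE);
    sock.send(msg, 0, msg.length, 8125, 'localhost');
  }
}

incrementSampled('gorets');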

Gauges

StatsD also supports gauges: arbitrary values that can be recorded.

gaugor:333|g

All metrics can also be batched and sent in a single UDP packet, separated by newline characters.
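
For example, the three metrics above could travel in one datagram (keep the payload comfortably below your network MTU), as in this sketch:

var dgram = require('dgram');
var sock = dgram.createSocket('udp4');

// One packet, three metrics, separated by newlines.
var batch = Buffer.from('gorets:1|c\nglork:320|ms\ngaugor:333|g');
sock.send(batch, 0, batch.length, 8125, 'localhost', function () {
  sock.close();
});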

Debugging

There are additional config variables available for debugging, most notably debug (verbose logging) and dumpMessages (log every incoming message). For more information, check the exampleConfig.js.
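
A minimal sketch of those settings in a config file (exampleConfig.js documents the full set):

{
  port: 8125,
  debug: true,         // verbose logging, including exceptions
  dumpMessages: true   // log every incoming message
}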

Supported Backends

StatsD supports multiple, pluggable, backend modules that can publish statistics from the local StatsD daemon to a backend service or data store. Backend services can retain statistics for longer durations in a time series data store, visualize statistics in graphs or tables, or generate alerts based on defined thresholds. A backend can also correlate statistics sent from StatsD daemons running across multiple hosts in an infrastructure.

StatsD includes the following backends:

graphite (./backends/graphite): sends the aggregated stats to Graphite's carbon daemon
console (./backends/console): prints the flushed stats to stdout (useful for debugging)
repeater (./backends/repeater): forwards the raw incoming packets on to another statsd instance

By default, the graphite backend will be loaded automatically. To select which backends are loaded, set the backends configuration variable to the list of backend modules to load.

Backends are just npm modules which implement the interface described in the Backend Interface section. To load a backend, add its module name to the backends variable in your config. Since the name is also passed to the require directive, you can load one of the provided backends by giving a relative path (e.g. ./backends/graphite).
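
For example, a config that loads the bundled graphite backend alongside the console backend (host and port values are illustrative):

{
  graphiteHost: "graphite.example.com",
  graphitePort: 2003,
  port: 8125,
  backends: ["./backends/graphite", "./backends/console"]
}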

Graphite Schema

Graphite uses “schemas” to define the different round robin datasets it houses (analogous to RRAs in rrdtool). Here's an example for the stats databases:

In conf/storage-schemas.conf:

[stats]
pattern = ^stats\..*
retentions = 10:2160,60:10080,600:262974

In conf/storage-aggregation.conf:

[min]
pattern = \.min$
xFilesFactor = 0.1 
aggregationMethod = min 

[max]
pattern = \.max$
xFilesFactor = 0.1 
aggregationMethod = max 

[sum]
pattern = \.count$
xFilesFactor = 0 
aggregationMethod = sum 

[default_average]
pattern = .*
xFilesFactor = 0.3 
aggregationMethod = average

This translates to:

10-second resolution kept for 6 hours (2160 points)
1-minute resolution kept for 7 days (10080 points)
10-minute resolution kept for roughly 5 years (262974 points)

(Note: Newer versions of Graphite can take human-readable time formats like 10s:6h,1min:7d,10min:5y)

Retentions and aggregations are read from the file in order; the first pattern that matches is used. These settings are fixed when a database is first created, so changing the config files will not affect databases that already exist. To view or alter the settings on existing files, use whisper-info.py and whisper-resize.py, included with the Whisper package.

These settings have been a good tradeoff so far between size-of-file (round robin databases are fixed size) and data we care about. Each “stats” database is about 3.2 megs with these retentions.

Many users have been confused to see their hit counts averaged, missing when the data is intermittent, or never stored when statsd flushes at a different interval than Graphite expects. Storage aggregation settings will help you control this and understand what Graphite is doing internally with your data.

TCP Stats Interface

A really simple TCP management interface is available, by default on port 8126 (this can be overridden in the configuration file). Inspired by the memcache stats approach, it can be used to monitor a live statsd server. You can interact with the management server by telnetting to port 8126; available commands include stats, counters, timers, and quit.
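
As a sketch, the same query can be scripted with Node's net module (telnet or nc work just as well); this assumes the reply is terminated by an END line, in the memcache style:

var net = require('net');

var conn = net.connect(8126, 'localhost', function () {
  conn.write('stats\n');
});

conn.on('data', function (chunk) {
  process.stdout.write(chunk.toString());
  // Close the connection once the END terminator arrives.
  if (chunk.toString().indexOf('END') !== -1) conn.end();
});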

The stats output currently will give you:

uptime: the number of seconds elapsed since statsd started
messages.last_msg_seen: the number of elapsed seconds since statsd received a message
messages.bad_lines_seen: the number of bad lines seen since startup

Each backend will also publish a set of statistics, prefixed by its module name.

Graphite:

graphite.last_flush: the unix timestamp of the last successful flush to graphite
graphite.last_exception: the unix timestamp of the last exception thrown while flushing to graphite

A simple Nagios check that can be used to monitor metric thresholds (for example, the number of seconds since the last successful flush to graphite) can be found in the utils/ directory.

Installation and Configuration

  1. Install node.js
  2. Clone the project
  3. Create a config file from exampleConfig.js and put it somewhere
  4. Start the daemon: node stats.js /path/to/config
Tests

A test framework has been added using node-unit and some custom code to start and manipulate statsd. Please add tests under test/ for any new features or bug fixes. Testing a live server can be tricky; attempts were made to eliminate race conditions, but it may still be possible to encounter a stuck state. If doing dev work, killall node will kill any stray test servers in the background (don't do this on a production machine!).

Tests can be executed with ./run_tests.sh.

Backend Interface

Backend modules are Node.js modules that listen for a number of events emitted from StatsD. Each backend module should export an init(startupTime, config, events) function; StatsD invokes it once at startup, and it should return true on success or false to signal a failed initialization.

Backends can listen for the following events emitted by StatsD on the events object:

flush(timestamp, metrics): emitted on each flush interval with the current timestamp and the aggregated metrics
status(writeCb): emitted when a backend's status is requested, e.g. by the stats command on the management interface
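
A minimal sketch of such a module (the file name and the published stat name are illustrative):

// my-backend.js
function MyBackend(startupTime, config, emitter) {
  var self = this;
  this.lastFlush = startupTime;

  // Called on every flush interval with the aggregated metrics.
  emitter.on('flush', function (timestamp, metrics) {
    self.lastFlush = timestamp;
    console.log('flush @', timestamp, 'counters:', metrics.counters);
  });

  // Called when the management interface asks for backend status.
  emitter.on('status', function (writeCb) {
    writeCb(null, 'my-backend', 'last_flush', self.lastFlush);
  });
}

exports.init = function (startupTime, config, events) {
  new MyBackend(startupTime, config, events);
  return true; // returning false signals a failed initialization
};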

Inspiration

StatsD was inspired (heavily) by the project of the same name at Flickr. Here's a post where Cal Henderson described it in depth: Counting and timing. Cal recently re-released the code: Perl StatsD.

Meta
Contribute

You're interested in contributing to StatsD? AWESOME. Here are the basic steps:

  1. Fork StatsD from here: http://github.com/etsy/statsd
  2. Clone your fork
  3. Hack away
  4. If you are adding new functionality, document it in the README
  5. If necessary, rebase your commits into logical chunks, without errors
  6. Push the branch up to GitHub
  7. Send a pull request to the etsy/statsd project

We'll do our best to get your changes in!

Contributors

In lieu of a list of contributors, check out the commit history for the project: https://github.com/etsy/statsd/graphs/contributors

