ccpgames/ccpalert

Name: ccpalert

Owner: CCP Games

Description: The alerting component of ccpmetrics

Created: 2015-07-09 14:20:37.0

Updated: 2015-09-22 10:50:05.0

Pushed: 2015-09-29 13:35:26.0

Homepage: null

Size: 359

Language: Go

README

CCP Alert

CCP Alert is the alerting component of CCPMetrics. It provides a simple threshold-based alerting service and can send alerts via email and PagerDuty. Metrics are sent to CCP Alert via a REST API and are checked against alerting rules that have previously been defined via CCPAlertQL.

CCPAlertQL

Alert rules are created via CCPAlertQL, a simple SQL-inspired domain-specific language.

Creating Alert Rules

An alert rule has four parts: a name that identifies it, a metric with which it is associated, a condition that is the central part of the rule, and text that describes the rule. The query to create a new alert rule takes the following form:

ALERT <alert name> IF <metric name> <operator> <threshold value> TEXT <description of alert>

The alert name is simply an identifier for the alert. The metric name is the metric which the alert corresponds to. The operator and threshold specify when the alert is triggered. Here are several more concrete examples:

ALERT cpuOnFireAlert IF superImportantServer.cpuUsage > 100 TEXT "Critical production server is heavily loaded"
ALERT noplayers IF tq.currentPlayers == 0 TEXT "something has gone badly wrong"
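
The rule format above can be illustrated with a minimal Go sketch that parses one statement and evaluates its condition against a data point. This is not CCP Alert's actual parser: the `AlertRule` type, `ParseAlertRule`, and `Triggered` are hypothetical names, the `ALERT` keyword is assumed, and only `>` and `==` appear in the examples above, so the other operators are assumptions too.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// AlertRule is a hypothetical in-memory form of one alert rule.
type AlertRule struct {
	Name      string
	Metric    string
	Operator  string
	Threshold float64
	Text      string
}

// rulePattern matches:
//   ALERT <name> IF <metric> <operator> <threshold> TEXT "<description>"
// The operator set beyond ">" and "==" is an assumption.
var rulePattern = regexp.MustCompile(
	`^ALERT\s+(\S+)\s+IF\s+(\S+)\s+(==|!=|>=|<=|>|<)\s+(-?\d+(?:\.\d+)?)\s+TEXT\s+"(.*)"$`)

// ParseAlertRule turns one statement into an AlertRule.
func ParseAlertRule(stmt string) (AlertRule, error) {
	m := rulePattern.FindStringSubmatch(stmt)
	if m == nil {
		return AlertRule{}, fmt.Errorf("not a valid alert rule: %q", stmt)
	}
	threshold, err := strconv.ParseFloat(m[4], 64)
	if err != nil {
		return AlertRule{}, err
	}
	return AlertRule{Name: m[1], Metric: m[2], Operator: m[3], Threshold: threshold, Text: m[5]}, nil
}

// Triggered reports whether a data point satisfies the rule's condition.
func (r AlertRule) Triggered(value float64) bool {
	switch r.Operator {
	case ">":
		return value > r.Threshold
	case ">=":
		return value >= r.Threshold
	case "<":
		return value < r.Threshold
	case "<=":
		return value <= r.Threshold
	case "==":
		return value == r.Threshold
	case "!=":
		return value != r.Threshold
	}
	return false
}

func main() {
	rule, err := ParseAlertRule(`ALERT noplayers IF tq.currentPlayers == 0 TEXT "something has gone badly wrong"`)
	if err != nil {
		panic(err)
	}
	// Evaluate the rule against two sample data points.
	fmt.Println(rule.Name, rule.Triggered(0), rule.Triggered(1200)) // prints "noplayers true false"
}
```

Keeping the threshold as a `float64` means integer and fractional thresholds share one comparison path; whether CCP Alert itself does this is not stated in the README.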

Creating Scheduled Database Queries

Once a rule has been created, it can be evaluated against data points. Data points can be passed to CCP Alert via the API, or they can be pulled from InfluxDB by scheduled queries. A scheduled query is an InfluxDB query which CCP Alert executes at regular intervals and evaluates against the stored alerting rules. A scheduled InfluxDB query must return a single value at a single point; multi-value and multi-point queries are not accepted. For example, the queries:

select cpufree, cpuused, cpuidle from host1.cpu
select cpuidle from host1.cpu
select * from host1.cpu

are not valid scheduled queries, whereas:

select max(cpuidle) from host1.cpu
select last(cpufree) from host1.cpu

are acceptable scheduled queries. InfluxDB's aggregate functions are useful for reducing a result to a single point. So long as the query returns a single point and a single value, the full range of InfluxQL functionality can be used. It is advisable to test your query via the InfluxDB web interface, Chronograf or Grafana before scheduling it in CCP Alert.
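
The single-point, single-value requirement can be sketched as a small Go check. The `Series` type below is a hypothetical, simplified stand-in for a query result (value columns only, with any time column already stripped); it is not the InfluxDB client's own type, and `SingleValue` is an illustrative name rather than a CCP Alert function.

```go
package main

import (
	"errors"
	"fmt"
)

// Series is a hypothetical, simplified view of one series in a query result.
// Columns holds only the value columns; Rows holds one slice per point.
type Series struct {
	Columns []string
	Rows    [][]float64
}

// SingleValue returns the only value in result, or an error when the result
// has more than one series, more than one point, or more than one value.
func SingleValue(result []Series) (float64, error) {
	if len(result) != 1 {
		return 0, errors.New("expected exactly one series")
	}
	s := result[0]
	if len(s.Rows) != 1 {
		return 0, fmt.Errorf("expected a single point, got %d", len(s.Rows))
	}
	if len(s.Columns) != 1 || len(s.Rows[0]) != 1 {
		return 0, fmt.Errorf("expected a single value, got %d columns", len(s.Columns))
	}
	return s.Rows[0][0], nil
}

func main() {
	// Shape of a result from an aggregate such as max(cpuidle).
	ok := []Series{{Columns: []string{"max"}, Rows: [][]float64{{97.5}}}}
	// Shape of a result that selects several fields at once.
	multi := []Series{{Columns: []string{"cpufree", "cpuused"}, Rows: [][]float64{{10, 90}}}}

	v, err := SingleValue(ok)
	fmt.Println(v, err)

	_, err = SingleValue(multi)
	fmt.Println(err)
}
```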

A scheduled query is passed to CCP Alert by encapsulating it in a SCHEDULE statement. A SCHEDULE statement takes the form:

SCHEDULE <metric name> INFLUXDB <influx query> ON <influx database>

To provide specific examples:

SCHEDULE cpuOnFire INFLUXDB "select max(value) from host1.cpu where time > now() - 1h" ON public
SCHEDULE noplayers INFLUXDB "select last(value) from tq.currentPlayers" ON production

Note that the metric name in the SCHEDULE statement should correspond to the metric name in the alert rule. Once a query is scheduled, the resulting metric is checked against all corresponding alert rules.
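
The link between a scheduled query and its alert rules can be sketched in Go as follows. This is an illustration under assumptions, not CCP Alert's implementation: `ScheduledQuery`, `ParseSchedule`, and `RulesFor` are hypothetical names, and the metric names in `main` are chosen so that the scheduled metric matches one rule's metric.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// ScheduledQuery is a hypothetical parsed form of a SCHEDULE statement.
type ScheduledQuery struct {
	Metric   string
	Query    string
	Database string
}

// schedulePattern matches:
//   SCHEDULE <metric name> INFLUXDB "<influx query>" ON <influx database>
var schedulePattern = regexp.MustCompile(`^SCHEDULE\s+(\S+)\s+INFLUXDB\s+"(.*)"\s+ON\s+(\S+)$`)

// ParseSchedule turns one SCHEDULE statement into a ScheduledQuery.
func ParseSchedule(stmt string) (ScheduledQuery, error) {
	m := schedulePattern.FindStringSubmatch(stmt)
	if m == nil {
		return ScheduledQuery{}, fmt.Errorf("not a valid schedule statement: %q", stmt)
	}
	return ScheduledQuery{Metric: m[1], Query: m[2], Database: m[3]}, nil
}

// RulesFor returns, in sorted order, the names of the rules whose metric
// equals the given metric. ruleMetrics maps rule name to metric name.
func RulesFor(metric string, ruleMetrics map[string]string) []string {
	var names []string
	for name, m := range ruleMetrics {
		if m == metric {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	q, err := ParseSchedule(`SCHEDULE tq.currentPlayers INFLUXDB "select last(value) from tq.currentPlayers" ON production`)
	if err != nil {
		panic(err)
	}
	rules := map[string]string{
		"noplayers":      "tq.currentPlayers",
		"cpuOnFireAlert": "superImportantServer.cpuUsage",
	}
	// Only rules registered for the scheduled metric are selected.
	fmt.Println(q.Database, RulesFor(q.Metric, rules)) // prints "production [noplayers]"
}
```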

