Name: ccpalert
Owner: CCP Games
Description: The alerting component of ccpmetrics
Created: 2015-07-09 14:20:37.0
Updated: 2015-09-22 10:50:05.0
Pushed: 2015-09-29 13:35:26.0
Homepage: null
Size: 359
Language: Go
CCP Alert is the alerting component of CCPMetrics. It provides a simple threshold-based alerting service and can send alerts via email and PagerDuty. Metrics are sent to CCP Alert via a REST API and are checked against alerting rules that have previously been defined via CCPAlertQL.
Alert rules are created via CCPAlertQL, a simple SQL-inspired domain-specific language.
Alert rules have a name which identifies them, a metric with which they are associated, a condition which is the central part of the rule, and text which describes the rule. The query to create a new alert rule takes the following form:
    ALERT <alert name> IF <metric name> <operator> <threshold value> TEXT <description of alert>
The alert name is simply an identifier for the alert. The metric name is the metric to which the alert corresponds. The operator and threshold specify when the alert is triggered. Here are two concrete examples:
    ALERT cpuOnFireAlert IF superImportantServer.cpuUsage > 100 TEXT "Critical production server is heavily loaded"
    ALERT noplayers IF tq.currentPlayers == 0 TEXT "something has gone badly wrong"
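To make the shape of a rule concrete, here is a minimal Go sketch that parses a statement of this form and evaluates its condition against a data point. It is not CCP Alert's own parser: the `Rule` type, `ParseRule`, and `Triggered` are hypothetical names, the statement is assumed to begin with the keyword `ALERT`, and the operator set beyond `>` and `==` is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Rule is a hypothetical in-memory form of an alert rule.
type Rule struct {
	Name      string
	Metric    string
	Operator  string
	Threshold float64
	Text      string
}

// rulePattern matches: ALERT <name> IF <metric> <op> <threshold> TEXT "<text>".
// The supported operators are an assumption; the README only shows > and ==.
var rulePattern = regexp.MustCompile(
	`^ALERT\s+(\S+)\s+IF\s+(\S+)\s+(==|!=|>=|<=|>|<)\s+(-?\d+(?:\.\d+)?)\s+TEXT\s+"(.*)"$`)

// ParseRule turns one statement into a Rule, or returns an error if the
// statement does not have the expected shape.
func ParseRule(stmt string) (Rule, error) {
	m := rulePattern.FindStringSubmatch(stmt)
	if m == nil {
		return Rule{}, fmt.Errorf("not a valid alert statement: %q", stmt)
	}
	threshold, err := strconv.ParseFloat(m[4], 64)
	if err != nil {
		return Rule{}, err
	}
	return Rule{Name: m[1], Metric: m[2], Operator: m[3], Threshold: threshold, Text: m[5]}, nil
}

// Triggered reports whether value satisfies the rule's condition.
func (r Rule) Triggered(value float64) bool {
	switch r.Operator {
	case "==":
		return value == r.Threshold
	case "!=":
		return value != r.Threshold
	case ">":
		return value > r.Threshold
	case ">=":
		return value >= r.Threshold
	case "<":
		return value < r.Threshold
	case "<=":
		return value <= r.Threshold
	}
	return false
}

func main() {
	rule, err := ParseRule(`ALERT noplayers IF tq.currentPlayers == 0 TEXT "something has gone badly wrong"`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Evaluate the rule against two sample data points.
	fmt.Println(rule.Name, rule.Triggered(0), rule.Triggered(1200))
}
```

Keeping the condition as an operator plus a numeric threshold mirrors the rule form described above, so a rule can be evaluated against any incoming data point without re-parsing the statement.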
Once a rule has been created, it can be evaluated against data points. Data points can be passed to CCP Alert via the API, or alternatively they can be pulled from InfluxDB by scheduled queries. A scheduled query is an InfluxDB query which CCP Alert executes at regular intervals and evaluates against stored alerting rules. A scheduled InfluxDB query must return a single point containing a single value; multi-value or multi-point queries are not accepted. For example, the queries:
    select cpufree, cpuused, cpuidle from host1.cpu
    select cpuidle from host1.cpu
    select * from host1.cpu
are not valid scheduled queries, whereas:
    select max(cpuidle) from host1.cpu
    select last(cpufree) from host1.cpu
are acceptable scheduled queries. InfluxDB's aggregate functions are useful for returning single points. So long as the query returns a single point and a single value, the full range of InfluxQL functionality can be used. It is advisable to test your query via the InfluxDB web interface, Chronograf or Grafana before scheduling it in CCP Alert.
A scheduled query is passed to CCP Alert by encapsulating it in a SCHEDULE statement. A SCHEDULE statement takes the form:
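The single-point, single-value requirement can be sketched as a small check in Go. The `Series` type below is a hypothetical, simplified stand-in for a query result (value columns only, timestamps assumed to be stripped already); it is not the InfluxDB client's type, and `SingleValue` is an illustrative name rather than part of CCP Alert.

```go
package main

import (
	"errors"
	"fmt"
)

// Series is a hypothetical, simplified view of one series in a query result:
// the value column names plus one row of values per returned point.
type Series struct {
	Columns []string
	Rows    [][]float64
}

// ErrNotSinglePoint is returned when a result is not exactly one point
// holding exactly one value.
var ErrNotSinglePoint = errors.New("query must return exactly one point with one value")

// SingleValue returns the only value in result, or ErrNotSinglePoint if the
// result contains several series, several points, or several values.
func SingleValue(result []Series) (float64, error) {
	if len(result) != 1 {
		return 0, ErrNotSinglePoint
	}
	s := result[0]
	if len(s.Columns) != 1 || len(s.Rows) != 1 || len(s.Rows[0]) != 1 {
		return 0, ErrNotSinglePoint
	}
	return s.Rows[0][0], nil
}

func main() {
	// Shape of an aggregate such as max(cpuidle): one column, one row.
	ok := []Series{{Columns: []string{"max"}, Rows: [][]float64{{97.5}}}}
	// Shape of a multi-column select: one row, three values.
	bad := []Series{{Columns: []string{"cpufree", "cpuused", "cpuidle"}, Rows: [][]float64{{10, 20, 70}}}}

	v, err := SingleValue(ok)
	fmt.Println(v, err)
	_, err = SingleValue(bad)
	fmt.Println(err)
}
```

A check like this fails fast on queries such as a plain column select, which return one row per stored point, and accepts aggregates that collapse the series to one number.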
    SCHEDULE <metric name> INFLUXDB <influx query> ON <influx database>
To provide specific examples:
    SCHEDULE cpuOnFire INFLUXDB "select max(value) from host1.cpu where time > now() - 1h" ON public
    SCHEDULE noplayers INFLUXDB "select last(value) from tq.currentPlayers" ON production
Note that the metric name in the SCHEDULE statement should correspond to the metric name in the alert rule. Once a query is scheduled, the resulting metric will be checked against all corresponding alert rules.
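The link between a scheduled query and its alert rules can be sketched in Go as follows. This is an illustration under stated assumptions, not CCP Alert's implementation: `Schedule`, `ParseSchedule`, `Rule`, and `Fired` are hypothetical names, the statement is assumed to begin with the keyword `SCHEDULE`, and the value `95` stands in for whatever the InfluxDB query would return.

```go
package main

import (
	"fmt"
	"regexp"
)

// Schedule is a hypothetical parsed form of a SCHEDULE statement.
type Schedule struct {
	Metric   string
	Query    string
	Database string
}

// schedulePattern matches: SCHEDULE <metric> INFLUXDB "<query>" ON <database>.
var schedulePattern = regexp.MustCompile(`^SCHEDULE\s+(\S+)\s+INFLUXDB\s+"(.*)"\s+ON\s+(\S+)$`)

// ParseSchedule extracts the metric name, query, and database from stmt.
func ParseSchedule(stmt string) (Schedule, error) {
	m := schedulePattern.FindStringSubmatch(stmt)
	if m == nil {
		return Schedule{}, fmt.Errorf("not a valid schedule statement: %q", stmt)
	}
	return Schedule{Metric: m[1], Query: m[2], Database: m[3]}, nil
}

// Rule is a hypothetical alert rule: a name, the metric it watches, and a
// condition to test each value against.
type Rule struct {
	Name   string
	Metric string
	Check  func(value float64) bool
}

// Fired returns the names of all rules that watch metric and whose
// condition holds for value. Rules for other metrics are ignored.
func Fired(rules []Rule, metric string, value float64) []string {
	var names []string
	for _, r := range rules {
		if r.Metric == metric && r.Check(value) {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	s, err := ParseSchedule(`SCHEDULE cpuOnFire INFLUXDB "select max(value) from host1.cpu where time > now() - 1h" ON public`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Illustrative rules; only the first watches the scheduled metric.
	rules := []Rule{
		{Name: "cpuHot", Metric: "cpuOnFire", Check: func(v float64) bool { return v > 90 }},
		{Name: "unrelated", Metric: "tq.currentPlayers", Check: func(v float64) bool { return v == 0 }},
	}
	// 95 is a placeholder for the single value the query would return.
	fmt.Println(Fired(rules, s.Metric, 95))
}
```

Matching on the metric name is what ties the two statements together: a scheduled query whose metric name matches no rule would never trigger an alert in this sketch.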