voxpupuli/puppet-prometheus

Name: puppet-prometheus

Owner: Vox Pupuli

Description: Puppet module for prometheus

Forked from: bastelfreak/puppet-prometheus2

Created: 2016-02-29 15:20:39.0

Updated: 2017-11-30 22:05:00.0

Pushed: 2018-01-17 16:50:32.0

Homepage: null

Size: 289

Language: Puppet

README

puppet-prometheus

Compatibility

| Prometheus Version | Recommended Puppet Module Version |
| ------------------ | --------------------------------- |
| >= 0.16.2          | latest                            |

node_exporter >= 0.15.0
consul_exporter >= 0.3.0

Background

This module automates the installation and configuration of the Prometheus monitoring tool (see the Prometheus web site).

What This Module Affects

Usage

To set up a Prometheus daemon:

On the server (for prometheus version < 1.0.0):

class { '::prometheus':
  global_config  => { 'scrape_interval' => '15s', 'evaluation_interval' => '15s', 'external_labels' => { 'monitor' => 'master' } },
  rule_files     => [ '/etc/prometheus/alert.rules' ],
  scrape_configs => [
    { 'job_name'        => 'prometheus',
      'scrape_interval' => '10s',
      'scrape_timeout'  => '10s',
      'target_groups'   => [
        { 'targets' => [ 'localhost:9090' ],
          'labels'  => { 'alias' => 'Prometheus' }
        }
      ]
    }
  ]
}

On the server (for prometheus version >= 1.0.0):

class { 'prometheus':
  version        => '1.0.0',
  scrape_configs => [ { 'job_name' => 'prometheus', 'scrape_interval' => '30s', 'scrape_timeout' => '30s', 'static_configs' => [{ 'targets' => ['localhost:9090'], 'labels' => { 'alias' => 'Prometheus' } }] } ],
  extra_options  => '-alertmanager.url http://localhost:9093 -web.console.templates=/opt/prometheus-1.0.0.linux-amd64/consoles -web.console.libraries=/opt/prometheus-1.0.0.linux-amd64/console_libraries',
  localstorage   => '/prometheus/prometheus',
}

On the server (for prometheus version >= 2.0.0):

class { '::prometheus':
  version        => '2.0.0',
  alerts         => { 'groups' => [{ 'name' => 'alert.rules', 'rules' => [{ 'alert' => 'InstanceDown', 'expr' => 'up == 0', 'for' => '5m', 'labels' => { 'severity' => 'page', }, 'annotations' => { 'summary' => 'Instance {{ $labels.instance }} down', 'description' => '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' }}]}]},
  scrape_configs => [
    { 'job_name'        => 'prometheus',
      'scrape_interval' => '10s',
      'scrape_timeout'  => '10s',
      'static_configs'  => [
        { 'targets' => [ 'localhost:9090' ],
          'labels'  => { 'alias' => 'Prometheus' }
        }
      ]
    }
  ]
}

or simply:

include ::prometheus
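
The server examples above differ in one key name: `target_groups` for prometheus < 1.0.0 and `static_configs` for later versions. A minimal sketch of selecting that key from a single version variable follows. It assumes Puppet 4 or newer (for expression hash keys) and uses the built-in `versioncmp` function; the variables `$prometheus_version` and `$targets_key` are hypothetical names and not part of this module.

```puppet
# Hypothetical variable: set it to the Prometheus version you deploy.
$prometheus_version = '2.0.0'

# Following the examples above, versions below 1.0.0 use 'target_groups'
# and later versions use 'static_configs'.
if versioncmp($prometheus_version, '1.0.0') < 0 {
  $targets_key = 'target_groups'
} else {
  $targets_key = 'static_configs'
}

class { '::prometheus':
  version        => $prometheus_version,
  scrape_configs => [
    {
      'job_name'        => 'prometheus',
      'scrape_interval' => '10s',
      'scrape_timeout'  => '10s',
      # The key name is taken from the variable computed above.
      $targets_key      => [
        {
          'targets' => ['localhost:9090'],
          'labels'  => { 'alias' => 'Prometheus' },
        },
      ],
    },
  ],
}
```

Other version-dependent parameters shown above (for example `rule_files` or `alerts`) would need the same kind of branching; this sketch covers only the scrape target key.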

To add alert rules when using prometheus < 2.0, pass the following parameter to the prometheus class:

alerts => [{ 'name' => 'InstanceDown', 'condition' => 'up == 0', 'timeduration' => '5m', 'labels' => [{ 'name' => 'severity', 'content' => 'page'}], 'annotations' => [{ 'name' => 'summary', 'content' => 'Instance {{ $labels.instance }} down'}, {'name' => 'description', 'content' => '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' }]}]

or in hiera:

alertrules:
-
    name: 'InstanceDown'
    condition:  'up == 0'
    timeduration: '5m'
    labels:
        -
            name: 'severity'
            content: 'critical'
    annotations:
        -
            name: 'summary'
            content: 'Instance {{ $labels.instance }} down'
        -
            name: 'description'
            content: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'

When using prometheus >= 2.0, use the new YAML rule format (https://prometheus.io/docs/prometheus/2.0/migration/#recording-rules-and-alerts):

alerts => { 'groups' => [{ 'name' => 'alert.rules', 'rules' => [{ 'alert' => 'InstanceDown', 'expr' => 'up == 0', 'for' => '5m', 'labels' => { 'severity' => 'page', }, 'annotations' => { 'summary' => 'Instance {{ $labels.instance }} down', 'description' => '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' } }]}]},

or in hiera:

alerts:
  groups:
    - name: alert.rules
      rules:
      - alert: 'InstanceDown'
        expr: 'up == 0'
        for: '5m'
        labels:
          'severity': 'page'
        annotations:
          'summary': 'Instance {{ $labels.instance }} down'
          'description': '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
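
A group can hold more than one rule. The sketch below adds a second rule to the same group in the >= 2.0 format. The rule name `NodeExporterDown`, its `for` duration, and its `severity` label are illustrative assumptions, and the expression assumes a scrape job named `node` exists.

```puppet
class { '::prometheus':
  version => '2.0.0',
  alerts  => {
    'groups' => [
      {
        'name'  => 'alert.rules',
        'rules' => [
          # The rule from the example above, unchanged.
          {
            'alert'       => 'InstanceDown',
            'expr'        => 'up == 0',
            'for'         => '5m',
            'labels'      => { 'severity' => 'page' },
            'annotations' => {
              'summary'     => 'Instance {{ $labels.instance }} down',
              'description' => '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.',
            },
          },
          # Hypothetical second rule, limited to targets of the 'node' job.
          {
            'alert'       => 'NodeExporterDown',
            'expr'        => 'up{job="node"} == 0',
            'for'         => '10m',
            'labels'      => { 'severity' => 'warning' },
            'annotations' => {
              'summary' => 'node_exporter on {{ $labels.instance }} is not reachable',
            },
          },
        ],
      },
    ],
  },
}
```

The strings are single-quoted so that Puppet does not try to interpolate `$labels`; the `{{ ... }}` templates are passed through for Prometheus to expand.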

On the monitored nodes:

include prometheus::node_exporter

or:

class { 'prometheus::node_exporter':
  version            => '0.12.0',
  collectors_disable => ['loadavg', 'mdadm'],
  extra_options      => '--collector.ntp.server ntp1.orange.intra',
}

For more information on class parameters, see the docstring of each class.

Example

A real-world Prometheus >= 2.0.0 setup, including alertmanager and slack_configs:

include profiles::prometheus

class { '::prometheus':
  version              => '2.0.0',
  alerts               => { 'groups' => [{ 'name' => 'alert.rules', 'rules' => [{ 'alert' => 'InstanceDown', 'expr' => 'up == 0', 'for' => '5m', 'labels' => { 'severity' => 'page', }, 'annotations' => { 'summary' => 'Instance {{ $labels.instance }} down', 'description' => '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.' } }]}]},
  scrape_configs       => [
    { 'job_name'        => 'prometheus',
      'scrape_interval' => '10s',
      'scrape_timeout'  => '10s',
      'static_configs'  => [
        { 'targets' => [ 'localhost:9090' ],
          'labels'  => { 'alias' => 'Prometheus' }
        }
      ]
    },
    { 'job_name'        => 'node',
      'scrape_interval' => '5s',
      'scrape_timeout'  => '5s',
      'static_configs'  => [
        { 'targets' => [ 'nodexporter.domain.com:9100' ],
          'labels'  => { 'alias' => 'Node' }
        }
      ]
    }
  ],
  alertmanagers_config => [{ 'static_configs' => [{ 'targets' => [ 'localhost:9093' ] }] }],
}

class { '::prometheus::alertmanager':
  version   => '0.13.0',
  route     => { 'group_by' => [ 'alertname', 'cluster', 'service' ], 'group_wait' => '30s', 'group_interval' => '5m', 'repeat_interval' => '3h', 'receiver' => 'slack' },
  receivers => [ { 'name' => 'slack', 'slack_configs' => [ { 'api_url' => 'https://hooks.slack.com/services/ABCDEFG123456', 'channel' => '#channel', 'send_resolved' => true, 'username' => 'username' }] }],
}

The same in hiera:

prometheus::version: '2.0.0'
prometheus::scrape_configs:
    - job_name: 'nodexporter'
      scrape_interval: '10s'
      scrape_timeout: '10s'
      static_configs:
      - targets:
        - nodexporter.domain.com:9100
        labels:
          alias: 'nodexporter'
    - job_name: prometheus
      scrape_interval: 10s
      scrape_timeout: 10s
      static_configs:
      - targets:
        - localhost:9090
        labels:
          alias: Prometheus
prometheus::alerts:
  groups:
    - name: alert.rules
      rules:
      - alert: 'InstanceDown'
        expr: 'up == 0'
        for: '5m'
        labels:
          'severity': 'page'
        annotations:
          'summary': 'Instance {{ $labels.instance }} down'
          'description': '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
prometheus::alertmanagers_config:
  - static_configs:
    - targets:
      - localhost:9093

prometheus::alertmanager::version: '0.13.0'
prometheus::alertmanager::route:
  group_by:
  - alertname
  - cluster
  - service
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: slack
prometheus::alertmanager::receivers:
  - name: slack
    slack_configs:
    - api_url: https://hooks.slack.com/services/ABCDEFG123456
      channel: "#channel"
      send_resolved: true
      username: username
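
Hiera keys of the form `prometheus::<parameter>` are picked up through Puppet's automatic class parameter lookup, so the classes only need to be included. A minimal sketch of a profile class that would consume the data above follows. The class name `profiles::prometheus` is taken from the example; its body is an assumption, not something this module ships.

```puppet
# Hypothetical profile class. With the Hiera data above available to the node,
# automatic parameter lookup supplies prometheus::version,
# prometheus::scrape_configs, prometheus::alertmanager::route, and so on.
class profiles::prometheus {
  include ::prometheus
  include ::prometheus::alertmanager
}
```

A node then only needs `include profiles::prometheus`, and changes to versions or scrape targets are made in Hiera rather than in the manifest.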

Test your commit with Vagrant: https://github.com/kalinux/vagrant-puppet-prometheus.git

Limitations/Known issues

In version 0.1.14 of this module the alertmanager was configured to run as the service alert_manager. This has been changed in version 0.2.00 to be alertmanager.

Do not use version 1.0.0 of Prometheus (see https://groups.google.com/forum/#!topic/prometheus-developers/vuSIxxUDff8); it breaks compatibility with this module.

Although the module has templates for several Linux distributions, only Red Hat family distributions were tested.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.