etsy/opsweekly

Name: opsweekly

Owner: Etsy, Inc.

Description: On call alert classification and reporting

Created: 2014-06-16 11:27:05.0

Updated: 2018-01-02 18:04:00.0

Pushed: 2017-12-06 21:44:11.0

Homepage: null

Size: 3191

Language: JavaScript


README

Opsweekly


What is Opsweekly?

Opsweekly is a weekly report tracker, an on call categorisation and reporting tool, a sleep tracker, a meeting organiser and a coffee maker all in one.

The goal of Opsweekly is both to organise your team in one central place and to help you understand and improve your on call rotations, using a simple on call “survey” and the reporting that this tracking makes possible.

Alert classification is a complicated task, but with Opsweekly a few simple questions about each alert received can pay dividends in improving the on call experience for your engineers.

Features
Screenshots
Please visit the screenshot README for a guided tour of how Opsweekly works and the reports it can generate!
Prerequisites
Installation/configuration
  1. Download/clone the repo into an appropriate folder, either in your web server's directory or symlinked to it, or create a configuration in your web server for Opsweekly if you are serving it on a separate domain (e.g. a VirtualHost).
  2. Increase the PHP variable max_input_vars, which is required for submitting on-call reports. See Increasing max input vars.
  3. Create a MySQL database for Opsweekly, and optionally create a new user with access to it. E.g.:
       mysql> create database opsweekly;
       mysql> create user 'opsweekly_user'@'localhost' identified by 'my_password';
       mysql> grant all on opsweekly.* to 'opsweekly_user'@'localhost';
  4. Load the database schema into MySQL, e.g. mysql -u opsweekly_user -p opsweekly < opsweekly.sql
  5. Teach Opsweekly how to authenticate your users (see Authenticating with Opsweekly).
  6. Move phplib/config.php.example to phplib/config.php and edit it with your favourite editor (more detail below).
  7. Load Opsweekly in your browser.
  8. Reward yourself with a refreshing beverage.
Upgrading

We're careful to accept only changes that are backwards compatible with previous versions of Opsweekly; for example, if a new configuration value is added, it comes with a sensible default.

Having said that, sometimes database schema changes are required. The script upgrade_db.php will attempt to alter your tables for those schema changes; if it fails, you can copy and paste the SQL and run it manually. Running upgrade_db.php more than once will not break your database.

Committers/maintainers: if you add a new database column, please add your schema change to upgrade_db.php so existing users can enjoy the features you add!
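To illustrate why re-running an upgrade can be harmless, here is a minimal sketch of the idempotent pattern, assuming you already know which columns each table has (for example, from information_schema.COLUMNS). The function, table and column names are hypothetical; this is not the code in upgrade_db.php.

```php
<?php
// Hypothetical helper illustrating an idempotent schema upgrade; this is not
// the code in upgrade_db.php. $existingColumns maps table => list of column
// names; $wanted lists [table, column, SQL column definition] triples.
function pendingSchemaChanges(array $existingColumns, array $wanted): array
{
    $statements = [];
    foreach ($wanted as [$table, $column, $definition]) {
        // Skip columns that already exist, so a second run emits nothing.
        if (!in_array($column, $existingColumns[$table] ?? [], true)) {
            $statements[] = "ALTER TABLE `$table` ADD COLUMN `$column` $definition";
        }
    }
    return $statements;
}
```

Because the existence check happens before any ALTER TABLE is generated, running it against an already-upgraded database produces no statements at all.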

Providers/Plugins

Opsweekly uses the concept of “providers” for the various pieces of data it needs. These are like plugins and can vary from team to team.

The following providers are used:

The idea behind providers is that if Opsweekly does not pull data from a service you currently use, it should be straightforward to write your own provider and plug it in. Generally, providers have two sets of configuration: one global for your entire instance, and one per team (or per user, in the case of sleep providers).

For more information about how to configure the providers or to write your own, please see the documentation in each of the provider directories mentioned above.

Configuration

The config.php.example contains an example configuration to get you on your way. It's fairly well commented to explain the common options, but we'll go into more depth here:

Authenticating with Opsweekly

Opsweekly needs to know who each of its users is, so the first step in using Opsweekly is teaching it how to identify people.

In config.php, the most important function is getUsername. This function must return the current user's username, for example “ldenness”. You can write whatever PHP you like here; perhaps your SSO passes an HTTP header, or sets a cookie you can read to get the username.

The config.php.example has a couple of examples, including one that uses the username from HTTP Basic Auth, which can be configured in Apache.
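A minimal sketch of a getUsername() along these lines, assuming the web server has already authenticated the request. The helper usernameFromServer() and the X-SSO-User header are hypothetical; swap in whatever your SSO actually provides.

```php
<?php
// Sketch of a getUsername() for config.php, assuming the web server has
// already authenticated the request. X-SSO-User is a hypothetical header.
function usernameFromServer(array $server): ?string
{
    // REMOTE_USER: set by Apache once its own auth (e.g. Basic Auth) succeeds.
    // PHP_AUTH_USER: the HTTP Basic Auth user, populated in some PHP setups.
    // HTTP_X_SSO_USER: how PHP exposes an "X-SSO-User" request header.
    foreach (['REMOTE_USER', 'PHP_AUTH_USER', 'HTTP_X_SSO_USER'] as $key) {
        if (isset($server[$key]) && $server[$key] !== '') {
            return $server[$key];
        }
    }
    return null;
}

function getUsername(): string
{
    $username = usernameFromServer($_SERVER);
    if ($username === null) {
        // Failing loudly beats filing reports under the wrong name.
        throw new RuntimeException('Unable to determine the current username');
    }
    return $username;
}
```

Only trust a header like X-SSO-User if a proxy in front of Opsweekly always sets it and strips any copy sent by the client; otherwise anyone could claim to be anyone.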

Increasing max input vars

PHP limits the number of input variables accepted from a single form submission (max_input_vars, which defaults to 1000). Because compiling and submitting the on-call report is essentially just submitting a giant form, you must increase this value or your reports will be truncated!

Look for the configuration option max_input_vars in your PHP configuration (e.g. php.ini), or, if you have your own VirtualHost (e.g. in Apache), add something like php_value max_input_vars 10000 to increase the limit.

We strongly suggest increasing it to 10000 to future-proof your on-call reports. There's no real downside if you limit the change to Opsweekly. The limit exists to protect against hash-collision exploits (basically, someone DoS-ing forms on your site), but you should not run Opsweekly exposed on the internet anyway.
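If you want to confirm the new limit took effect, a small check like the following could be dropped into a page served by Opsweekly's VirtualHost. It is a hypothetical helper, not something Opsweekly ships, and it assumes PHP 7.4 or later.

```php
<?php
// Hypothetical sanity check (not shipped with Opsweekly): warn when
// max_input_vars is below the 10000 suggested above.
const SUGGESTED_MAX_INPUT_VARS = 10000;

function maxInputVarsTooLow(?int $current = null): bool
{
    // ini_get() returns the setting as a string, so cast it for comparison.
    $current ??= (int) ini_get('max_input_vars');
    return $current < SUGGESTED_MAX_INPUT_VARS;
}

if (maxInputVarsTooLow()) {
    error_log(sprintf(
        'max_input_vars is %s; on-call reports may be truncated (suggested: %d)',
        ini_get('max_input_vars'),
        SUGGESTED_MAX_INPUT_VARS
    ));
}
```

Run it through the web server rather than the command line: the CLI often reads a different php.ini and never sees php_value directives from your Apache configuration.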

Teams configuration

Opsweekly can support many different teams using the same codebase, if required. Each team gets its own “copy” of the UI at a unique URL, and its data is stored in a separate database.

Even if you only intend to use one team, the $teams array contains most of the important configuration for Opsweekly.

The key of the array(s) in the $teams array is the FQDN that you will access Opsweekly via, e.g. opsweekly.mycompany.com.

Inside this array are many configuration options:

You can have as many teams as you want in the $teams array, they just need to have unique FQDNs.
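To make the FQDN-keyed layout concrete, here is a sketch of how a request's host name could select a team's entry. The second hostname and the teamForHost() helper are made up for illustration, and the option arrays are left empty rather than guessing at the real option names.

```php
<?php
// Hypothetical sketch of picking a team by the host name a request used.
// Team options are omitted; see config.php.example for the real ones.
$teams = [
    'opsweekly.mycompany.com' => [/* team options */],
    'dbaweekly.mycompany.com' => [/* team options */],
];

function teamForHost(array $teams, string $host): ?array
{
    // Host names are case-insensitive, and the Host header may include a port.
    $fqdn = strtolower(explode(':', $host, 2)[0]);
    return $teams[$fqdn] ?? null;
}

$team = teamForHost($teams, $_SERVER['HTTP_HOST'] ?? '');
```

Returning null for an unknown host lets the caller show a clear “no team configured for this hostname” error instead of silently falling back to another team's data.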

Weekly “hint” provider configuration

In this section you define and configure the available weekly hint providers. These are displayed on the right-hand side of the “Add” page, so people have information about what they did in front of them as a prompt when writing their updates.

Of course, you are free to write your own that suits your needs. If you wish to do so, please see the documentation inside of the providers/weekly folder.

The $weekly_providers array defines and configures the plugins in the providers/weekly folder. The array key should be a simple name for your provider, e.g. “github”. This name is referred to in the teams configuration under weekly_hints. The following values are required inside the array:
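Since the names under weekly_hints must match keys in $weekly_providers, a typo can quietly break a team's hints. A hypothetical check for that, assuming weekly_hints holds a list of provider names:

```php
<?php
// Hypothetical config check, not part of Opsweekly. Returns, per team FQDN,
// any names listed under weekly_hints that have no $weekly_providers entry.
function unknownWeeklyHints(array $teams, array $weeklyProviders): array
{
    $unknown = [];
    foreach ($teams as $fqdn => $team) {
        // Teams without weekly_hints are simply skipped.
        foreach ($team['weekly_hints'] ?? [] as $name) {
            if (!array_key_exists($name, $weeklyProviders)) {
                $unknown[$fqdn][] = $name;
            }
        }
    }
    return $unknown;
}
```

An empty result means every hint provider a team asks for is defined.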

On call provider configuration

In this section you define and configure the available on call notification providers. On call providers are plugins that, given a time period and a username (plus the configuration entered both here and in the team configuration), fetch all the notifications the person received in that time period so they can classify the alerts.

Of course, you are free to write your own that suits your needs. If you wish to do so, please see the documentation inside of the providers/oncall folder.

The $oncall_providers array defines and configures the plugins in the providers/oncall folder. The array key should be a simple name for your provider, e.g. “pagerduty”. This name is referred to as provider in the on call section of the teams configuration. The following values are required inside the array:
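The contract described above (a username and a time period in, that person's notifications out) could be sketched as the interface below. The interface, class and method names are hypothetical and use PHP 8 syntax; the real plugin API is documented in providers/oncall. The toy implementation filters an in-memory list instead of calling a paging service.

```php
<?php
// Hypothetical provider contract, for illustration only; see providers/oncall
// for the real plugin API.
interface OnCallProvider
{
    /** Notifications $username received from $start (inclusive) to $end (exclusive). */
    public function notificationsFor(string $username, int $start, int $end): array;
}

// Toy provider backed by an in-memory list instead of a paging service.
final class ArrayOnCallProvider implements OnCallProvider
{
    public function __construct(private array $notifications)
    {
    }

    public function notificationsFor(string $username, int $start, int $end): array
    {
        // array_values() re-indexes the filtered list from 0.
        return array_values(array_filter(
            $this->notifications,
            fn (array $n): bool => $n['user'] === $username
                && $n['time'] >= $start
                && $n['time'] < $end
        ));
    }
}
```

Keeping the fetch behind one small method means swapping paging services only touches the provider, not the report or classification code.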

Sleep provider configuration

In this section you can define and configure the sleep providers that users can choose on their “Edit Profile” screen. Sleep providers are plugins that, given a Unix timestamp, return data on the user's sleep state at that moment (for example, whether they were asleep, how deeply, and whether they got back to sleep and how long that took).
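To illustrate the kind of answer a sleep provider gives, here is a hypothetical lookup over sleep periods supplied as [start, end) Unix timestamp pairs; real providers fetch this from a sleep tracker. The “back to sleep” figure is a crude proxy: the seconds between the timestamp and the start of the next recorded sleep period.

```php
<?php
// Hypothetical sleep lookup for illustration; real sleep providers query a
// tracker's API. $sleepPeriods holds [start, end) Unix timestamp pairs.
function sleepStateAt(array $sleepPeriods, int $timestamp): array
{
    $asleep = false;
    $nextSleepStart = null;
    foreach ($sleepPeriods as [$start, $end]) {
        if ($timestamp >= $start && $timestamp < $end) {
            // The timestamp falls inside a recorded sleep period.
            $asleep = true;
        } elseif ($start > $timestamp && ($nextSleepStart === null || $start < $nextSleepStart)) {
            // Track the earliest sleep period that begins after the timestamp.
            $nextSleepStart = $start;
        }
    }
    return [
        'asleep' => $asleep,
        // Crude "back to sleep" proxy; null if no later sleep is recorded.
        'seconds_until_next_sleep' => $nextSleepStart === null ? null : $nextSleepStart - $timestamp,
    ];
}
```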

We use this data to generate interesting reports about how on call rotations affect engineers' sleep patterns, and to help the team improve this necessary practice. For example, by listing the alerts that woke engineers most often, you could make a conscious decision to delay an alert until morning if it isn't urgent enough.
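The “alerts that most woke engineers” report boils down to a tally. A minimal sketch, assuming each classified alert has already been annotated with its name and a hypothetical woke flag derived from the sleep provider:

```php
<?php
// Hypothetical report helper: ranks alert names by how often they woke
// someone. Each alert is ['name' => string, 'woke' => bool].
function alertsThatWokeMost(array $alerts, int $limit = 5): array
{
    $counts = [];
    foreach ($alerts as $alert) {
        if ($alert['woke'] ?? false) {
            $counts[$alert['name']] = ($counts[$alert['name']] ?? 0) + 1;
        }
    }
    arsort($counts);  // highest count first, keys (alert names) preserved
    return array_slice($counts, 0, $limit, true);
}
```

The alerts at the top of this list are the first candidates for delaying until morning or downgrading to a non-paging notification.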

The data is stored only alongside the notifications in your MySQL database and is never shared.

Of course, you are free to write your own that suits your needs. If you wish to do so, please see the documentation inside of the providers/sleep folder.

The $sleep_providers array defines and configures the plugins in the providers/sleep folder. The array key should be a simple name for your provider. The values must include the following:

Generic configuration

There are a few other configuration options, which are documented in the example config file. Some highlights include:

A note on on-call classification and categorisation

One of Opsweekly's core goals is to help people think deeply about on call rotations and the notifications received during them.

A big part of this is requiring the on call engineer to categorise every alert they receive. If they receive 50+, this can be a daunting task.

We spent a long time trying to find a good balance: a concise set of options that provides a sufficient amount of detail without overwhelming the user.

This is the list that we came up with:


Two types: Action/No Action

The main thing we wanted to record was whether an alert was actionable or not actionable (i.e. whether there was a genuine problem affecting the service that the engineer had to intervene to fix).

Therefore, the alert categorisations are broken down into those two categories.
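Because every alert falls on one side of this split, a simple weekly figure is the share of alerts that required action. A hypothetical sketch, where each alert carries only an action_taken boolean (the actual tag names are not reproduced here):

```php
<?php
// Hypothetical summary: share of a week's alerts classified as needing
// action. Each alert is ['action_taken' => bool].
function actionableRatio(array $alerts): float
{
    if ($alerts === []) {
        return 0.0;  // no alerts, nothing actionable
    }
    $actionable = count(array_filter($alerts, fn (array $a): bool => $a['action_taken']));
    // The float return type converts an even division (e.g. 4/4 === 1) to 1.0.
    return $actionable / count($alerts);
}
```

A low ratio suggests that most of the week's pages were noise that could be tuned away.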

Action Taken Tags

The following are “Action Taken” tags, and their brief description:

No Action Taken Tags

The following are “No Action Taken” tags, and their brief description:


More than once, engineers have been somewhat baffled by these choices and asked for another option, but in every one of those cases the alert turned out to be covered by an existing choice. This is actually a good thing, as it forces everyone to think about the initial cause of the alert rather than the circumstances around it.

Hopefully by using Opsweekly you can become very aware of the kind of alerts your engineers are receiving, and then work to reduce noise and only wake/context switch your intelligent humans to do jobs when they're really required to do so.

Setting up meeting reminders

You can have Opsweekly automatically email you and send you an IRC message to remind you about the meeting time, with the permalink to this week's meeting for convenience.

To do so, set up a cron job (or any other way of triggering the script, even manually) that runs: php /path/to/opsweekly/send_meeting_reminder.php <your-configured-cname>

e.g., using cron, every Wednesday at 2pm: 0 14 * * 3 php /var/www/opsweekly/send_meeting_reminder.php myweekly.yourdomain.com

Known issues/caveats/future goals

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.