Name: opsweekly
Owner: Etsy, Inc.
Description: On call alert classification and reporting
Created: 2014-06-16 11:27:05.0
Updated: 2018-01-02 18:04:00.0
Pushed: 2017-12-06 21:44:11.0
Homepage: null
Size: 3191
Language: JavaScript
Opsweekly is a weekly report tracker, an on call categorisation and reporting tool, a sleep tracker, a meeting organiser and a coffee maker all in one.
The goal of Opsweekly is both to organise your team in one central place and to help you understand and improve your on call rotations, through a simple on call “survey” and the reporting that results from that tracking.
Alert classification is a complicated task, but with Opsweekly a few simple questions about each alert received can pay dividends in improving the on call experience for your engineers.
In phplib/config.php, define oncall_sleep_retrospective_count with a numeric value (such as 3). Users viewing their profile will then see how past weeks affected their sleep.
max_input_vars must be increased for submitting on-call reports. See “Increasing max input vars”.
mysql> create database opsweekly;
mysql> grant all on opsweekly.* to opsweekly_user@localhost IDENTIFIED BY 'my_password';
mysql -u opsweekly_user opsweekly < opsweekly.sql
We're careful to only allow changes that should be backwards compatible with previous versions of opsweekly, e.g. if a new configuration value is added, a sensible default is included, etc.
Having said that, sometimes database schema changes are required. The script upgrade_db.php will attempt to alter your tables for those schema changes; if it fails, you can copy and paste the SQL and run it manually.
Running upgrade_db.php more than once will not break your database.
Committers/Maintainers: If you add a new database column, please add your schema change to upgrade_db.php so existing users can enjoy the features you add!
Opsweekly uses the concept of “providers” for the various pieces of data it needs. These are like plugins and can vary from team to team.
The following providers are used:
providers/weekly/: These are known as weekly “hints”, which are used to hint or remind people what they did in the last week when writing their reports.
providers/oncall/: These are used to pull in notifications from somewhere for the on call engineer to document.
providers/sleep/: These are used to query an external data source to establish whether the on call engineer was asleep during the notifications they received.
The theory behind the providers means that if Opsweekly is not pulling data from a service you're currently using, it should be trivial to write your own provider and plug it in. Generally, providers have two sets of configuration: one global for your entire instance, and one per team (or per user, in the case of sleep).
For more information about how to configure the providers or to write your own, please see the documentation in each of the provider directories mentioned above.
The config.php.example contains an example configuration to get you on your way. It's fairly well commented to explain the common options, but we'll go into more depth here:
Opsweekly needs to know who each of its users is, so the first step is to teach it how to identify people.
In config.php, there is the important function getUsername. This function must return the username, for example “ldenness”.
You can write whatever PHP you like here; perhaps your SSO passes an HTTP header, or sets a cookie you can read to get the username.
config.php.example has a couple of examples, including one that uses the username from HTTP Basic Auth, which can be configured with Apache.
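As a rough illustration of the getUsername function described above, the sketch below first tries the HTTP Basic Auth username and then falls back to an SSO header. The header name X-Remote-User and the null fallback are assumptions made for this example, not something Opsweekly prescribes; config.php.example remains the authoritative reference.

```php
<?php
// Sketch of a getUsername() for config.php, assuming the web server
// (e.g. Apache with HTTP Basic Auth) has already authenticated the user.
function getUsername() {
    // PHP exposes the Basic Auth username here when the server passes it through.
    if (!empty($_SERVER['PHP_AUTH_USER'])) {
        return $_SERVER['PHP_AUTH_USER'];
    }
    // Assumption: an SSO proxy sets a header such as X-Remote-User.
    // The header name is hypothetical; substitute whatever your SSO sends.
    if (!empty($_SERVER['HTTP_X_REMOTE_USER'])) {
        return $_SERVER['HTTP_X_REMOTE_USER'];
    }
    // No identity available. Opsweekly expects a username, so in practice
    // you would probably deny access before reaching this point.
    return null;
}
```

Whichever source you use, the value returned should be a plain username such as “ldenness”.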
PHP has a default limit on the number of variables that can be input via form submission. Because compiling and submitting the on-call report is essentially just submitting a giant form, you must increase this value or your reports will be truncated!
Look for the configuration option max_input_vars in your PHP configuration (e.g. php.ini) or, if you have your own VirtualHost (e.g. in Apache), you can do something like: php_value max_input_vars 10000 to increase the limit.
We highly suggest increasing to 10000 for future proofing your on-call reports. There's no real downside to this if you're limiting it to Opsweekly. The limit is to try and protect against exploits by hash collisions (basically, someone DoS-ing forms on your site). But you should not run Opsweekly exposed on the internet anyway.
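To confirm that the new limit is actually in effect for the PHP process serving Opsweekly, a small check along these lines can help. This is only a sketch: the helper name hasEnoughInputVars is made up for this example and is not part of Opsweekly.

```php
<?php
// Returns true when the effective max_input_vars is at least $required.
// hasEnoughInputVars is a hypothetical helper, not an Opsweekly function.
function hasEnoughInputVars($required = 10000) {
    // ini_get() returns the setting as a string, or false if it is unknown.
    $current = ini_get('max_input_vars');
    return $current !== false && (int) $current >= $required;
}

if (!hasEnoughInputVars()) {
    // Log a warning rather than failing, so the page still loads.
    error_log('max_input_vars is ' . ini_get('max_input_vars')
        . '; large on-call reports may be truncated.');
}
```

Run it through the same web server configuration as Opsweekly, since the command line often loads a different php.ini.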
Opsweekly can support many different teams using the same codebase, if required. Each team gets its own “copy” of the UI at a unique URL, and its data is stored in a separate database.
Even if you only intend to use one team, the $teams array contains most of the important configuration for Opsweekly.
The key of each array in the $teams array is the FQDN that you will access Opsweekly via, e.g. opsweekly.mycompany.com.
Inside this array are many configuration options:
display_name: The “friendly” or display name for your team, used throughout the UI to describe your team. For example, “Ops”.
root_url: If your installation is on a path other than “/”, enter the path here. For example, if your desired URL is “http://intranet.mycompany.com/opsweekly”, you would enter “/opsweekly”.
email_report_to: The email address of the mailing list your team uses to communicate, used for sending weekly reports (if the person requests it) or any other email communication.
database: The name of the MySQL database Opsweekly will try to use for this team.
oncall: Either false or another array containing configuration regarding your on call rotations:
  provider: Which on call provider you wish to use for this team to fetch information, for example “splunk”, “logstash” or “pagerduty”.
  provider_options: An array of team-specific configuration options that this plugin requires. The list of these is available in the documentation for the provider itself. For example, Pagerduty will require the service ID.
  timezone: The PHP style timezone that this team operates in, or rather the timezone that your on call rotation starts in. A good approach is to take this (and the following two variables) directly from Pagerduty if you use that for scheduling your on call rotations.
  start: The time when your on call rotation starts. This is input into strtotime, so it can be friendly text like “friday 18:00” for 6pm on Friday.
  end: As above, except when your on call rotation ends.
weekly_hints: The weekly hint providers you wish to use for this team to prompt people to fill in their weekly reports. There are examples in the providers/weekly folder, for example Github (pulling in recent commits) and JIRA (pulling in closed tickets).
irc_channel: The IRC channel your team uses. Used for various IRC integrations (currently just warning about weekly meeting time, if cron is set up).
You can have as many teams as you want in the $teams array; they just need to have unique FQDNs.
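Putting the options above together, a single-team $teams entry might look roughly like the sketch below. All values are placeholders, the provider option name service_id is a guess (check the provider's own documentation), and passing weekly_hints as a list of provider names is an assumption. The last lines only demonstrate how a strtotime-style start string resolves in the configured timezone; they do not reproduce Opsweekly's internal logic.

```php
<?php
// Illustrative $teams entry; every value below is a placeholder.
$teams = array(
    // The key is the FQDN used to reach this team's copy of Opsweekly.
    'intranet.mycompany.com' => array(
        'display_name'    => 'Ops',
        'root_url'        => '/opsweekly',
        'email_report_to' => 'ops@mycompany.com',
        'database'        => 'opsweekly',
        // Set 'oncall' to false if this team has no on call rotation.
        'oncall' => array(
            'provider'         => 'pagerduty',
            // Option names are provider specific; 'service_id' is a guess.
            'provider_options' => array('service_id' => 'PXXXXXX'),
            'timezone'         => 'America/New_York',
            'start'            => 'friday 18:00',
            'end'              => 'friday 18:00',
        ),
        // Assumed to be a list of names defined in $weekly_providers.
        'weekly_hints' => array('github', 'jira'),
        'irc_channel'  => '#ops',
    ),
);

// Show how the 'start' string resolves in the team's timezone.
$oncall = $teams['intranet.mycompany.com']['oncall'];
date_default_timezone_set($oncall['timezone']);
echo date('D Y-m-d H:i T', strtotime($oncall['start'])), "\n";
```

Adding a second team is a matter of adding another FQDN key with its own array and database name.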
In this section you define and configure the available weekly hint providers. These are displayed on the right hand side of the “Add” page, so people have some information in front of them about what they did, as a prompt for writing their updates.
Of course, you are free to write your own to suit your needs. If you wish to do so, please see the documentation inside the providers/weekly folder.
The $weekly_providers array handles the definition and configuration of the plugins in the providers/weekly folder. The array key should be a simple name for your provider, e.g. “github”. This name is referred to in the teams configuration under weekly_hints. Then, as values inside the array, the following are required:
display_name: Displayed above the output from your plugin, this is the friendly header name for your provider, e.g. “Last week's tickets”.
lib: The path to the PHP file that contains your provider, e.g. providers/weekly/github.php
class: The class name you're using for your weekly provider, which will be instantiated if requested by the team configuration.
options: An array of arbitrary key/value pairs that are passed into the provider when it's loaded, used for configuration that is to be shared between all teams. For example, a path to an API, or a username and password to log in to an API.
In this section you define and configure the available on call notification providers. On call providers are plugins that, given a time period and a username (and the configuration entered both here and in the team configuration), fetch all the notifications the person received in that time period so they can classify the alerts.
Of course, you are free to write your own to suit your needs. If you wish to do so, please see the documentation inside the providers/oncall folder.
The $oncall_providers array handles the definition and configuration of the plugins in the providers/oncall folder. The array key should be a simple name for your provider, e.g. “pagerduty”. This name is referred to inside the teams configuration in the on call section as provider. Then, as values inside the array, the following are required:
display_name: A friendly display name for your provider (e.g. Pagerduty).
lib: The path to the PHP file that contains your provider code, e.g. providers/oncall/pagerduty.php
options: An array of arbitrary key/value pairs that are passed into the provider when it's loaded, used for configuration that is to be shared between all teams. For example, a path to an API, or a username and password to log in to an API.
In this section you can define and configure the sleep providers that users can choose in their “Edit Profile” screen. Sleep providers are plugins that, given a unix timestamp, return data on the sleep state of the user (for example, whether they were asleep, how deeply, and whether and how quickly they got back to sleep).
We use this data to generate interesting reports about how on call rotations are affecting engineers' sleep patterns, and to help the team improve this required practice. For example, by listing the alerts that most often woke engineers, you could make a conscious decision to delay sending an alert until morning if it is not urgent enough.
The data is only stored alongside the notifications in the MySQL database, never shared.
Of course, you are free to write your own to suit your needs. If you wish to do so, please see the documentation inside the providers/sleep folder.
The $sleep_providers array handles the definition and configuration of the plugins in the providers/sleep folder. The array key should be a simple name for your provider. The values must include the following:
display_name: A friendly name to display in the Opsweekly UI for this provider, e.g. “Jawbone UP”.
description: A description of the sleep tracker, to differentiate it from others.
logo: Please place a logo (30x30px) in an addressable location, e.g. in the /assets/sleep/ directory, and place the URL path to it here.
options: An array of key/value pairs that will be used to display configuration options in the UI to users. Unlike other providers, sleep tracking is configured per user, so configuration is entered via the “Edit Profile” screen and stored in the database. Each option is parsed to create an HTML form input field. The key should be the option name. The following values are required:
  type: The type of input field. Currently only text is supported/tested.
  name: The friendly “field name” for the input box.
  description: The description of what the user should enter, displayed next to the input box.
  placeholder: Placeholder text displayed inside the text box.
lib: The path to the PHP file that contains your provider code, e.g. providers/sleep/up.php
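For orientation, the three provider arrays described above ($weekly_providers, $oncall_providers and $sleep_providers) could be laid out roughly as follows. This is a sketch only: the class name, the option names inside each options array, the logo filename and all values are hypothetical, and the documentation in each provider directory defines what a given provider really expects.

```php
<?php
// Illustrative provider definitions; all values are placeholders.
$weekly_providers = array(
    // The key is the name teams reference under 'weekly_hints'.
    'github' => array(
        'display_name' => "Last week's commits",
        'lib'          => 'providers/weekly/github.php',
        // Hypothetical class name; use the class your provider file defines.
        'class'        => 'GithubHints',
        // Shared by all teams; this option name is an assumption.
        'options'      => array('github_url' => 'https://github.mycompany.com'),
    ),
);

$oncall_providers = array(
    // The key is the name teams reference as 'provider' in their on call config.
    'pagerduty' => array(
        'display_name' => 'Pagerduty',
        'lib'          => 'providers/oncall/pagerduty.php',
        // Shared by all teams; these option names are assumptions.
        'options'      => array(
            'base_url' => 'https://mycompany.pagerduty.com/api/v1',
            'apikey'   => 'XXXXXXXX',
        ),
    ),
);

$sleep_providers = array(
    'up' => array(
        'display_name' => 'Jawbone UP',
        'description'  => 'Sleep data from a Jawbone UP wristband',
        // Hypothetical filename for a 30x30px logo.
        'logo'         => '/assets/sleep/up.png',
        // Each entry becomes a form field on the "Edit Profile" screen.
        'options'      => array(
            // 'api_token' is a hypothetical option name.
            'api_token' => array(
                'type'        => 'text',
                'name'        => 'API token',
                'description' => 'Token used to read your sleep data',
                'placeholder' => 'e.g. abc123',
            ),
        ),
        'lib'          => 'providers/sleep/up.php',
    ),
);
```

Note the difference in scope: the options for weekly and on call providers hold values shared by every team, while the options for a sleep provider only describe the form fields each user fills in.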
There are a few other configuration options, which are documented in the example config file. Some highlights include:
$mysql_host, $mysql_user, $mysql_pass: Global configuration for your MySQL database. Per team database configuration (e.g. the database name to use) goes inside the team config.
$email_from_domain: The domain name you use to send email, used for a “From” address when sending weekly reports.
$search_results_per_page: Controls the number of search results returned at once.
$error_log_file: Opsweekly prints some events, especially relating to on call fetching and sleep tracking, to a debug log file. This log file can be extremely useful when debugging provider issues.
$dev_fqdn, $prod_fqdn: To allow ease of development, Opsweekly will preg_replace the hostname given to it to another hostname (which then matches your team names in the $teams array).
$irccat_hostname, $irccat_port: If you use irccat and wish to have meeting reminders appear in IRC, configure the hostname and port of your irccat instance here.
One of Opsweekly's core goals is to help you think deeply about on call rotations and the notifications received during them.
A big part of this is requiring the on call engineer to categorise every alert they receive. If they receive 50+, this can be a daunting task.
We spent a long time trying to come up with a good balance of concise options to choose from, that provided a sufficient amount of detail but at the same time didn't overwhelm the user.
This is the list that we came up with:
The main thing we wanted to record was whether an alert was actionable or not actionable (i.e. was there a genuine problem affecting the service that the user had to intervene to fix).
Therefore, the alert categorisations are broken down into those two categories.
The following are “Action Taken” tags, and their brief description:
The following are “No Action Taken” tags, and their brief description:
More than once, engineers have been somewhat baffled by these choices and asked for another option; but in all those cases the alert ended up being covered by an existing choice. This is a good thing, as it forces everyone to think about the initial cause of the alert, rather than things wrapped up around it.
Hopefully by using Opsweekly you can become very aware of the kind of alerts your engineers are receiving, and then work to reduce noise and only wake/context switch your intelligent humans to do jobs when they're really required to do so.
You can have opsweekly automatically email and IRC message you to remind you about meeting time, and provide the permalink to this week's meeting for convenience.
To do so, simply set up a cron (or other method of triggering script e.g. manually) with the following:
php /path/to/opsweekly/send_meeting_reminder.php <your-configured-cname>
e.g., using cron, weekly on Wednesdays at 2pm:
0 14 * * 3 php /var/www/opsweekly/send_meeting_reminder.php myweekly.yourdomain.com