Name: satellite
Owner: Two Sigma
Description: Satellite monitors, alerts on, and self-heals your Mesos cluster.
Created: 2015-02-18 21:33:21.0
Updated: 2018-05-19 12:26:39.0
Pushed: 2016-05-09 18:24:55.0
Size: 379
Language: Clojure
Satellite currently serves three functions, each adding functionality to Mesos:
Most importantly, Satellite Master directly monitors Mesos masters and receives monitoring information from Mesos slaves through Satellite Slaves. From these it produces a Riemann event stream carrying aggregate statistics for the cluster (e.g., utilization, number of tasks lost) as well as events specific to the masters (e.g., how many leaders there are).
As Satellite embeds Riemann, you can do anything you would usually do with a
Riemann event stream. You can configure it to send alerts (e.g., by email or
PagerDuty), feed your dashboards, and trigger whitelist updates. If you already
have a primary Riemann server, you can forward events to it and keep all or
some of the stream-processing logic there; the power is yours. See our recipes
in src/recipes.clj for patterns we have found useful.
Satellite provides a REST interface for interacting with the Mesos master
whitelist. The whitelist is a text file of hosts to which the master will
consider sending tasks. Satellite ensures that update requests are consistent
across the Mesos masters. See Whitelist below for a more detailed
explanation of the model for interacting with the whitelist.
Satellite can provide a REST interface for accessing cached Mesos task metadata. To be very clear, Satellite does not cache the metadata itself; it simply provides an interface for retrieving metadata that has already been cached. This is useful if you have persisted task metadata to get around the weak persistence guarantees currently offered by Mesos. This feature is optional.
As we said above, the Mesos whitelist is a text file of hosts to which the master will consider sending tasks. Satellite adds two additional conceptual whitelists to the mix:
A periodic merge operation merges these two into the whitelist file that Mesos observes.
There are two kinds of Satellite processes: satellite-masters and satellite-slaves.
For each mesos-master and mesos-slave process, there is a satellite-master or satellite-slave process watching it, respectively.
|     Follower     |      Leader      |     Follower     |
|------------------|------------------|------------------|
| mesos-master     | mesos-master     | mesos-master     |
| satellite-master | satellite-master | satellite-master |

[arrows: each satellite-slave below sends to every satellite-master above]

| satellite-slave  | satellite-slave  | satellite-slave  |
| mesos-slave      | mesos-slave      | mesos-slave      |
satellite-master embeds Riemann and satellite-slave embeds a Riemann client.
satellite-slaves send one type of message to all the satellite-masters: a
Riemann event that is the result of a user-specified test.
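To make this message flow concrete, here is a minimal Python sketch (not Satellite's actual Clojure code) of a slave running one test and delivering the resulting event to every master. The event keys follow Riemann's standard event fields. The test itself, its threshold, the service name `disk-used-fraction`, and the in-memory inboxes standing in for satellite-masters are all assumptions made for illustration.

```python
import socket
import time
from typing import Callable, Dict, List

Event = Dict[str, object]


def run_test(service: str, test: Callable[[], float],
             threshold: float, ttl: float = 60.0) -> Event:
    """Run one user-specified test and wrap its result as a Riemann-style event."""
    metric = test()
    return {
        "host": socket.gethostname(),
        "service": service,
        # Hypothetical rule: the test passes while the metric stays at or below the threshold.
        "state": "ok" if metric <= threshold else "critical",
        "metric": metric,
        "ttl": ttl,
        "time": int(time.time()),
    }


def broadcast(event: Event, masters: List[List[Event]]) -> None:
    """Deliver a copy of the same event to every master."""
    for inbox in masters:
        inbox.append(dict(event))


# Three in-memory inboxes stand in for three satellite-masters.
inboxes: List[List[Event]] = [[], [], []]

# Hypothetical test: pretend 93% of the disk is in use, with a 90% limit.
event = run_test("disk-used-fraction", lambda: 0.93, threshold=0.90)
broadcast(event, inboxes)
```

In this sketch every master receives the same event regardless of which one is the leader, which mirrors the "all the satellite-masters" wording above. A real slave would send over the network through a Riemann client instead of appending to a list.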
You can use Riemann's powerful stream processing DSL to act on the events your slaves send to the masters: send email, raise alerts, turn hosts on or off. It's up to you!
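As a rough illustration of the idea, the Python sketch below imitates the shape of a filtering stream in the spirit of Riemann's `where`: events that match a predicate are passed on to child actions. It is not Riemann's DSL (real Satellite configuration is written in Clojure), and the host names, the service name, and the `alert`, `host_off`, and `host_on` actions are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, object]
Stream = Callable[[Event], None]


def where(predicate: Callable[[Event], bool], *children: Stream) -> Stream:
    """Build a stream that forwards an event to its children only if the predicate holds."""
    def stream(event: Event) -> None:
        if predicate(event):
            for child in children:
                child(event)
    return stream


# Hypothetical state that the actions below operate on.
whitelisted: Set[str] = {"slave1", "slave2"}
alerts: List[str] = []


def alert(event: Event) -> None:
    """Stand-in for an email or pager notification."""
    alerts.append(f"{event['host']}: {event['service']} is {event['state']}")


def host_off(event: Event) -> None:
    """Stand-in for removing a host from the whitelist."""
    whitelisted.discard(str(event["host"]))


def host_on(event: Event) -> None:
    """Stand-in for putting a host back on the whitelist."""
    whitelisted.add(str(event["host"]))


streams = [
    where(lambda e: e["state"] == "critical", alert, host_off),
    where(lambda e: e["state"] == "ok", host_on),
]

incoming = [
    {"host": "slave1", "service": "disk-used-fraction", "state": "critical"},
    {"host": "slave2", "service": "disk-used-fraction", "state": "ok"},
]
for event in incoming:
    for stream in streams:
        stream(event)
```

After the two events are processed, `whitelisted` contains only `slave2` and `alerts` holds a single message for `slave1`. Composing small streams this way keeps each rule readable on its own.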
We've provided some very, very basic recipes and are interested in adding more, so please send a pull request if you think anything would serve the larger community.
If you're deploying Satellite to a Mesos cluster, you might find that installing and configuring it on all of your hosts involves some repetitive busy work. To help you automate these tasks, we've provided some Ansible Roles.
We need all contributors to fill out our Contributor License Agreement, found in /cla, before we can accept any code or pull requests.
Apache Mesos is a trademark of the Apache Software Foundation.