elodina/exhibitor-mesos-framework

Name: exhibitor-mesos-framework

Owner: Elodina

Description: Exhibitor on Apache Mesos for reliably running Zookeeper on Mesos

Created: 2015-08-03 20:56:23.0

Updated: 2016-05-16 14:41:32.0

Pushed: 2016-05-18 12:30:45.0

Homepage: null

Size: 804

Language: Scala


README

Exhibitor Mesos Framework

Prerequisites

Typical Operations

Navigating the CLI

Prerequisites

Clone and build the project

# git clone https://github.com/elodina/exhibitor-mesos-framework.git
# cd exhibitor-mesos-framework
# ./gradlew jar

Build Exhibitor Standalone if necessary (NOTE: the version built with Gradle may be affected by this issue, so we use the Maven build in this example):

# mkdir tmp-exhibitor && cd tmp-exhibitor
# wget https://raw.github.com/Netflix/exhibitor/master/exhibitor-standalone/src/main/resources/buildscripts/standalone/maven/pom.xml
# mvn clean package
# cp target/exhibitor-*.jar ..
# cd .. && rm -rf tmp-exhibitor

Download the Apache Zookeeper distribution if you don't have one (or place the archive in the working folder):

# wget http://apache.cp.if.ua/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

Download the Oracle JDK distribution (or place the archive in the working folder). NOTE: it MUST be the Oracle JDK (not OpenJDK and not a bare JRE), as Exhibitor relies on jps calls:

# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u45-b14/jdk-8u45-linux-x64.tar.gz
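Since Exhibitor relies on jps, it may be worth sanity-checking that the downloaded JDK actually ships it before handing the archive to the framework (the extracted directory name below simply assumes the 8u45 archive from the previous step):

# tar xzf jdk-8u45-linux-x64.tar.gz
# ./jdk1.8.0_45/bin/jps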
Environment Configuration

Before running ./exhibitor-mesos.sh, set the location of libmesos:

# export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so

If the host running the scheduler has several IP addresses, you may also need to set:

# export LIBPROCESS_IP=<IP_ACCESSIBLE_FROM_MASTER>
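For example, if 192.168.3.5 were the address of this host that the Mesos master can reach (an illustrative value), you would set:

# export LIBPROCESS_IP=192.168.3.5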
Scheduler Configuration

The scheduler is configured through the command line.

The following options are available:

Usage: scheduler [options]

  -m <value> | --master <value>
        Mesos Master addresses. Required.
  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
  -u <value> | --user <value>
        Mesos user. Required.
  --framework-name <value>
        Mesos framework name. Defaults to exhibitor. Optional.
  --framework-timeout <value>
        Mesos framework failover timeout. Allows the framework to recover from failure before running tasks are killed. Should be a parsable Scala Duration value. Defaults to 30 days. Optional.
  --storage <value>
        Storage for cluster state. Examples: file:exhibitor-mesos.json; zk:master:2181/exhibitor-mesos. Required.
  --ensemble-modify-retries <value>
        Number of retries to modify (add/remove server) the ensemble. Defaults to 60. Optional.
  --ensemble-modify-backoff <value>
        Backoff between retries to modify (add/remove server) the ensemble, in milliseconds. Defaults to 1000. Optional.
  -d <value> | --debug <value>
        Debug mode. Optional. Defaults to false.
Run the scheduler

Start the Exhibitor scheduler using this command:

# ./exhibitor-mesos.sh scheduler --master master:5050 --user root --api http://master:6666
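Note that --storage is listed as required above; a fuller invocation that also persists cluster state to a local file could look like this (the storage value is just the example from the option description):

# ./exhibitor-mesos.sh scheduler --master master:5050 --user root --api http://master:6666 --storage file:exhibitor-mesos.json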
Quick start

To avoid passing the API URL with every CLI call, let's export it as follows:

# export EM_API=http://master:6666

First, let's add one Exhibitor server with the default settings. Further on in this README you can see how to change these defaults.

# ./exhibitor-mesos.sh add 0
Added servers 0

cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
    shared config overrides:
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto
You now have a cluster with 1 server that is not started.

# ./exhibitor-mesos.sh status
cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
    shared config overrides:
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

Each server requires some basic configuration.

# ./exhibitor-mesos.sh config 0 --configtype zookeeper --zkconfigconnect 192.168.3.1:2181 --zkconfigzpath /exhibitor/config --zookeeper-install-directory /tmp/zookeeper --zookeeper-data-directory /tmp/zkdata
Updated configuration for servers 0

cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

Now let's start the server. This CLI call will block until the server has actually started, but it will wait no longer than the configured timeout. The timeout can be passed via the --timeout flag and defaults to 60s. If a timeout of 0ms is passed, the CLI won't wait for servers to start at all and will reply with a "Scheduled servers …" message.

# ./exhibitor-mesos.sh start 0 --timeout 30s
Started servers 0

cluster:
  server:
    id: 0
    state: Running
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m, hostname:slave1
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto

Since we don't know where the server is running, we can ask for the cluster status to see where its endpoint is.

# ./exhibitor-mesos.sh status
cluster:
  server:
    id: 0
    state: Running
    endpoint: http://slave0:31000/exhibitor/v1/ui/index.html
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m, hostname:slave1
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      port: 31000
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto
    exhibitor cluster view:
      [slave0, latent, 0, F]

(NOTE: the exhibitor cluster view section lets you reason about the underlying Exhibitor and Zookeeper ensemble. Since there is some synchronization lag in Exhibitor when a node is added or removed, the view of the cluster may differ between nodes; that's why this section is shown under every node that is in the Running state.)

By now you should have a single Exhibitor instance running. Here's how you stop it:

# ./exhibitor-mesos.sh stop 0
Stopped servers 0

If you want to remove the server from the cluster completely, you may skip the stop step and call remove directly (it calls stop under the hood anyway):

# ./exhibitor-mesos.sh remove 0
Removed servers 0

Typical Operations

Changing the location of Zookeeper data
# ./exhibitor-mesos.sh stop 0
Stopped servers 0

# ./exhibitor-mesos.sh config 0 --zookeeper-data-directory /tmp/exhibitor_zkdata
Updated configuration for servers 0

cluster:
  server:
    id: 0
    state: Added
    constraints: hostname=unique
    failover: delay:1m, max-delay:10m, max-tries:2
    stickiness: period: 10m
    exhibitor config:
      zkconfigzpath: /exhibitor/config
      zkconfigconnect: 192.168.3.1:2181
      configtype: zookeeper
    shared config overrides:
      zookeeper-install-directory: /tmp/zookeeper
      zookeeper-data-directory: /tmp/exhibitor_zkdata
    cpu: 0.2
    mem: 256.0
    sharedConfigChangeBackoff: 10000
    port: auto
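Since the server was stopped for the reconfiguration, it can be started again with the same start command used earlier:

# ./exhibitor-mesos.sh start 0 --timeout 30s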
Shutting down framework

While the scheduler has a shutdown hook, it doesn't actually tear down the framework. To shut down the framework completely (i.e. unregister it from Mesos), you may send a POST to /teardown specifying the framework id to shut down:

# curl -d frameworkId=20150807-094500-84125888-5050-14187-0005 -X POST http://master:5050/teardown
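If you don't have the framework id at hand, one way to find it (assuming a stock Mesos master) is to fetch the master state and look for the framework named exhibitor in the frameworks list:

# curl -s http://master:5050/master/state.json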

Navigating the CLI

Requesting help
# ./exhibitor-mesos.sh help
Usage: <command>

Commands:
  help       - print this message.
  help [cmd] - print command-specific help.
  scheduler  - start scheduler.
  status     - print cluster status.
  add        - add servers to cluster.
  config     - configure servers in cluster.
  start      - start servers in cluster.
  stop       - stop servers in cluster.
  remove     - remove servers in cluster.
Adding servers to the cluster
# ./exhibitor-mesos.sh help add
Usage: add <id> [options]

  -c <value> | --cpu <value>
        CPUs for server. Optional.
  -m <value> | --mem <value>
        Memory for server. Optional.
  --constraints <value>
        Constraints (hostname=like:master,rack=like:1.*). See below. Defaults to 'hostname=unique'. Optional.
  -b <value> | --configchangebackoff <value>
        Backoff between checks whether the shared configuration changed in milliseconds. Defaults to 10000. Optional.
  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
  --port <value>
        Port ranges to accept, when offer is issued. Optional.
  --docker <value>
        Use Docker to run executor. Allows running multiple instances per host. Optional and defaults to false.

constraint examples:
  like:slave0    - value equals 'slave0'
  unlike:slave0  - value is not equal to 'slave0'
  like:slave.*   - value starts with 'slave'
  unique         - all values are unique
  cluster        - all values are the same
  cluster:slave0 - value equals 'slave0'
  groupBy        - all values are the same
  groupBy:3      - all values are within 3 different groups
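For illustration, adding a second server with custom resources and a hostname constraint (the values here are arbitrary) could look like:

# ./exhibitor-mesos.sh add 1 --cpu 0.5 --mem 512 --constraints hostname=like:slave.*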
Configuring servers in the cluster

NOTE: this section is not final and some configurations may change.

# ./exhibitor-mesos.sh help config
Usage: config <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
  --stickiness-period <value>
        Stickiness period to preserve same node for Exhibitor server (5m, 10m, 1h).
  --failover-delay <value>
        Failover delay (10s, 5m, 3h).
  --failover-max-delay <value>
        Max failover delay. See failoverDelay.
  --failover-max-tries <value>
        Max failover tries. Default - none.
  --configtype <value>
        Config type to use: s3 or zookeeper. Optional.
  --configcheckms <value>
        Period (ms) to check for shared config updates. Optional.
  --defaultconfig <value>
        Full path to a file that contains initial/default values for Exhibitor/ZooKeeper config values. The file is a standard property file. Optional.
  --headingtext <value>
        Extra text to display in UI header. Optional.
  --hostname <value>
        Hostname to use for this JVM. Optional.
  --jquerystyle <value>
        Styling used for the JQuery-based UI. Optional.
  --loglines <value>
        Max lines of logging to keep in memory for display. Default is 1000. Optional.
  --nodemodification <value>
        If true, the Explorer UI will allow nodes to be modified (use with caution). Default is true. Optional.
  --prefspath <value>
        Certain values (such as Control Panel values) are stored in a preferences file. By default, Preferences.userRoot() is used. Optional.
  --servo <value>
        true/false (default is false). If enabled, ZooKeeper will be queried once a minute for its state via the 'mntr' four letter word (this requires ZooKeeper 3.4.x+). Servo will be used to publish this data via JMX. Optional.
  --timeout <value>
        Connection timeout (ms) for ZK connections. Default is 30000. Optional.
  --s3credentials <value>
        Credentials to use for s3backup or s3config. Optional.
  --s3region <value>
        Region for S3 calls (e.g. "eu-west-1"). Optional.
  --s3config <value>
        The bucket name and key to store the config (s3credentials may be provided as well). Argument is [bucket name]:[key]. Optional.
  --s3configprefix <value>
        When using AWS S3 shared config files, the prefix to use for values such as locks. Optional.
  --zkconfigconnect <value>
        The initial connection string for ZooKeeper shared config storage. E.g: host1:2181,host2:2181... Optional.
  --zkconfigexhibitorpath <value>
        Used if the ZooKeeper shared config is also running Exhibitor. This is the URI path for the REST call. The default is: /. Optional.
  --zkconfigexhibitorport <value>
        Used if the ZooKeeper shared config is also running Exhibitor. This is the port that Exhibitor is listening on. IMPORTANT: if this value is not set it implies that Exhibitor is not being used on the ZooKeeper shared config. Optional.
  --zkconfigpollms <value>
        The period in ms to check for changes in the config ensemble. The default is: 10000. Optional.
  --zkconfigretry <value>
        The retry values to use in the form sleep-ms:retry-qty. The default is: 1000:3. Optional.
  --zkconfigzpath <value>
        The base ZPath that Exhibitor should use. E.g: /exhibitor/config. Optional.
  --filesystembackup <value>
        If true, enables file system backup of ZooKeeper log files. Optional.
  --s3backup <value>
        If true, enables AWS S3 backup of ZooKeeper log files (s3credentials may be provided as well). Optional.
  --aclid <value>
        Enable ACL for Exhibitor's internal ZooKeeper connection. This sets the ACL's ID. Optional.
  --aclperms <value>
        Enable ACL for Exhibitor's internal ZooKeeper connection. This sets the ACL's Permissions - a comma list of possible permissions. If this isn't specified the permission is set to ALL. Values: read, write, create, delete, admin. Optional.
  --aclscheme <value>
        Enable ACL for Exhibitor's internal ZooKeeper connection. This sets the ACL's Scheme. Optional.
  --log-index-directory <value>
        The directory where indexed Zookeeper logs should be kept. Optional.
  --zookeeper-install-directory <value>
        The directory where the Zookeeper server is installed. Optional.
  --zookeeper-data-directory <value>
        The directory where Zookeeper snapshot data is stored. Optional.
  --zookeeper-log-directory <value>
        The directory where Zookeeper transaction log data is stored. Optional.
  --backup-extra <value>
        Backup extra shared config. Optional.
  --zoo-cfg-extra <value>
        Any additional properties to be added to the zoo.cfg file in form: key1\\=value1&key2\\=value2. Optional.
  --java-environment <value>
        Script to write as the 'java.env' file which gets executed as a part of Zookeeper start script. Optional.
  --log4j-properties <value>
        Contents of the log4j.properties file. Optional.
  --client-port <value>
        The port that clients use to connect to Zookeeper. Defaults to 2181. Optional.
  --connect-port <value>
        The port that other Zookeeper instances use to connect to Zookeeper. Defaults to 2888. Optional.
  --election-port <value>
        The port that other Zookeeper instances use for election. Defaults to 3888. Optional.
  --check-ms <value>
        The number of milliseconds between live-ness checks on Zookeeper server. Defaults to 30000. Optional.
  --cleanup-period-ms <value>
        The number of milliseconds between Zookeeper log file cleanups. Defaults to 43200000. Optional.
  --cleanup-max-files <value>
        The max number of Zookeeper log files to keep when cleaning up. Defaults to 3. Optional.
  --backup-max-store-ms <value>
        Backup max store ms shared config. Optional.
  --backup-period-ms <value>
        Backup period ms shared config. Optional.
  --port <value>
        Port ranges to accept, when offer is issued. Optional.
Starting servers in the cluster
# ./exhibitor-mesos.sh help start
Usage: start <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
Stopping servers in the cluster
# ./exhibitor-mesos.sh help stop
Usage: stop <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.
Removing servers from the cluster
# ./exhibitor-mesos.sh help remove
Usage: remove <id> [options]

  -a <value> | --api <value>
        Binding host:port for http/artifact server. Optional if EM_API env is set.

Open issues here.

