Name: datastax-enterprise-mesos
Owner: Elodina
Description: DataStax Enterprise on Mesos
Created: 2015-10-23 12:40:14.0
Updated: 2016-10-21 02:03:37.0
Pushed: 2016-05-25 10:11:33.0
Homepage: http://www.elodina.net
Size: 870
Language: Scala
GitHub Committers
User | Most Recent Commit | # Commits |
---|
Other Committers
User | Most Recent Commit | # Commits |
---|
This framework aims to simplify running Datastax Enterprise on Mesos. Being actively developed right now.
Currently this framework supports running Cassandra nodes only.
Open issues here https://github.com/elodina/datastax-enterprise-mesos/issues
Minimum supported Mesos version is 0.23.0
Clone and build the project
# git clone https://github.com/elodina/datastax-enterprise-mesos
# cd datastax-enterprise-mesos
# ./gradlew jar
Get Java 7 JRE
# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jre-7u79-linux-x64.tar.gz
Get Datastax Enterprise distribution
# wget --user user --password pass http://downloads.datastax.com/enterprise/dse.tar.gz
dse-mesos.sh help scheduler
t scheduler
e: scheduler [options]
on (* = required) Description
----------------- -----------
bug <Boolean> Run in debug mode. (default: false)
amework-name Framework name. Defaults to dse.
amework-role Framework role. Defaults to *.
amework-timeout Framework failover timeout. Defaults
to 30 days.
e Path to JRE archive.
master Mesos Master addresses.
incipal Principal (username) used to register
framework.
cret Secret (password) used to register
framework.
storage Storage for cluster state. Examples:
file:dse-mesos.json; zk:master:
2181/dse-mesos.
er Mesos user. Defaults to current system
user.
ric Options
on Description
-- -----------
i Binding host:port for http/artifact
server. Optional if DM_API env is
set.
In order not to pass the API url to each CLI call lets export the URL as follows:
port DM_API=http://master:7000
First lets start 1 Cassandra node with the default settings. Further you will see how to change node settings.
dse-mesos.sh node add 0
added:
: 0
ate: idle
pology: cluster:default, dc:default, rack:default
sources: cpu:0.5, mem:512
ed: false
ickiness: period:30m
You now have a cluster with 1 Cassandra node that is not started.
dse-mesos.sh node list
:
: 0
ate: idle
pology: cluster:default, dc:default, rack:default
sources: cpu:0.5, mem:512
ed: false
ickiness: period:30m
You can configure node with resource requirement like cpu or memory, also you can specify overrides to cassandra.yaml file.
dse-mesos.sh node update 0 --cpu 0.8 --cassandra-yaml-configs max_hints_delivery_threads=2,hinted_handoff_enabled=false
added:
: 0
ate: idle
pology: cluster:default, dc:default, rack:default
sources: cpu:0.8, mem:512
ed: false
ickiness: period:30m
ssandra.yaml overrides: max_hints_delivery_threads=2,hinted_handoff_enabled=false
Key-value pairs specified after --cassandra-yaml-configs
option will override default cassandra.yaml configuration file that comes with
dse distribution. Note that this way you can override only key-value configs from casandra.yaml while there are also arrays,
objects. Also, supplied key-value pairs are “blindly” passed to the Cassandra process, i.e. configs are not validated.
Now lets start the task. This call to CLI will block until the task is actually started but will wait no more than a configured timeout. Timeout can be passed via –timeout flag and defaults to 2 minutes. If a timeout of 0s is passed CLI won't wait for tasks to start at all and will reply with “Scheduled tasks …” message.
dse-mesos.sh node start 0
started:
: 0
ate: running
pology: cluster:default, dc:default, rack:default
sources: cpu:0.5, mem:512
ed: true
ickiness: period:30m, hostname:slave0
ntime:
task id: node-0-1449579588537
executor id: node-0-1449579588537
slave id: faa6dc48-30d4-4385-8cd4-d0be512e0521-S1
hostname: slave0
seeds: slave0
By now you should have a single Cassandra node instance running. You should be able to connect to it via cqlsh <hostname>
(in our case cqlsh slave0
should work).
Here's how you stop it:
dse-mesos.sh node stop 0
stopped:
: 0
ate: idle
pology: cluster:default, dc:default, rack:default
sources: cpu:0.5, mem:512
ed: true
ickiness: period:30m, hostname:slave0, expires:2015-12-22 16:23:29+02
And remove:
dse-mesos.sh node remove 0
removed
Ensure your OpsCenter version matches DataStax Enterprise version.
In order to integrate OpsCenter with existing Cassandra cluster you need to have a correct address.yaml
configuration file.
This file is automatically created for you by the framework when you start the node.
DSE Agent (which is started with the Cassandra instance) will monitor the local node and by default put all OpsCenter data to the local Cassandra cluster.
The only property which needs to be specified is stomp_interface
- reachable IP address of the opscenterd machine.
You can set this property as part of address-yaml-configs
cli option:
dse-mesos.sh node update 0 --address-yaml-configs "stomp_interface=10.1.131.9"
To run in HA mode (that is tolerate scheduler failures) this framework supports two options for persisting its state:
zookeeper (scheduler --storage zk:...
)
external cassandra (scheduler --storage casandra:...
)
(Note: choosing storage type file
doesn't guarantee you fault tolerance because after failover new scheduler instance may
be started on a different machine where the initial state file will be not available thus new scheduler instance won't be able to
recover its state)
To use cassandra
storage type you need to do some preparations:
have a running C* instance reachable from machines where you plan to run schedulers
create a keyspace e.g. dse_mesos
with a replication factor per your needs
create a table e.g. dse_mesos_framework
in a newly created keyspace, the schema is defined in vagrant/cassandra_schema.cql
Last two items can be achieve by executing (cqlsh needs to be in PATH):
lsh host cql_port -f /vagrant/vagrant/cassandra_schema.cql
After that you can run scheduler with the following command:
./dse-mesos.sh scheduler --master zk://master:2181/mesos --debug true --storage cassandra:9042:192.168.3.5,192.168.3.6 \
-cassandra-keyspace dse_mesos --cassandra-table dse_mesos_framework
While the scheduler has a shutdown hook it doesn't actually finish the framework.
To shutdown the framework completely (e.g. unregister it in Mesos) you may shoot a
POST
to /teardown
specifying the framework id to shutdown:
rl -d frameworkId=20150807-094500-84125888-5050-14187-0005 -X POST http://master:5050/teardown
You already have running nodes.
dse-mesos.sh node update 0..1 --data-file-dirs /sstable/xvdv,/sstable/xvdw,/sstable/xvdx
s updated:
: 0
ate: running
dified: has pending update
pology: cluster:default, dc:default, rack:default
sources: cpu:0.7, mem:1024
ed: true
ta file dirs: /sstable/xvdv,/sstable/xvdw,/sstable/xvdx
mmit log dir: /var/lib/cassandra/commitlog
.
: 1
ate: running
dified: has pending update
pology: cluster:default, dc:default, rack:default
sources: cpu:0.7, mem:1024
ed: false
ta file dirs: /sstable/xvdv,/sstable/xvdw,/sstable/xvdx
mmit log dir: /var/lib/cassandra/commitlog
.
Whenever node has none idle
state (starting
, running
, stopping
, reconciling
) and you update it,
modified
flag will communicate that node has pending update
, to apply update node has to be stopped.
Nodes will be restarted one by one, scheduler stops node then starts it, same applied to rest of the nodes.
dse-mesos.sh node restart 0..1 --timeout 8min
ping node 0 ... done
ting node 0 ... done
ping node 1 ... done
ting node 1 ... done
s restarted:
: 0
ate: running
ta file dirs: /sstable/xvdv,/sstable/xvdw,/sstable/xvdx
mmit log dir: /var/lib/cassandra/commitlog
.
: 1
ate: running
ta file dirs: /sstable/xvdv,/sstable/xvdw,/sstable/xvdx
mmit log dir: /var/lib/cassandra/commitlog
.
Some times node could timeout on start or stop, in such case restart halts with notice:
e-mesos.sh node restart 0..1 --timeout 8min
ping node 0 ... done
ting node 0 ... done
ping node 1 ... done
ting node 1 ... Error: node 1 timeout on start
Given by --mem
option RAM will be consumed by executor, DSE process, DSE agent.
DSE process capped by Xmx, however also consumes offheap
memory. By default
To override default Xmx and/or Xmn calculation specify it via --cassandra-jvm-options
, for instance:
e-mesos.sh node update 0 --cassandra-jvm-options "-Xmx1024M"
in this case Xmn
will be calculated by default (according to given Xmx
in --cassandra-jvm-options
), or override only Xmn
e-mesos.sh node update 0 --cassandra-jvm-options "-Xmn100M"
or both
e-mesos.sh node update 0 --cassandra-jvm-options "-Xmx2048M -Xmn100M"
Xmx
, Xmn
correspond to env variables MAX_HEAP_SIZE
, HEAP_NEWSIZE
and will be set in cassandra-env.sh
When a node fails, DSE mesos scheduler assumes that the failure is recoverable. The scheduler will try to restart the node after waiting failover-delay (i.e. 30s, 2m). The initial waiting delay is equal to failover-delay setting. After each consecutive failure this delay is doubled until it reaches failover-max-delay value.
If failover-max-tries is defined and the consecutive failure count exceeds it, the node will be deactivated.
The following failover settings exists:
ilover-delay - initial failover delay to wait after failure (option value is required)
ilover-max-delay - max failover delay (option value is required)
ilover-max-tries - max failover tries to deactivate broker (to reset to unbound pass --failover-max-tries "")
Scheduler of version 0.2.1.3 is able to migrate state between versions for all types of storages (file, zk, cassandra) from version 0.2.1.2. For example you are running 0.2.1.2 version and want to update to 0.2.1.3 then you have to stop scheduler (for C storage stop all running schedulers that share state table) and then start new scheduler (for C start schedulers sequentially, first one will migrate state table, next will just start without migrating anything).
Just pass option --solr-enabled true
when creating
e-mesos.sh node add 0 --solr-enabled true
added:
: 0
ate: idle
pology: cluster:default, dc:default, rack:default
sources: cpu:0.5, mem:512
ed: false
lr: true
rs: data:<auto>, commit:<auto>, caches:<auto>
ilover: delay:3m, max-delay:30m
ickiness: period:30m
or updating node configuration via CLI (NOTE: solr: true
means node with SOLR),
e-mesos.sh node update 0 --solr-enabled true
in order to disable launching SOLR on node pass --solr-enabled false
(don't forget to restart updated node).
SOLR requires two ports HTTP
and Shard
, configure them via ./dse-mesos cluster <add | update>
with options
--solr-http-port
and --solr-shard-port
.
dse-mesos.sh help
e: <cmd> ...
ands:
lp [cmd [cmd]] - print general or command-specific help
heduler - start scheduler
de - node management commands
uster - cluster management commands
`help <cmd>` to see details of specific command
You may also run ./dse-mesos.sh help <cmd> [<cmd>]
to view help of specific command/sub-command.
dse-mesos.sh help node restart
art node
e: node restart <id> [options]
on Description
-- -----------
meout Time to wait until node restart.
Should be a parsable Scala Duration
value. Defaults to 4m.
ric Options
on Description
-- -----------
i Binding host:port for http/artifact
server. Optional if DM_API env is
set.