Name: swarmkit
Owner: Docker
Description: A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Created: 2016-02-12 00:02:15.0
Updated: 2018-05-24 01:09:46.0
Pushed: 2018-05-24 10:09:18.0
Size: 19720
Language: Go
SwarmKit is a toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Machines running SwarmKit can be grouped together in order to form a Swarm, coordinating tasks with each other. Once a machine joins, it becomes a Swarm Node. Nodes can either be worker nodes or manager nodes.
An operator can dynamically update a Node's role by promoting a Worker to Manager or demoting a Manager to Worker.
Tasks are organized in Services. A service is a higher level abstraction that allows the user to declare the desired state of a group of tasks. Services define what type of task should be created as well as how to execute them (e.g. run this many replicas at all times) and how to update them (e.g. rolling updates).
Some of SwarmKit's main features are:
Orchestration
Desired State Reconciliation: SwarmKit constantly compares the desired state against the current cluster state and reconciles the two if necessary. For instance, if a node fails, SwarmKit reschedules its tasks onto a different node.
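That reconciliation step can be sketched as a diff between the desired replica count and the tasks actually observed. The `Task` type and `reconcile` function below are hypothetical stand-ins for illustration, not SwarmKit's actual API:

```go
package main

import "fmt"

// Task is a minimal stand-in for a scheduled unit of work (hypothetical,
// not SwarmKit's actual Task type).
type Task struct {
	Service string
	Node    string
}

// reconcile compares the desired replica count for a service with the tasks
// currently observed and returns how many tasks must be created (positive)
// or shut down (negative) to converge on the desired state.
func reconcile(desired int, running []Task) int {
	return desired - len(running)
}

func main() {
	running := []Task{{Service: "redis", Node: "node-1"}}
	fmt.Println(reconcile(3, running)) // 2: two more tasks are needed
}
```

If a node fails, its tasks drop out of the observed set, the diff becomes positive, and the orchestrator creates replacements on healthy nodes.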
Service Types: There are different types of services; the project currently ships with two of them out of the box.
Configurable Updates: At any time, you can change the value of one or more fields for a service. After you make the update, SwarmKit reconciles the desired state by ensuring all tasks use the desired settings. By default, it performs a lockstep update, that is, it updates all tasks at the same time. This can be tuned through knobs such as update parallelism and delay.
Restart Policies: The orchestration layer monitors tasks and reacts to failures based on the specified policy. The operator can define restart conditions, delays and limits (maximum number of attempts in a given time window). SwarmKit can decide to restart a task on a different machine. This means that faulty nodes will gradually be drained of their tasks.
Scheduling
Resource Awareness: SwarmKit is aware of resources available on nodes and will place tasks accordingly.
Constraints: Operators can limit the set of nodes where a task can be scheduled by defining constraint expressions. Multiple constraints find nodes that satisfy every expression, i.e., an AND match. Constraints can match the node attributes listed in the following table. Note that `engine.labels` are collected from Docker Engine with information like operating system, drivers, etc., while `node.labels` are added by cluster administrators for operational purposes. For example, some nodes carry security-compliance labels so that tasks with compliance requirements run only there.
| node attribute | matches | example |
|:-------------- |:------- |:------- |
| node.id | node's ID | `node.id == 2ivku8v2gvtg4` |
| node.hostname | node's hostname | `node.hostname != node-2` |
| node.ip | node's IP address | `node.ip != 172.19.17.0/24` |
| node.role | node's manager or worker role | `node.role == manager` |
| node.platform.os | node's operating system | `node.platform.os == linux` |
| node.platform.arch | node's architecture | `node.platform.arch == x86_64` |
| node.labels | node's labels added by cluster admins | `node.labels.security == high` |
| engine.labels | Docker Engine's labels | `engine.labels.operatingsystem == ubuntu 14.04` |
Strategies: The project currently ships with a spread strategy which will attempt to schedule tasks on the least loaded nodes, provided they meet the constraints and resource requirements.
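The two steps above, filtering nodes by constraints (AND semantics) and then spreading onto the least loaded survivor, can be sketched together. This is a simplified equality-only matcher (real expressions also support operators like `!=`) with invented types, not SwarmKit's scheduler:

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a cluster node: its attribute map and
// current task count. Not SwarmKit's actual types.
type Node struct {
	Name  string
	Attrs map[string]string
	Tasks int
}

// matches applies AND semantics: every constraint must hold on the node.
func matches(n Node, constraints map[string]string) bool {
	for attr, want := range constraints {
		if n.Attrs[attr] != want {
			return false
		}
	}
	return true
}

// spread picks the least loaded node among those satisfying the constraints,
// mirroring the spread strategy described above.
func spread(nodes []Node, constraints map[string]string) (string, bool) {
	best, bestTasks, found := "", 0, false
	for _, n := range nodes {
		if !matches(n, constraints) {
			continue
		}
		if !found || n.Tasks < bestTasks {
			best, bestTasks, found = n.Name, n.Tasks, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{
		{Name: "node-1", Attrs: map[string]string{"node.role": "manager"}, Tasks: 4},
		{Name: "node-2", Attrs: map[string]string{"node.role": "worker", "node.labels.security": "high"}, Tasks: 1},
		{Name: "node-3", Attrs: map[string]string{"node.role": "worker", "node.labels.security": "high"}, Tasks: 3},
	}
	name, ok := spread(nodes, map[string]string{"node.labels.security": "high"})
	fmt.Println(name, ok) // node-2 true: least loaded of the two eligible nodes
}
```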
Cluster Management
Security
Requirements: a working Go environment and the code-generation tooling invoked by `make generate` (see BUILDING.md for details).

SwarmKit is built in Go and leverages a standard project structure to work well with Go tooling. If you are new to Go, please see BUILDING.md for a more detailed guide.
Once you have SwarmKit checked out in your `$GOPATH`, the `Makefile` can be used for common tasks.

From the project root directory, run the following to build `swarmd` and `swarmctl`:

```sh
make binaries
```
Before running tests for the first time, set up the tooling:

```sh
make setup
```

Then run:

```sh
make all
```
These instructions assume that `swarmd` and `swarmctl` are in your `PATH`.

(Before starting, make sure the `/tmp/node-N` directories don't already exist.)

Initialize the first node:

```sh
swarmd -d /tmp/node-1 --listen-control-api /tmp/node-1/swarm.sock --hostname node-1
```
Before joining the cluster, fetch the join token:

```sh
export SWARM_SOCKET=/tmp/node-1/swarm.sock
swarmctl cluster inspect default
```
```
ID          : 87d2ecpg12dfonxp3g562fru1
Name        : default
Orchestration settings:
  Task history entries: 5
Dispatcher settings:
  Dispatcher heartbeat period: 5s
Certificate Authority settings:
  Certificate Validity Duration: 2160h0m0s
  Join Tokens:
    Worker: SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-0117z3s2ytr6egmmnlr6gd37n
    Manager: SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-d1ohk84br3ph0njyexw0wdagx
```
In two additional terminals, join two nodes. From the example below, replace `127.0.0.1:4242` with the address of the first node, and use the `<Worker Token>` acquired above. In this example, the `<Worker Token>` is `SWMTKN-1-3vi7ajem0jed8guusgvyl98nfg18ibg4pclify6wzac6ucrhg3-0117z3s2ytr6egmmnlr6gd37n`. If the joining nodes run on the same host as `node-1`, select a different remote listening port, e.g., `--listen-remote-api 127.0.0.1:4343`.

```sh
swarmd -d /tmp/node-2 --hostname node-2 --join-addr 127.0.0.1:4242 --join-token <Worker Token>
swarmd -d /tmp/node-3 --hostname node-3 --join-addr 127.0.0.1:4242 --join-token <Worker Token>
```
If joining as a manager, also specify `--listen-control-api`:

```sh
swarmd -d /tmp/node-4 --hostname node-4 --join-addr 127.0.0.1:4242 --join-token <Manager Token> --listen-control-api /tmp/node-4/swarm.sock --listen-remote-api 127.0.0.1:4245
```
In a fourth terminal, use `swarmctl` to explore and control the cluster. Before running `swarmctl`, set the `SWARM_SOCKET` environment variable to the path of the manager socket that was specified in `--listen-control-api` when starting the manager.
To list nodes:

```sh
export SWARM_SOCKET=/tmp/node-1/swarm.sock
swarmctl node ls
```
```
ID                     Name    Membership  Status  Availability  Manager Status
--                     ----    ----------  ------  ------------  --------------
fpoi36eujbdkgdnbvbi6r  node-2  ACCEPTED    READY   ACTIVE
3tyipofoa2iwqgabsdcve  node-1  ACCEPTED    READY   ACTIVE        REACHABLE *
k1uqxhnyyujq66ho0h54t  node-3  ACCEPTED    READY   ACTIVE
wfawdasdewfq66ho34eaw  node-4  ACCEPTED    READY   ACTIVE        REACHABLE
```
Start a redis service:

```sh
swarmctl service create --name redis --image redis:3.0.5
```
```
08ecg7vc7cbf9k57qs722n2le
```
List the running services:

```sh
swarmctl service ls
```
```
ID                         Name   Image        Replicas
--                         ----   -----        --------
08ecg7vc7cbf9k57qs722n2le  redis  redis:3.0.5  1/1
```
Inspect the service:

```sh
swarmctl service inspect redis
```
```
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 1/1
Template
 Container
  Image           : redis:3.0.5

ID                     Service  Slot  Image        Desired State  Last State             Node
--                     -------  ----  -----        -------------  ----------             ----
ir8wr85lbs8sqg0ug03vr  redis    1     redis:3.0.5  RUNNING        RUNNING 1 minute ago   node-1
```
You can update any attribute of a service.

For example, you can scale the service by changing the instance count:

```sh
swarmctl service update redis --replicas 6
```
```
08ecg7vc7cbf9k57qs722n2le
```
```sh
swarmctl service inspect redis
```
```
ID                : 08ecg7vc7cbf9k57qs722n2le
Name              : redis
Replicas          : 6/6
Template
 Container
  Image           : redis:3.0.5

ID                     Service  Slot  Image        Desired State  Last State               Node
--                     -------  ----  -----        -------------  ----------               ----
ir8wr85lbs8sqg0ug03vr  redis    1     redis:3.0.5  RUNNING        RUNNING 3 minutes ago    node-1
8y9fevrnh77til1d09vqq  redis    2     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-3
c8z93c884anjgpkiatnx6  redis    3     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-2
3wnf9dex3mk6jfqp4tdjw  redis    4     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-2
fnooz63met6yfrsk6myvg  redis    5     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-1
awtoyk19wqhmtuiq7z9pt  redis    6     redis:3.0.5  RUNNING        RUNNING 28 seconds ago   node-3
```
Changing replicas from 1 to 6 forced SwarmKit to create 5 additional Tasks in order to comply with the desired state.
Every other field can be changed as well, such as image, args, env, …
Let's change the image from redis:3.0.5 to redis:3.0.6 (i.e., an upgrade):
```sh
swarmctl service update redis --image redis:3.0.6
```
```
08ecg7vc7cbf9k57qs722n2le
```
```sh
swarmctl service inspect redis
```
```
ID                   : 08ecg7vc7cbf9k57qs722n2le
Name                 : redis
Replicas             : 6/6
Update Status
 State               : COMPLETED
 Started             : 3 minutes ago
 Completed           : 1 minute ago
 Message             : update completed
Template
 Container
  Image              : redis:3.0.6

ID                     Service  Slot  Image        Desired State  Last State             Node
--                     -------  ----  -----        -------------  ----------             ----
jss61lmwz52pke5hd107g  redis    1     redis:3.0.6  RUNNING        RUNNING 1 minute ago   node-3
94v840thk10tamfqlwztb  redis    2     redis:3.0.6  RUNNING        RUNNING 1 minute ago   node-1
j66xqpoj3cn3zjkdrwff7  redis    3     redis:3.0.6  RUNNING        RUNNING 1 minute ago   node-3
ipzvxucs3776e4z8gemey  redis    4     redis:3.0.6  RUNNING        RUNNING 1 minute ago   node-2
2lbqzk9fh4kstwpulygvu  redis    5     redis:3.0.6  RUNNING        RUNNING 1 minute ago   node-2
oy82deq7hu3q9cnucfin6  redis    6     redis:3.0.6  RUNNING        RUNNING 1 minute ago   node-1
```
By default, all tasks are updated at the same time.
This behavior can be changed by defining update options.
For instance, in order to update tasks 2 at a time and wait at least 10 seconds between updates:
```sh
swarmctl service update redis --image redis:3.0.7 --update-parallelism 2 --update-delay 10s
watch -n1 "swarmctl service inspect redis"  # watch the update
```
This will update 2 tasks, wait for them to become RUNNING, then wait an additional 10 seconds before moving to other tasks.
Update options can be set at service creation and updated later on. If an update command doesn't specify update options, the last set of options will be used.
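The parallelism half of this behavior amounts to updating tasks in fixed-size batches. A minimal sketch of that batching, with the delay and readiness checks between batches omitted and all names invented for illustration:

```go
package main

import "fmt"

// batches splits task IDs into update batches of at most `parallelism`
// elements each, mirroring the --update-parallelism behavior described
// above (hypothetical helper, not SwarmKit's updater).
func batches(tasks []string, parallelism int) [][]string {
	var out [][]string
	for len(tasks) > 0 {
		n := parallelism
		if n > len(tasks) {
			n = len(tasks)
		}
		out = append(out, tasks[:n])
		tasks = tasks[n:]
	}
	return out
}

func main() {
	ids := []string{"t1", "t2", "t3", "t4", "t5", "t6"}
	// Six tasks with parallelism 2 are updated in three batches; the real
	// updater waits for each batch to reach RUNNING, then sleeps for the
	// configured delay before starting the next one.
	fmt.Println(len(batches(ids, 2))) // 3
}
```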
SwarmKit monitors node health. In the case of node failures, it re-schedules tasks to other nodes.
An operator can manually define the Availability of a node and can Pause and Drain nodes.
Let's put `node-1` into maintenance mode:

```sh
swarmctl node drain node-1
```

```sh
swarmctl node ls
```
```
ID                     Name    Membership  Status  Availability  Manager Status
--                     ----    ----------  ------  ------------  --------------
fpoi36eujbdkgdnbvbi6r  node-2  ACCEPTED    READY   ACTIVE
3tyipofoa2iwqgabsdcve  node-1  ACCEPTED    READY   DRAIN         REACHABLE *
k1uqxhnyyujq66ho0h54t  node-3  ACCEPTED    READY   ACTIVE
```
```sh
swarmctl service inspect redis
```
```
ID                   : 08ecg7vc7cbf9k57qs722n2le
Name                 : redis
Replicas             : 6/6
Update Status
 State               : COMPLETED
 Started             : 2 minutes ago
 Completed           : 1 minute ago
 Message             : update completed
Template
 Container
  Image              : redis:3.0.7

ID                     Service  Slot  Image        Desired State  Last State               Node
--                     -------  ----  -----        -------------  ----------               ----
fy8dqbwmlvw5iya802tj0  redis    1     redis:3.0.7  RUNNING        RUNNING 23 seconds ago   node-2
gvidypcr7q1k3lfgohb42  redis    2     redis:3.0.7  RUNNING        RUNNING 2 minutes ago    node-3
l0chk3gtwm1100t5yeged  redis    3     redis:3.0.7  RUNNING        RUNNING 23 seconds ago   node-3
fxbg0igypstwliyameobs  redis    4     redis:3.0.7  RUNNING        RUNNING 2 minutes ago    node-3
dxnjz3c8iujdewzaplgr6  redis    5     redis:3.0.7  RUNNING        RUNNING 23 seconds ago   node-2
ciqhs4239quraw7evttyf  redis    6     redis:3.0.7  RUNNING        RUNNING 2 minutes ago    node-2
```
As you can see, every task that was running on `node-1` was rebalanced to either `node-2` or `node-3` by the reconciliation loop.