rancher/etcd-operator

Name: etcd-operator

Owner: Rancher

Description: etcd operator creates/configures/manages etcd clusters atop Kubernetes

Forked from: coreos/etcd-operator

Created: 2017-05-19 20:57:58.0

Updated: 2017-05-19 20:58:01.0

Pushed: 2017-06-09 18:35:14.0

Homepage: https://coreos.com/blog/introducing-the-etcd-operator.html

Size: 1889

Language: Go

README

etcd operator

Project status: beta

Major planned features have been completed. No breaking API changes are currently planned, but we reserve the right to fix bugs and change the API in backwards-incompatible ways before the project is declared stable. See the upgrade guide for a safe upgrade process.

Currently, user-facing etcd cluster objects are created as Kubernetes Third Party Resources (TPRs). We plan to take advantage of User Aggregated API Servers to improve reliability, validation, and versioning. The move to an Aggregated API should be minimally disruptive to existing users, but it may change which Kubernetes objects are created or how users deploy the etcd operator.

We expect to consider the etcd operator stable soon; backwards-incompatible changes will not be made once the project reaches stability.

Overview

The etcd operator manages etcd clusters deployed to Kubernetes and automates tasks related to operating an etcd cluster.

There are more spec examples on setting up clusters with backup, restore, and other configurations.

Read Best Practices for more information on how to make better use of the etcd operator.

Read the RBAC docs for how to set up RBAC rules for the etcd operator if RBAC is in place.

Read the Developer Guide for setting up a development environment if you want to contribute.

Requirements

Demo

etcd Operator demo

Deploy etcd operator

See the instructions on how to install and uninstall the etcd operator.

Create and destroy an etcd cluster

$ kubectl create -f example/example-etcd-cluster.yaml

A 3-member etcd cluster will be created.

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0000       1/1       Running   0          1m
example-etcd-cluster-0001       1/1       Running   0          1m
example-etcd-cluster-0002       1/1       Running   0          1m

See client service for how to access etcd clusters created by operator.

If you are working with minikube locally, create a nodePort service and check that etcd is responding:

$ kubectl create -f example/example-etcd-cluster-nodeport-service.json
$ export ETCDCTL_API=3
$ export ETCDCTL_ENDPOINTS=$(minikube service example-etcd-cluster-client --url)
$ etcdctl put foo bar

Destroy etcd cluster:

$ kubectl delete -f example/example-etcd-cluster.yaml

Resize an etcd cluster

Note: In order to use kubectl apply you will need Kubernetes 1.6 or higher. Otherwise you will have to post the changes to the API server directly using curl.

Create an etcd cluster:

$ kubectl apply -f example/example-etcd-cluster.yaml

In example/example-etcd-cluster.yaml the initial cluster size is 3. Modify the file and change size from 3 to 5.

$ cat example/example-etcd-cluster.yaml
apiVersion: "etcd.coreos.com/v1beta1"
kind: "Cluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 5
  version: "3.1.8"
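
For Kubernetes versions older than 1.6, the note above says the change has to be posted to the API server directly. The following is a minimal sketch of building the equivalent JSON body in Go. The struct and function names (`cluster`, `clusterJSON`) are hypothetical; only the field names and values mirror the YAML above. The API path and the HTTP call are left out because this README does not state them, and a real update may need additional fields such as `metadata.resourceVersion`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types are illustrative only; they mirror the fields shown in
// example/example-etcd-cluster.yaml and are not the operator's own types.
type clusterSpec struct {
	Size    int    `json:"size"`
	Version string `json:"version"`
}

type metadata struct {
	Name string `json:"name"`
}

type cluster struct {
	APIVersion string      `json:"apiVersion"`
	Kind       string      `json:"kind"`
	Metadata   metadata    `json:"metadata"`
	Spec       clusterSpec `json:"spec"`
}

// clusterJSON renders the cluster object as a compact JSON document.
func clusterJSON(name string, size int, version string) (string, error) {
	c := cluster{
		APIVersion: "etcd.coreos.com/v1beta1",
		Kind:       "Cluster",
		Metadata:   metadata{Name: name},
		Spec:       clusterSpec{Size: size, Version: version},
	}
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := clusterJSON("example-etcd-cluster", 5, "3.1.8")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The printed document could then be saved to a file and sent with curl.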

Apply the size change to the cluster TPR:

$ kubectl apply -f example/example-etcd-cluster.yaml

The etcd cluster will scale to 5 members (5 pods):

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0000       1/1       Running   0          1m
example-etcd-cluster-0001       1/1       Running   0          1m
example-etcd-cluster-0002       1/1       Running   0          1m
example-etcd-cluster-0003       1/1       Running   0          1m
example-etcd-cluster-0004       1/1       Running   0          1m
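
One way to picture the resize is as a reconcile loop that compares the desired size from the spec with the current member count and moves toward it. The sketch below assumes one member is added or removed per pass; that step size and the `reconcileStep` helper are assumptions made for illustration, not something this README confirms about the operator's implementation.

```go
package main

import "fmt"

// reconcileStep returns the member count after one hypothetical reconcile
// pass. It moves at most one member toward the desired size.
func reconcileStep(desired, current int) int {
	switch {
	case current < desired:
		return current + 1 // add one member
	case current > desired:
		return current - 1 // remove one member
	default:
		return current // already at the desired size
	}
}

func main() {
	current := 3
	desired := 5 // the new value of the size field in the spec

	for current != desired {
		next := reconcileStep(desired, current)
		fmt.Printf("members: %d -> %d\n", current, next)
		current = next
	}
}
```

The same function covers scaling down: calling it with a desired size of 3 and a current size of 5 steps through 4 and then 3.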

Similarly we can decrease the size of cluster from 5 back to 3 by changing the size field again and reapplying the change.

$ cat example/example-etcd-cluster.yaml
apiVersion: "etcd.coreos.com/v1beta1"
kind: "Cluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.1.8"

$ kubectl apply -f example/example-etcd-cluster.yaml

The etcd cluster will eventually shrink to 3 pods:

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0002       1/1       Running   0          1m
example-etcd-cluster-0003       1/1       Running   0          1m
example-etcd-cluster-0004       1/1       Running   0          1m

Member recovery

If a minority of etcd members crash, the etcd operator automatically recovers from the failure. The following steps walk through an example.

Create an etcd cluster:

$ kubectl create -f example/example-etcd-cluster.yaml

Wait until all three members are up. Simulate a member failure by deleting a pod:

$ kubectl delete pod example-etcd-cluster-0000 --now

The etcd operator will recover from the failure by creating a new pod example-etcd-cluster-0003:

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0001       1/1       Running   0          1m
example-etcd-cluster-0002       1/1       Running   0          1m
example-etcd-cluster-0003       1/1       Running   0          1m
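
Note that the replacement pod is named 0003 rather than reusing 0000. A minimal sketch of that naming pattern, assuming a per-cluster counter that only ever increases; the `memberName` helper and the counter are hypothetical and inferred only from the pod listings in this README.

```go
package main

import "fmt"

// memberName formats a pod name as <cluster>-<four-digit counter>,
// matching the names seen in the kubectl output above.
func memberName(cluster string, counter int) string {
	return fmt.Sprintf("%s-%04d", cluster, counter)
}

func main() {
	cluster := "example-etcd-cluster"

	// Members 0000 through 0002 were created with the cluster, so the
	// next counter value (assumed never to be reused) is 3.
	next := 3
	fmt.Println(memberName(cluster, next))
}
```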

Destroy etcd cluster:

$ kubectl delete -f example/example-etcd-cluster.yaml

etcd operator recovery

If the etcd operator restarts, it can recover its previous state. The following steps walk through an example.

$ kubectl create -f example/example-etcd-cluster.yaml

Wait until all three members are up. Then stop the etcd operator and delete one etcd pod:

$ kubectl delete -f example/deployment.yaml
deployment "etcd-operator" deleted

$ kubectl delete pod example-etcd-cluster-0000 --now
pod "example-etcd-cluster-0000" deleted

Then restart the etcd operator. It should recover itself and the etcd clusters it manages.

$ kubectl create -f example/deployment.yaml
deployment "etcd-operator" created

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0001       1/1       Running   0          1m
example-etcd-cluster-0002       1/1       Running   0          1m
example-etcd-cluster-0003       1/1       Running   0          1m

Disaster recovery

If the majority of etcd members crash, but at least one backup exists for the cluster, the etcd operator can restore the entire cluster from the backup.
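
The difference between member recovery and disaster recovery comes down to whether quorum survives. The sketch below uses the standard majority rule (quorum is more than half of the members) to pick between the two cases. The `quorum` and `recoveryMode` helpers are hypothetical and only illustrate the reasoning; they are not the operator's code.

```go
package main

import "fmt"

// quorum returns the number of members needed for a majority.
func quorum(size int) int {
	return size/2 + 1
}

// recoveryMode classifies a cluster state using the rules described in
// this README: a minority failure is repaired member by member, while a
// majority failure needs a backup to restore from.
func recoveryMode(size, alive int, hasBackup bool) string {
	switch {
	case alive >= size:
		return "healthy"
	case alive >= quorum(size):
		return "member recovery"
	case hasBackup:
		return "disaster recovery from backup"
	default:
		return "quorum lost and no backup available"
	}
}

func main() {
	// A 3-member cluster that loses one pod keeps quorum.
	fmt.Println(recoveryMode(3, 2, false))
	// Losing two of three pods loses quorum, so a backup is required.
	fmt.Println(recoveryMode(3, 1, true))
}
```

For a 3-member cluster the quorum is 2, so one failure is tolerated; for 5 members the quorum is 3, so two failures are tolerated.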

By default, the etcd operator creates a storage class on initialization:

$ kubectl get storageclass
NAME                 TYPE
etcd-backup-gce-pd   kubernetes.io/gce-pd

This is used to request the persistent volume to store the backup data. See other backup options.

To enable backup, create an etcd cluster with a backup-enabled spec:

$ kubectl create -f example/example-etcd-cluster-with-backup.yaml

A persistent volume claim is created for the backup pod:

$ kubectl get pvc
NAME                                   STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
example-etcd-cluster-with-backup-pvc   Bound     pvc-79e39bab-b973-11e6-8ae4-42010af00002   1Gi        RWO           9s

Let's try to write some data into etcd:

$ kubectl run --rm -i --tty fun --image quay.io/coreos/etcd --restart=Never -- /bin/sh
/ # ETCDCTL_API=3 etcdctl --endpoints http://example-etcd-cluster-with-backup-client:2379 put foo bar

(ctrl-D to exit)

Now let's kill two pods to simulate a disaster failure:

$ kubectl delete pod example-etcd-cluster-with-backup-0000 example-etcd-cluster-with-backup-0001 --now
pod "example-etcd-cluster-with-backup-0000" deleted
pod "example-etcd-cluster-with-backup-0001" deleted

Now quorum is lost. The etcd operator will start to recover the cluster from the backup. The first listing below shows a new member initializing, and the second shows the cluster back at three members:

$ kubectl get pods
NAME                                                    READY     STATUS     RESTARTS   AGE
example-etcd-cluster-with-backup-0003                   0/1       Init:0/2   0          11s
example-etcd-cluster-with-backup-backup-sidecar-e9gkv   1/1       Running    0          18m

$ kubectl get pods
NAME                                                    READY     STATUS    RESTARTS   AGE
example-etcd-cluster-with-backup-0003                   1/1       Running   0          3m
example-etcd-cluster-with-backup-0004                   1/1       Running   0          3m
example-etcd-cluster-with-backup-0005                   1/1       Running   0          3m
example-etcd-cluster-with-backup-backup-sidecar-e9gkv   1/1       Running   0          22m

Finally, besides destroying the cluster, also clean up the backup if you no longer need it:

$ kubectl delete pvc example-etcd-cluster-with-backup-pvc

Note: There is a possible race here. If one pod is recovered before the other is deleted, the operator falls back to single-member recovery instead of disaster recovery.

Upgrade an etcd cluster

Have the following yaml file ready:

$ cat 3.0-etcd-cluster.yaml
apiVersion: "etcd.coreos.com/v1beta1"
kind: "Cluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.0.16"

Create an etcd cluster with the version specified (3.0.16) in the yaml file:

$ kubectl apply -f 3.0-etcd-cluster.yaml
$ kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0000      1/1       Running   0          37s
example-etcd-cluster-0001      1/1       Running   0          25s
example-etcd-cluster-0002      1/1       Running   0          14s

The container image version should be 3.0.16:

$ kubectl get pod example-etcd-cluster-0000 -o yaml | grep "image:" | uniq
    image: quay.io/coreos/etcd:v3.0.16

Now modify the file 3.0-etcd-cluster.yaml and change the version from 3.0.16 to 3.1.8:

$ cat 3.0-etcd-cluster.yaml
apiVersion: "etcd.coreos.com/v1beta1"
kind: "Cluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.1.8"

Apply the version change to the cluster TPR:

$ kubectl apply -f 3.0-etcd-cluster.yaml

Wait ~30 seconds. The container image version should be updated to v3.1.8:

$ kubectl get pod example-etcd-cluster-0000 -o yaml | grep "image:" | uniq
    image: quay.io/coreos/etcd:v3.1.8

Check the other two pods and you should see the same result.
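
The check across all pods can be expressed as a small comparison between each pod's image and the image implied by the version in the spec. The sketch below assumes the image is always the `quay.io/coreos/etcd` repository with a `v` prefix on the version, as seen in the output above. The `pod` type and the `imageFor` and `podsToUpgrade` helpers are hypothetical, and the pod data is hard-coded for illustration rather than read from a cluster.

```go
package main

import "fmt"

// imageRepo is taken from the kubectl output shown above.
const imageRepo = "quay.io/coreos/etcd"

// pod is an illustrative stand-in for the fields we care about.
type pod struct {
	Name  string
	Image string
}

// imageFor maps a spec version such as "3.1.8" to a container image.
func imageFor(version string) string {
	return imageRepo + ":v" + version
}

// podsToUpgrade returns the names of pods not yet on the target version.
func podsToUpgrade(pods []pod, targetVersion string) []string {
	want := imageFor(targetVersion)
	var stale []string
	for _, p := range pods {
		if p.Image != want {
			stale = append(stale, p.Name)
		}
	}
	return stale
}

func main() {
	// Hypothetical snapshot taken partway through an upgrade.
	pods := []pod{
		{Name: "example-etcd-cluster-0000", Image: imageFor("3.1.8")},
		{Name: "example-etcd-cluster-0001", Image: imageFor("3.0.16")},
		{Name: "example-etcd-cluster-0002", Image: imageFor("3.0.16")},
	}
	fmt.Println("still on old image:", podsToUpgrade(pods, "3.1.8"))
}
```

An empty result means every pod reports the target image, which is the state the steps above expect after the upgrade finishes.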

Limitations

The operator collects anonymous usage statistics to help us learn how the software is being used and how we can improve it. To disable collection, run the operator with the flag -analytics=false.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.