Name: Scalable-Cassandra-deployment-on-Kubernetes
Owner: International Business Machines
Description: This project provides a full roadmap for deploying a multi-node, scalable Cassandra cluster on Kubernetes. Cassandra knows that it is running within a cluster manager, and uses that cluster-management infrastructure to help implement the application. Kubernetes concepts such as ReplicationController and StatefulSets are leveraged to deploy either non-persistent or persistent Cassandra clusters on a Kubernetes cluster.
Created: 2017-03-13 23:32:23.0
Updated: 2018-05-22 18:52:28.0
Pushed: 2018-05-14 19:48:13.0
Homepage: https://developer.ibm.com/code/patterns/deploy-a-scalable-apache-cassandra-database-on-kubernetes
Size: 2035
Language: Shell
This project demonstrates the deployment of a multi-node scalable Cassandra cluster on Kubernetes. Apache Cassandra is a massively scalable open source NoSQL database. Cassandra is perfect for managing large amounts of structured, semi-structured, and unstructured data across multiple datacenters and the cloud.
Leveraging Kubernetes concepts such as PersistentVolume and StatefulSets, we can provide a resilient installation of Cassandra and be confident that its data (state) is safe.
We also use a "headless" service for Cassandra, which lets applications reach the database through KubeDNS without exposing it to the outside world. To access it from your developer workstation, you can run kubectl exec commands against any of the Cassandra pods. If you wish to connect an application to it, use the KubeDNS name cassandra.default.svc.cluster.local when configuring your application.
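To sanity-check that the in-cluster name resolves once the service and pods are up, you can look it up from a short-lived pod; a minimal sketch (the busybox image and the dns-test pod name are illustrative choices, not part of this project):
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup cassandra.default.svc.cluster.local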
In order to follow this guide you'll need a Kubernetes cluster. If you do not have access to an existing Kubernetes cluster, follow the instructions (in the link) for one of the following:
The code here is regularly tested against a Kubernetes cluster from the Bluemix Container Service using Travis CI.
After installing (or setting up your access to) Kubernetes, ensure that you can access it by running the following and confirming you get version responses for both the Client and the Server:
kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-09-18T20:30:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
To allow simple discovery of the Cassandra seed node (which we will deploy shortly), we create a "headless" service by specifying None for the clusterIP in cassandra-service.yaml. This headless service lets Pods use KubeDNS to discover the IP address of the Cassandra seed.
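For orientation, a headless service definition of this kind typically looks like the sketch below; the field values are assumptions based on the port shown later, not necessarily the exact contents of cassandra-service.yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None        # headless: no virtual IP; DNS returns the pod IPs directly
  ports:
    - port: 9042         # CQL native transport port
  selector:
    app: cassandra       # matches the pods created by the StatefulSet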
You can create the headless service using the cassandra-service.yaml file:
kubectl create -f cassandra-service.yaml
service "cassandra" created
kubectl get svc cassandra
NAME        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   None         <none>        9042/TCP   10s
Most applications deployed to Kubernetes should be cloud native and rely on external resources for their data (or state). However, since Cassandra is a database, we can use StatefulSets and Persistent Volumes to ensure resiliency in our database.
To create persistent Cassandra nodes, we need to provision Persistent Volumes. There are two ways to provision PVs: dynamically and statically.
For the sake of simplicity and compatibility, we will use static provisioning and create the volumes manually using the provided yaml files.
Note: You'll need the same number of Persistent Volumes as Cassandra nodes. If you expect to have 3 Cassandra nodes, you'll need to create 3 Persistent Volumes.
The provided local-volumes.yaml file already has 3 Persistent Volumes defined. Update the file to add more if you expect to have more than 3 Cassandra nodes. Create the volumes:
kubectl create -f local-volumes.yaml
kubectl get pv
NAME               CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
cassandra-data-1   1Gi        RWO           Recycle         Available                                   7s
cassandra-data-2   1Gi        RWO           Recycle         Available                                   7s
cassandra-data-3   1Gi        RWO           Recycle         Available                                   7s
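Each entry in local-volumes.yaml follows the standard PersistentVolume shape. A minimal sketch of one entry, assuming hostPath storage and the capacity and reclaim policy shown above (the path is illustrative, not necessarily the repo's actual value):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cassandra-data-1
  labels:
    app: cassandra
spec:
  capacity:
    storage: 1Gi                 # matches the 1Gi shown by kubectl get pv
  accessModes:
    - ReadWriteOnce              # RWO in the output above
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: /tmp/data/cassandra-data-1   # illustrative path on the node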
The StatefulSet is responsible for creating the Pods. It provides ordered deployment, ordered termination and unique network names. Run the following command to start a single Cassandra server:
kubectl create -f cassandra-statefulset.yaml
You can check if your StatefulSet has deployed using the command below.
kubectl get statefulsets
NAME        DESIRED   CURRENT   AGE
cassandra   1         1         2h
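For reference, the load-bearing fields of a Cassandra StatefulSet take roughly this shape. This is a sketch under assumptions; the image, resources, and environment settings in the repo's cassandra-statefulset.yaml will differ:
apiVersion: apps/v1beta1       # the StatefulSet API group current for Kubernetes 1.7
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra       # ties pods to the headless service for stable DNS names
  replicas: 1
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:3.11        # illustrative image tag
          ports:
            - containerPort: 9042
  volumeClaimTemplates:        # one PVC per pod, bound against the pre-created PVs
    - metadata:
        name: cassandra-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi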
If you view the list of the Pods, you should see 1 Pod running. The Pod name should be cassandra-0, and subsequent Pods follow the ordinal numbering (cassandra-1, cassandra-2, ...). Use this command to view the Pods created by the StatefulSet:
kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP                NODE
cassandra-0   1/1     Running   0          1m    172.xxx.xxx.xxx   169.xxx.xxx.xxx
To check if the Cassandra node is up, perform a nodetool status:
kubectl exec -ti cassandra-0 -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  109.28 KB  256      100.0%             6402e90d-7995-4ee1-bb9c-36097eb2c9ec   Rack1
To increase or decrease the size of your StatefulSet you can use the scale command:
kubectl scale --replicas=3 statefulset/cassandra
Wait a minute or two and check if it worked:
kubectl get statefulsets
NAME        DESIRED   CURRENT   AGE
cassandra   3         3         2h
If you watch the Cassandra pods deploy, they should be created sequentially.
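One convenient way to watch them is kubectl's -w flag, which streams updates as each replica comes up (the app=cassandra label is an assumption about how the StatefulSet labels its pods):
kubectl get pods -l app=cassandra -w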
You can view the list of the Pods again to confirm that your Pods are up and running.
kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP                NODE
cassandra-0   1/1     Running   0          13m   172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-1   1/1     Running   0          38m   172.xxx.xxx.xxx   169.xxx.xxx.xxx
cassandra-2   1/1     Running   0          38m   172.xxx.xxx.xxx   169.xxx.xxx.xxx
You can perform a nodetool status to check whether the other Cassandra nodes have joined and formed a Cassandra cluster.
Note: It can take around 5 minutes for the Cassandra database to finish its setup.
kubectl exec -ti cassandra-0 -- nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load        Tokens   Owns (effective)   Host ID                                Rack
UN  172.xxx.xxx.xxx  103.25 KiB  256      68.7%              633ae787-3080-40e8-83cc-d31b62f53582   Rack1
UN  172.xxx.xxx.xxx  108.62 KiB  256      63.5%              e95fc385-826e-47f5-a46b-f375532607a3   Rack1
UN  172.xxx.xxx.xxx  177.38 KiB  256      67.8%              66bd8253-3c58-4be4-83ad-3e1c3b334dfd   Rack1
You will need to wait for the status of the nodes to be Up and Normal (UN) before executing the commands in the next steps.
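If you would rather script that wait than eyeball it, a small sketch (assuming 3 replicas and that nodetool prefixes ready nodes with UN, as shown above):
# poll until all 3 nodes report Up/Normal
while [ "$(kubectl exec cassandra-0 -- nodetool status | grep -c '^UN')" -lt 3 ]; do
  echo "waiting for all Cassandra nodes to reach UN..."
  sleep 10
done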
You can access the Cassandra container and its CQL shell using the following command:
kubectl exec -it cassandra-0 cqlsh
Connected to Cassandra at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> describe tables

Keyspace system_traces
----------------------
events  sessions

Keyspace system_schema
----------------------
tables     triggers    views    keyspaces   dropped_columns
functions  aggregates  indexes  types       columns

Keyspace system_auth
--------------------
resource_role_permissons_index  role_permissions  role_members  roles

Keyspace system
---------------
available_ranges          peers               batchlog        transferred_ranges
batches                   compaction_history  size_estimates  hints
prepared_statements       sstable_activity    built_views
"IndexInfo"               peer_events         range_xfers
views_builds_in_progress  paxos               local

Keyspace system_distributed
---------------------------
repair_history  view_build_status  parent_repair_history
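As a quick smoke test of the cluster, you could create a replicated keyspace from the shell. A hedged sketch: the keyspace name and replication factor below are arbitrary illustrative choices, and cqlsh's -e flag runs a statement non-interactively:
# create a keyspace replicated across the 3 nodes (name and factor are illustrative)
kubectl exec -it cassandra-0 -- cqlsh -e "CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
# confirm it now appears alongside the system keyspaces
kubectl exec -it cassandra-0 -- cqlsh -e "DESCRIBE KEYSPACES;"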
When you are done, you can troubleshoot and clean up as follows:
To inspect the logs of a pod while troubleshooting: kubectl logs <your-pod-name>
To delete the Persistent Volume Claims: kubectl delete pvc -l app=cassandra
To delete the StatefulSet, if you created the Cassandra StatefulSet: kubectl delete statefulset cassandra
To delete the headless service: kubectl delete svc cassandra
To delete everything with one command: kubectl delete statefulset,pvc,pv,svc -l app=cassandra