coreos/khealth

Name: khealth

Owner: CoreOS

Description: basic kubernetes health monitoring

Created: 2015-11-21 01:03:56.0

Updated: 2017-12-04 17:01:49.0

Pushed: 2018-01-04 20:30:09.0

Size: 1509

Language: Go

README

khealth

Docker Image on Quay.io

khealth is a Kubernetes cluster monitoring suite. Its Routines exercise Kubernetes subsystems and send events to Collectors. Collectors collate these events to compute the current cluster state, which they expose over a simple HTTP API. In the quick-start deployment below, that API is served on a cluster NodePort.

Quick start

If you have a Kubernetes cluster, you can deploy khealth from a checkout of this repository:

```sh
cd khealth/
kubectl create -f ./contrib/k8s/khealth-ns.yaml
kubectl --namespace=khealth create -f ./contrib/k8s/khealth-rc.yaml
kubectl --namespace=khealth create -f ./contrib/k8s/khealth-service.yaml
```

This creates a NodePort service that exposes the following status endpoints:

| Command     | NodePort |
| ----------- |:--------:|
| rcscheduler | 31337    |

Architecture

A khealth Module is a single command that invokes a set of Routines and a single Collector. The Collector gathers events from the Routines and exposes metrics on its status endpoint.
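The fan-in from Routines to a single Collector can be illustrated with Go channels. This is a sketch, not khealth's code. `Event`, `runModule`, `collect`, and the Routine names are hypothetical. Each Routine is reduced to a goroutine that sends a few events on one shared channel, and the "Collector" keeps only the most recent event per Routine.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Event is a hypothetical stand-in for a khealth event.
type Event struct {
	Routine string
	OK      bool
	At      time.Time
}

// collect drains events until the channel is closed and records the
// most recent event per Routine, roughly what a Collector summarizes.
func collect(events <-chan Event) map[string]Event {
	latest := make(map[string]Event)
	for ev := range events {
		latest[ev.Routine] = ev
	}
	return latest
}

// runModule starts one goroutine per Routine name, all sending on a
// single shared channel, and hands that channel to one collector.
func runModule(routines []string, rounds int) map[string]Event {
	events := make(chan Event)
	var wg sync.WaitGroup
	for _, name := range routines {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				events <- Event{Routine: name, OK: true, At: time.Now()}
			}
		}(name)
	}
	// Close the channel once every Routine has finished sending.
	go func() {
		wg.Wait()
		close(events)
	}()
	return collect(events)
}

func main() {
	latest := runModule([]string{"rc-schedule", "dns-lookup"}, 3)
	names := make([]string, 0, len(latest))
	for name := range latest {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s ok=%v\n", name, latest[name].OK)
	}
}
```

Closing the shared channel after the `sync.WaitGroup` completes is what lets the collector loop end. A long-running Module would keep the channel open instead.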

Directory Layout
cmd/

This is where the Module entrypoint programs live. Each Module should have exactly one main package in an eponymous directory beneath cmd/.

pkg/routines/

Routines are defined in structures that implement the RoutineHandler interface.

```go
type RoutineHandler interface {
	Init() error
	Poll() error
	Cleanup() error
}
```
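A minimal type that satisfies `RoutineHandler` might look like the following. `counterRoutine` is a hypothetical example that only counts polls. A real Routine would exercise a Kubernetes subsystem in `Poll`.

```go
package main

import (
	"errors"
	"fmt"
)

// RoutineHandler mirrors the interface from pkg/routines/.
type RoutineHandler interface {
	Init() error
	Poll() error
	Cleanup() error
}

// counterRoutine is hypothetical: Init prepares state, Poll does one unit
// of work, and Cleanup releases what Init set up.
type counterRoutine struct {
	ready bool
	polls int
}

func (c *counterRoutine) Init() error {
	c.ready = true
	c.polls = 0
	return nil
}

func (c *counterRoutine) Poll() error {
	if !c.ready {
		return errors.New("Poll called before Init")
	}
	c.polls++
	return nil
}

func (c *counterRoutine) Cleanup() error {
	c.ready = false
	return nil
}

// Compile-time check that *counterRoutine satisfies RoutineHandler.
var _ RoutineHandler = (*counterRoutine)(nil)

func main() {
	var h RoutineHandler = &counterRoutine{}
	_ = h.Init()
	for i := 0; i < 3; i++ {
		_ = h.Poll()
	}
	_ = h.Cleanup()
	fmt.Println("polls:", h.(*counterRoutine).polls)
}
```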

Init is called first, and then Poll is called in a loop. When the TTL expires, polling stops and Cleanup is called. Each iteration of this cycle generates events, which are sent on the Routine's Events channel, usually to a Collector.

The NewRoutine function returns a pointer to a khealth Routine struct. It takes the following arguments:

pkg/collectors/
```go
type Collector interface {
	Start() error
	Status(w http.ResponseWriter, r *http.Request)
	Terminate() error
}
```

Collectors must implement the Collector interface and make use of Routines. To wire Routines to a Collector implementation, follow this general pattern:

Included Modules
cmd/rcscheduler/

This module uses a single Routine that schedules and unschedules pause pods through a replication controller. The program exposes a single health endpoint that reports the state of the latest event.

Roadmap
Who should use this?

Cluster administrators: Gain insight into your Kubernetes cluster's performance. Monitor health endpoints which report on various testing routines.

Kubernetes developers: A convenient way to “smoke test” a cluster. Feel free to write Modules that exist solely to torture-test a cluster and have no business running on the same cluster as production workloads. And turn the replica count way up!


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.