Name: khealth
Owner: CoreOS
Description: basic kubernetes health monitoring
Created: 2015-11-21 01:03:56.0
Updated: 2017-12-04 17:01:49.0
Pushed: 2018-01-04 20:30:09.0
Size: 1509
Language: Go
khealth is a Kubernetes cluster monitoring suite. Its Routines exercise Kubernetes subsystems and send events to Collectors. Collectors collate these events to compute current cluster state. Cluster status is available from Collectors over a simple HTTP API, which is served on a cluster nodeport in the example below.
If you have a Kubernetes cluster, you can deploy khealth:

    health/
    kubectl create -f ./contrib/k8s/khealth-ns.yaml
    kubectl --namespace=khealth create -f ./contrib/k8s/khealth-rc.yaml
    kubectl --namespace=khealth create -f ./contrib/k8s/khealth-service.yaml

This will create a nodeport service which exposes the following status endpoints.

| Command | NodePort |
|---|:---:|
| rcscheduler | 31337 |

A khealth Module is a single command that invokes a set of Routines and a single Collector. The Collector gathers events from the Routines and exposes metrics on its status endpoint.
cmd/
This is where the Module entrypoint programs live. Each Module should have exactly one `main` package in an eponymous directory beneath `cmd/`.
pkg/routines/
Routines are defined in structures that implement the `RoutineHandler` interface:

    type RoutineHandler interface {
        Init() error
        Poll() error
        Cleanup() error
    }

`Init` is called once, and then `Poll` is called in a loop. When the TTL expires, the `Poll` loop terminates and `Cleanup` is called. Each iteration of this cycle generates events, which are sent on the Routine's `Events` channel, usually to a Collector.
The `NewRoutine` function returns a pointer to a khealth `Routine` struct. It takes the following arguments:

- `client`: the Kubernetes API client
- `pollInterval`: how often (in seconds) `Poll` is called
- `podTTL`: how many seconds to loop on `Poll` before calling `Cleanup`
- `handler`: the `RoutineHandler` for this routine

pkg/collectors/

    type Collector interface {
        Start() error
        Status(w http.ResponseWriter, r *http.Request)
        Terminate() error
    }

Collectors must implement the Collector interface and make use of Routines. To wire Routines to a Collector implementation, follow this general pattern:
- `Start`: Call `Start` on all routines this collector uses. Then begin reading events from each routine's `Events` channel and collating the current state.
- `Status`: Serialize the current state to the HTTP response.
- `Terminate`: Call `SignalTerminate` on each routine. `SignalTerminate` is non-blocking, so before returning, block until each Routine's `Events` channel has emitted a `nil` value. That way, when `Terminate` returns, you can be sure your Routines have all cleaned up.

cmd/rcscheduler/
This module uses a single routine which schedules/unschedules pause pods via a replication controller. The program exposes a single health endpoint which reports the state of the latest event.
More routines: We want routines that do everything! Test network latency. Write to disk. Compute Fibonacci sequences.
Prometheus integration: Collectors expose Prometheus-compatible status endpoints and metrics, providing ready-made infrastructure to aggregate statistics from a set of canary pods, designed specifically to exercise Kubernetes cluster resources.
Alerting: Use the experimental Alertmanager to alert on metrics.
Cluster administrators: Gain insight into your Kubernetes cluster's performance. Monitor health endpoints which report on various testing routines.
Kubernetes developers: A convenient way to “smoke test” a cluster. Feel free to write Modules that exist solely to torture-test a cluster and have no business running on the same cluster as production assets. And turn the replica count way up!