dcos/dcos-diagnostics

Name: dcos-diagnostics

Owner: DC/OS

Description: DC/OS Distributed Diagnostics Tool & Aggregation Service

Created: 2017-06-14 19:49:08.0

Updated: 2018-05-22 18:07:37.0

Pushed: 2018-05-23 09:32:33.0

Size: 1847

Language: Go

README

dcos-diagnostics

DC/OS Distributed Diagnostics Tool & Aggregation Service

dcos-diagnostics is a monitoring agent that exposes an HTTP API, queried through the /system/health/v1 DC/OS API. The dcos-diagnostics puller collects data from the agents and reports the health of individual nodes, covering system resources as well as DC/OS-specific services.

Build

go get github.com/dcos/dcos-diagnostics
cd $GOPATH/src/github.com/dcos/dcos-diagnostics
make install
dcos-diagnostics --version

Run

Run dcos-diagnostics once on a DC/OS host to check systemd units:

dcos-diagnostics --diag

Get verbose log output:

dcos-diagnostics --diag --verbose

Run the dcos-diagnostics aggregation service to query all cluster hosts for their health state:

dcos-diagnostics daemon --pull

Start the dcos-diagnostics health API endpoint:

dcos-diagnostics daemon

dcos-diagnostics daemon options
--agent-port int
    Use TCP port to connect to agents. (default 1050)

--ca-cert string
    Use certificate authority.

--command-exec-timeout int
    Set command execution timeout (default 120)

--diag
    Get diagnostics output once on the CLI. Does not expose API.

--diagnostics-bundle-dir string
    Set a path to store diagnostic bundles (default "/var/run/dcos/dcos-diagnostics/diagnostic_bundles")

--diagnostics-job-timeout int
    Set a global diagnostics job timeout (default 720)

--diagnostics-units-since string
    Collect systemd unit logs since (default "24 hours ago")

--diagnostics-url-timeout int
    Set a local timeout for every single GET request to a log endpoint (default 2)

--endpoint-config string
    Use endpoints_config.json (default "/opt/mesosphere/endpoints_config.json")

--exhibitor-ip string
    Use the Exhibitor cluster status URL to discover master nodes. (default "http://127.0.0.1:8181/exhibitor/v1/cluster/status")

--force-tls
    Use HTTPS to do all requests.

--health-update-interval int
    Set health update interval in seconds. (default 60)

--master-port int
    Use TCP port to connect to masters. (default 1050)

--port int
    Web server TCP port. (default 1050)

--pull
    Try to pull checks from DC/OS hosts.

--pull-interval int
    Set pull interval in seconds. (default 60)

--pull-timeout int
    Set pull timeout. (default 3)

--verbose
    Use verbose debug output.

--version
    Print version.

dcos-diagnostics checks options

TBD

Test

make test

Or from any submodule:

go test

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.