data-8/stats

Name: stats

Owner: Data Science 8

Description: Stats dashboard for our JupyterHub deployment

Created: 2016-05-06 21:58:59.0

Updated: 2018-02-22 11:06:22.0

Pushed: 2016-05-06 23:45:41.0

Size: 175

Language: Python

README

Systems Logging with StatsD and Graphite

Author: Eric Zhao
Collaborators: Steve Yang

Introduction

The purpose of this project was to build a metrics pipeline for our JupyterHub deployment. Using graphs generated from the collected data, it is meant to give us a more comprehensive view of the state of the system, the errors and problems students run into, and the change in performance before and after deployments. The components involved (StatsD, Graphite with Carbon, Grafana, Apache, and a database for Graphite) are described in the Notes below.

Installation
Manual Installation

Please refer to this post on Slack for detailed instructions.

PDF copy

Docker Installation

Pull the following Docker image:

TODO: Add link to Docker container

ATTENTION: Please read through and understand the manual installation before using the Docker container. Parts of the installation process require you to set passwords and configure settings. Because we automated this process in the Docker container, it is important for you to go into the appropriate sections of the Docker container to set your desired configurations.

Build the Docker image:

command for build

Run the Docker container:

command for run

Usage
Accessing Grafana

To access the Grafana dashboards, open your browser and visit the domain you specified in your configuration files. After logging in, you can create a new dashboard and choose the statistics you wish to explore.

Emitting Statistics

StatsD receives metrics as UDP packets, and these can be sent from many languages (Python, Ruby, etc.). We attempted to write a Python script, following the documentation (Statsd Metrics Documentation, Statsd Metrics Documentation 2). Here is the link to the Python package.
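As an illustration of what emitting a statistic involves, here is a minimal sketch that uses only the Python standard library rather than the statsd-metrics package. It assumes StatsD is listening on its default UDP port 8125 on the local machine. The metric name `jupyterhub.logins` is hypothetical and is not taken from our deployment.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter line in the form '<name>:<value>|c'."""
    return f"{name}:{value}|c"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line to StatsD over UDP.

    UDP is fire-and-forget: no error is raised if nothing is listening.
    host and port are assumptions; use the values from your own setup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Calling `send_metric(format_counter("jupyterhub.logins"))` would increment that counter by one. Because UDP gives no delivery confirmation, a wrong host or port fails silently, so it is worth checking in Graphite that the metric actually appears.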

Notes

Steve and I began by following tutorials to get the whole combination of packages installed and have a working version. We documented our whole process in the note linked above. Afterwards, I worked on dockerizing the installation setup process while Steve worked on having the system emit data to be received by Statsd.

Eric: Dockerizing turned out to be a pretty difficult process. One big roadblock was figuring out how to automate responses when the installation process prompted the user for input. However, the biggest challenge was that our project needs multiple running processes that work together (Graphite, Statsd, PostgreSQL, Grafana, Carbon, Apache), while a Docker container is really designed around a single process. Our solution was to use Supervisor, a process control system that lets its users monitor and control multiple processes on UNIX-like operating systems. We also had to scrap PostgreSQL and instead use SQLite3, the default database for Graphite. For deployment, however, a more robust database like PostgreSQL may be preferred. At our current stage, the Docker image builds successfully and the container runs without shutting down. However, Grafana fails to load, so the configuration is most likely wrong somewhere.

Steve: We were unable to make a complete script for sending stats to statsd. This was partly due to time constraints and partly due to the resources needed to write the script. Ideally, we would have used logs coming from the deployment server, but instead we used logs from a log file and sent stats from that file to statsd. Following the statsd-metrics documentation for Python, we set up a TCPClient with the IP address and port where statsd was located. Then we attempted to read the logs one by one and increment whatever statistics we were trying to send to statsd. Unfortunately, we ran into errors with modules/packages not being found for statsd-metrics. In the future, one may need a couple of weeks to finish this task. One will need a working deployment server and the correct statsd-metrics packages. Also, if one uses statsd in a Docker container, the correct host and port are needed to send the UDP packets to the correct location.
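The approach described above (read log lines one at a time, tally what happened, send the totals to statsd) could be sketched roughly as follows. This is not the unfinished script. It uses only the standard library instead of statsd-metrics, and it assumes each log line starts with a bracketed one-letter level such as `[I ...]` or `[E ...]`. The metric prefix `jupyterhub.log` and the default host and port are also assumptions.

```python
import socket
from collections import Counter

# Assumed mapping from the one-letter level at the start of a log line.
LEVELS = {"D": "debug", "I": "info", "W": "warning", "E": "error", "C": "critical"}


def count_levels(lines):
    """Count log lines per level; lines without the assumed prefix are skipped."""
    counts = Counter()
    for line in lines:
        if len(line) > 2 and line[0] == "[" and line[1] in LEVELS and line[2] == " ":
            counts[LEVELS[line[1]]] += 1
    return counts


def emit_counts(counts, host="127.0.0.1", port=8125, prefix="jupyterhub.log"):
    """Send one StatsD counter packet per level over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for level, n in sorted(counts.items()):
            payload = f"{prefix}.{level}:{n}|c"
            sock.sendto(payload.encode("ascii"), (host, port))
```

A file object can be passed directly as `lines`, for example `with open(path) as f: emit_counts(count_levels(f))`. When statsd runs in a Docker container, `host` and `port` must point at wherever the container's UDP port is published.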


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.