honeycombio/honeykafka

Name: honeykafka

Owner: Honeycomb

Description: kafka consumer to feed into honeycomb

Created: 2017-12-05 02:58:30.0

Updated: 2018-04-27 21:13:50.0

Pushed: 2018-04-27 21:25:54.0

Homepage: null

Size: 15107

Language: Go

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

honeykaf is a kafka consumer that expects JSON messages on a kafka topic and sends them on to Honeycomb

How to run honeykaf

honeykaf expects to consume from a single partition on a kafka topic. You should launch one copy of honeykaf per partition for your topic.

The flags given to honeykaf will be used when the message from kafka does not specify a field - the kafka message overrides the flags given to the binary. This is the case for:

The intention is that the producer of the message (the application that puts the event on to the kafka queue) gets to choose the target dataset, sample rate, and dynamic sampler keys to use for that event. Different teams can then share the same kafka infrastructure while sending data to their own dataset.

If an event is pushed into the kafka queue without one of these fields, the flags given to honeykaf will be used as default values. The operator can thereby set a default writekey or sample rate so that individual publishers can opt out of needing those fields.

Kafka Message Format

honeykaf expects to read from a kafka topic populated by events with two keys, meta and data. Meta contains information about how honeykaf should treat the event and data contains the event itself. All keys in meta are optional; flags to honeykaf will provide defaults should fields be missing from the meta object.

Here is an example of the schema:


"meta": {
    "writekey":"abcabc123123",
    "dataset":"myds",
    "presamplerate": 1,
    "goal_samplerate": 1,
    "dynsample_keys": ["key1","key2"],
    "timestamp":"2017-12-04T01:02:03.456Z"
},
"data":{
    "key1":"val1",
    "key2":"val2",
    "key3":"val3"
}

Defaults:

Unless provided as flags to honeykaf, the following defaults apply:

Definitions:

Internally, each unique combination of goal_samplerate, dynsample_keys, dataset and writekey will have its own dynsampler instance for maintaining state. In order for honeykaf to efficiently calcualte its dynamic sampling, try to limit the number of unique combinations of those arguments. As the number of dynsampler instances grows, honeykaf's memory utilization will also grow. Watching that metric will let you gauge whether your flow is sufficiently constrained.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.