twitter/zktraffic

Name: zktraffic

Owner: Twitter, Inc.

Description: ZooKeeper protocol analyzer and stats gathering daemon

Created: 2014-07-18 17:39:42.0

Updated: 2018-01-10 15:07:41.0

Pushed: 2017-06-07 03:17:35.0

Homepage: null

Size: 262

Language: Python

README

ZKTraffic

tl;dr

ZooKeeper protocol analyzer and stats gathering daemon

Installing

You can install ZKTraffic via pip:

$ pip install zktraffic

Or run it from source (if you have the dependencies installed, see below):

$ git clone https://github.com/twitter/zktraffic.git
$ cd zktraffic
$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-dump --iface=eth0

To get a quick count of requests by path:

$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-dump --iface=eth0 --count-requests 10000 --sort-by path
49
/services/prod/search 846
/configs/teleportation/features 843

Or by type:

$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-dump --iface=eth0 --count-requests 10000 --sort-by type
GetChildrenRequest 9044
ExistsRequest 958

You can also measure latencies by path (avg, p95 and p99):

$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-dump --measure-latency 1000 --group-by path --aggregation-depth 2 --sort-by p99
path                 avg         p95         p99
-----------  -----------  ----------  ----------
ty/services  0.000199077  0.00048846  0.00267805
ty           0.000349498  0.00136839  0.00201204
ty/configs   0.000157728  0.00036664  0.00122663

Or by type:

$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-dump --measure-latency 1000 --group-by type --sort-by p99
type                            avg          p95          p99
----------------------  -----------  -----------  -----------
CreateEphemeralRequest  0.000735009  0.000978041  0.0032404
GetChildrenRequest      0.000182547  0.000453258  0.00220628
ExistsRequest           0.000162728  0.000430155  0.000862937

Or by client:

$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-dump --measure-latency 1000 --group-by client --sort-by p99
client                      avg          p95          p99
------------------  -----------  -----------  -----------
.1.3:44308          0.000735009  0.000978041  0.0032404
.1.6:34305          0.000182547  0.000453258  0.00220628
.1.9:36110          0.000162728  0.000430155  0.000862937
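
As a rough illustration of what the path-grouped latency report computes, the sketch below truncates each path to a fixed depth and derives avg, p95 and p99 per group. It is not zk-dump's code: `truncate_path`, `percentile` and `summarize` are hypothetical helpers, the nearest-rank percentile is only one of several possible definitions, and the idea that `--aggregation-depth` keeps the first N path components is an assumption inferred from the sample output.

```python
import math
from collections import defaultdict


def truncate_path(path, depth):
    """Keep at most `depth` leading components of a znode path."""
    parts = [part for part in path.split("/") if part]
    return "/" + "/".join(parts[:depth])


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(samples, depth):
    """Group (path, latency_seconds) samples; return {group: (avg, p95, p99)}."""
    groups = defaultdict(list)
    for path, latency in samples:
        groups[truncate_path(path, depth)].append(latency)
    summary = {}
    for group, values in groups.items():
        values.sort()
        avg = sum(values) / len(values)
        summary[group] = (avg, percentile(values, 95), percentile(values, 99))
    return summary
```

Calling `summarize(samples, depth=2)` on a list of `(path, latency)` pairs yields one row per truncated path, which can then be sorted by the p99 column in the same way as `--sort-by p99`.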

Or use the stats gathering daemon:

$ sudo ZKTRAFFIC_SOURCE=1 bin/zk-stats-daemon --iface=eth0 --http-port=9090

Or you can build PEX files, from the source, for any of the available tools:

$ pip install pex

# zk-dump
$ pex -v -e zktraffic.cli.zk -o zk-dump.pex .

# zk-stats-daemon
$ pex -v -e zktraffic.cli.stats_daemon -o stats-daemon.pex .

# zab-dump
$ pex -v -e zktraffic.cli.zab -o zab-dump.pex .

# fle-dump
$ pex -v -e zktraffic.cli.fle -o fle-dump.pex .

See the PEX documentation for more information.

What is ZKTraffic?

An {iptraf,top}-esque traffic monitor for ZooKeeper. Right now it exports per-path (and global) stats. Eventually it'll be made to export per-user stats too.

It has a front-end, zk-dump, that can be used in interactive mode to dump traffic:

# need root or CAP_NET_ADMIN & CAP_NET_RAW
$ sudo zk-dump --iface eth0
21:08:05:991542 ConnectRequest(ver=0, zxid=0, timeout=10000, session=0x0, readonly=False, client=127.0.0.1:50049)
----> 21:08:06:013513 ConnectReply(ver=0, timeout=10000, session=0x148cf0aedc60000, readonly=False, client=127.0.0.1:50049)
21:08:07:432361 ExistsRequest(xid=1, path=/, watch=False, size=14, client=127.0.0.1:50049)
----> 21:08:07:447353 ExistsReply(xid=1, zxid=31, error=0, client=127.0.0.1:50049)
21:08:07:448033 GetChildrenRequest(xid=2, path=/, watch=False, size=14, client=127.0.0.1:50049)
----> 21:08:07:456169 GetChildrenReply(xid=2, zxid=31, error=0, count=1, client=127.0.0.1:50049)
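
If you want to post-process this text output, each line can be split into a timestamp, a message type and its `key=value` fields. The sketch below is one way to do that, assuming lines shaped like the dump above; `parse_dump_line` is a hypothetical helper and not part of zktraffic, and it does not handle values that themselves contain `", "`.

```python
import re

# timestamp, message type, then the comma-separated fields in parentheses
LINE_RE = re.compile(r"(?P<ts>[\d:]+)\s+(?P<type>\w+)\((?P<fields>.*)\)\s*$")


def parse_dump_line(line):
    """Parse one dump line into a dict, or return None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = {}
    for pair in match.group("fields").split(", "):
        if not pair:
            continue  # message with no fields
        key, _, value = pair.partition("=")
        fields[key] = value
    return {
        "timestamp": match.group("ts"),
        "type": match.group("type"),
        "fields": fields,
    }
```

Any marker that precedes the timestamp on reply lines is skipped, because the pattern is searched for rather than anchored at the start of the line.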

Or, it can work in daemon mode from which it exposes HTTP/JSON endpoints with stats that can be fed into your favourite data collection system:

$ sudo zk-stats-daemon.pex --app_daemonize --aggregation-depth=5

# Wait for 1 min and:

$ sleep 60 && curl http://localhost:7070/json/paths | python -mjson.tool

"ConnectRequest": 2,
"ConnectRequestBytes": 90,
"CreateRequest/configs": 2,
"CreateRequest/configs/server": 2,
"CreateRequest/discovery": 2,
"CreateRequest/discovery/hosts": 2,
"CreateRequest/discovery/services": 2,
"CreateRequestBytes/configs": 110,
"CreateRequestBytes/configs/server": 124,
"CreateRequestBytes/discovery": 114,
"CreateRequestBytes/discovery/hosts": 126,
"CreateRequestBytes/discovery/services": 132,
"ExistsRequest/": 1574,
"ExistsRequest/configs": 3,
"ExistsRequest/configs/server": 2,
"ExistsRequest/discovery": 4,
"ExistsRequest/discovery/hosts": 2,
"ExistsRequest/discovery/services": 2,
"ExistsRequestBytes/": 22036,
"ExistsRequestBytes/configs": 63,
"ExistsRequestBytes/configs/server": 56,
"ExistsRequestBytes/discovery": 92,
"ExistsRequestBytes/discovery/hosts": 58,
"ExistsRequestBytes/discovery/services": 64,
"GetChildrenRequest/configs": 1285,
"GetChildrenRequest/configs/server": 1242,
"GetChildrenRequest/discovery": 1223,
"GetChildrenRequest/discovery/hosts": 1250,
"GetChildrenRequest/discovery/services": 1222,
"GetChildrenRequest/zookeeper/config": 1285,
"GetChildrenRequest/zookeeper/quota/limits": 1228,
"GetChildrenRequest/zookeeper/quota/limits/by-path": 1269,
"GetChildrenRequest/zookeeper/quota/limits/global": 1230,
"GetChildrenRequest/zookeeper/quota/stats/by-path": 1222,
"GetChildrenRequestBytes/discovery/hosts": 36250,
"GetChildrenRequestBytes/discovery/services": 39104,
"GetChildrenRequestBytes/zookeeper/config": 38550,
"GetChildrenRequestBytes/zookeeper/quota/limits": 44208,
"GetChildrenRequestBytes/zookeeper/quota/limits/by-path": 55836,
"GetChildrenRequestBytes/zookeeper/quota/limits/global": 52890,
"GetChildrenRequestBytes/zookeeper/quota/limits/slices": 51815,
"GetChildrenRequestBytes/zookeeper/quota/stats": 42630,
"GetChildrenRequestBytes/zookeeper/quota/stats/by-path": 52546,
"GetChildrenRequestBytes/zookeeper/quota/stats/global": 50568,
"reads/": 2761,
"reads/configs": 1288,
"reads/configs/server": 1244,
"reads/discovery": 1227,
"reads/discovery/hosts": 1252,
"reads/discovery/services": 1224,
"reads/zookeeper/config": 1285,
"reads/zookeeper/quota/limits": 1228,
"reads/zookeeper/quota/limits/by-path": 1269,
"reads/zookeeper/quota/limits/global": 1230,
"readsBytes/": 38654,
"readsBytes/discovery/services": 39168,
"readsBytes/zookeeper/config": 38550,
"readsBytes/zookeeper/quota/limits": 44208,
"readsBytes/zookeeper/quota/limits/by-path": 55836,
"readsBytes/zookeeper/quota/limits/global": 52890,
"readsBytes/zookeeper/quota/limits/slices": 51815,
"readsBytes/zookeeper/quota/stats": 42630,
"readsBytes/zookeeper/quota/stats/by-path": 52546,
"readsBytes/zookeeper/quota/stats/global": 50568,
"total/readBytes": 655586,
"total/reads": 21251,
"total/writeBytes": 606,
"total/writes": 10,
"writes/": 0,
"writes/configs": 2,
"writes/configs/server": 2,
"writes/discovery": 2,
"writes/discovery/hosts": 2,
"writes/discovery/services": 2,
"writesBytes/": 0,
"writesBytes/configs": 110,
"writesBytes/configs/server": 124,
"writesBytes/discovery": 114,
"writesBytes/discovery/hosts": 126,
"writesBytes/discovery/services": 132

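The flat `counter/path` keys in this output are easy to rank from a script. The following is a minimal sketch, assuming Python 3 and the key layout shown above; `top_paths` and `fetch_stats` are hypothetical helpers, and the default URL simply mirrors the `curl` example.

```python
import json
from urllib.request import urlopen


def top_paths(stats, prefix="reads/", limit=3):
    """Return the `limit` busiest (path, count) pairs for one counter family."""
    rows = [
        ("/" + key[len(prefix):], value)
        for key, value in stats.items()
        if key.startswith(prefix)
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


def fetch_stats(url="http://localhost:7070/json/paths"):
    """Fetch and decode the daemon's per-path stats (needs a running daemon)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `top_paths(fetch_stats(), prefix="writes/")` would list the most written paths. Because the prefix includes the trailing slash, `reads/` does not pick up the `readsBytes/` or `total/reads` keys.
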
Other relevant endpoints for stats are:

Contributing and Testing

Please see CONTRIBUTING.md.

More tools!

Along with zk-dump and zk-stats-daemon, you can find fle-dump which allows you to inspect FastLeaderElection traffic (i.e.: the protocol by which ZooKeeper decides who will lead and the mechanism by which the leader is subsequently discovered):

$ sudo fle-dump --iface eth0 -c
Notification(
 timestamp=00:57:12:593254,
 src=10.0.0.1:32938,
 dst=10.0.0.2:3888,
 state=following,
 leader=3,
 zxid=0,
 election_epoch=0,
 peer_epoch=0,
 config=
      server.0=10.0.0.1:2889:3888:participant;0.0.0.0:2181
      server.1=10.0.0.2:2889:3888:participant;0.0.0.0:2181
      server.2=10.0.0.3:2889:3888:participant;0.0.0.0:2181
      server.3=10.0.0.4:2889:3888:participant;0.0.0.0:2181
      server.4=10.0.0.5:2889:3888:participant;0.0.0.0:2181
      version=10010d4d6
)
Notification(
 timestamp=00:57:12:595525,
 src=10.0.0.2:3888,
 dst=10.0.0.1:32938,
 state=looking,
 leader=1,
 zxid=4296326153,
 election_epoch=1,
 peer_epoch=1,
 config=
      server.0=10.0.0.1:2889:3888:participant;0.0.0.0:2181
      server.1=10.0.0.2:2889:3888:participant;0.0.0.0:2181
      server.2=10.0.0.3:2889:3888:participant;0.0.0.0:2181
      server.3=10.0.0.4:2889:3888:participant;0.0.0.0:2181
      server.4=10.0.0.5:2889:3888:participant;0.0.0.0:2181
      version=10010d4d6
)


Note: for initial messages to be visible you'll need the patch available at ZOOKEEPER-2098, if you are using ZooKeeper prior to ZooKeeper 3.5.1-rc2.

Note: if you are using Linux 3.14 or later, you'll need to disable TCP Auto Corking by running echo 0 > /proc/sys/net/ipv4/tcp_autocorking.
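
Before capturing, it can be handy to check the current autocorking setting from a script. This is a small sketch, assuming Linux and the proc path given in the note above; `autocorking_enabled` is a hypothetical helper and only reads the setting, it does not change it.

```python
def autocorking_enabled(path="/proc/sys/net/ipv4/tcp_autocorking"):
    """Return True/False for the current setting, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip() != "0"
    except IOError:
        # Not Linux, an older kernel, or insufficient permissions.
        return None
```

A return value of `True` means the `echo 0 > ...` step from the note is still needed.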

If you are interested in debugging ZAB (ZooKeeper Atomic Broadcast protocol), you can use zab-dump:

$ sudo zab-dump --iface eth0

Request(
 cxid=6,
 dst=10.0.0.1:2889,
 length=112,
 req_type=CreateRequest,
 session_id=0x34e4d23b0d70001,
 src=10.0.0.2:48604,
 timestr=22:54:31:995353,
 zxid=-1,
)
Proposal(
 cxid=6,
 dst=10.0.0.2:48603,
 length=110,
 session_id=0x34e4d23b0d70001,
 src=10.0.0.1:2889,
 timestr=22:54:31:995753,
 txn_time=1435816471995,
 txn_type=CreateRequest,
 txn_zxid=8589934619,
 zxid=8589934619,
)
Proposal(
 cxid=6,
 dst=10.0.0.1:48604,
 length=110,
 session_id=0x34e4d23b0d70001,
 src=10.0.0.1:2889,
 timestr=22:54:31:995755,
 txn_time=1435816471995,
 txn_type=CreateRequest,
 txn_zxid=8589934619,
 zxid=8589934619,
)
Proposal(
 cxid=6,
 dst=10.0.0.3:48605,
 length=110,
 session_id=0x34e4d23b0d70001,
 src=10.0.0.1:2889,
 timestr=22:54:31:995770,
 txn_time=1435816471995,
 txn_type=CreateRequest,
 txn_zxid=8589934619,
 zxid=8589934619,
)
Ack(
 dst=10.0.0.1:2889,
 length=20,
 src=10.0.0.1:48603,
 timestr=22:54:31:996068,
 zxid=8589934619,
)
Ack(
 dst=10.0.0.1:2889,
 length=20,
 src=10.0.0.1:48604,
 timestr=22:54:31:996316,
 zxid=8589934619,
)
Ack(
 dst=10.0.0.1:2889,
 length=20,
 src=10.0.0.1:48604,
 timestr=22:54:31:996318,
 zxid=8589934619,
)
Commit(
 dst=10.0.0.1:48603,
 length=20,
 src=10.0.0.1:2889,
 timestr=22:54:31:996193,
 zxid=8589934619,
)
Commit(
 dst=10.0.0.2:48604,
 length=20,
 src=10.0.0.1:2889,
 timestr=22:54:31:996195,
 zxid=8589934619,
)
Commit(
 dst=10.0.0.2:48605,
 length=20,
 src=10.0.0.1:2889,
 timestr=22:54:31:996442,
 zxid=8589934619,
)

OS X

Although no one has tried running this on OS X in production, it can be used for some parts of development and unit testing. If you are running on OS X, please run the following to install the correct dependencies:

$ pip install -r ./osx_requirements.txt

Dependencies
