youzan/systemtap-toolkit

Name: systemtap-toolkit

Owner: ??

Description: YouZan systemtap toolkit to online analyze on production

Created: 2016-11-09 03:21:25.0

Updated: 2018-05-11 00:11:31.0

Pushed: 2017-04-14 13:27:37.0

Homepage:

Size: 80

Language: Perl

README

systemtap-toolkit


NAME

systemtap-toolkit

Description

This is the @YouZan systemtap toolkit for analyzing complicated problems online, on production systems under heavy load. All tools are based on systemtap, the amazing Linux tracing and probing tool.

Anyone who wants to know what is really going on in user space and kernel space should learn systemtap, which is an awesome tool :)

requirements

We need systemtap and DWARF debug information. Some scripts work in kernel space and others work in user space.

For kernel space, we need the kernel debuginfo package, for example kernel-debuginfo-3.10.0-327.28.3.el7.x86_64.

For user space, we need the debuginfo package of the user application, for example redis-debuginfo-2.8.19-2.el7.x86_64.

On Red Hat-family Linux distributions, we can install them as follows:

yum install yum-utils #for debuginfo-install
yum install systemtap
yum install kernelname-devel-version
debuginfo-install kernelname-version
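
Before running any script, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch in Python and is not part of this toolkit. The helper name missing_tools is hypothetical, and it only checks that the executables exist, not that the matching debuginfo packages are installed.

```python
import shutil


def missing_tools(tools=("stap", "debuginfo-install")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("stap and debuginfo-install are available")
```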

contribute

You can help this project in any of the following ways.

  1. Clone this repo locally and commit your code on a separate branch.
  2. Create GitHub issues.
  3. Reach me at detailyang@gmail.com.

thanks

Special thanks to @brendangregg, @agentzh and @fche. Everything we have learned about systemtap comes from their amazing blog posts and projects :)

tcp-passive-syn-ack-time

Measures the time from the SYN packet to the ACK packet on the server side of the TCP three-way handshake (thanks to tcpguide).

t@localhost tmp]# ./tcp-passive-syn-ack-time -p 80 -t 5000
ecting tcp dport (80)...syn-ack time

rval min:197us, max:858us avg:519us, cnt:3
e |-------------------------------------------------- count
2 |                                                   0
4 |                                                   0
8 |@                                                  1
6 |@                                                  1
2 |@                                                  1
4 |                                                   0
8 |                                                   0
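
The histogram above appears to group latencies into power-of-two buckets, as SystemTap's logarithmic histograms do. A minimal Python sketch of that kind of bucketing follows. The helper names are hypothetical, and only 197 and 858 (the reported min and max) come from the output above; the middle sample is made up for illustration. The bucket labels in the output are cut off, so the buckets computed here are what the sketch produces, not a reading of the original labels.

```python
from collections import Counter


def log2_bucket(value_us):
    """Return the largest power of two that is <= value_us (0 for non-positive input)."""
    if value_us <= 0:
        return 0
    return 1 << (value_us.bit_length() - 1)


def histogram(samples_us):
    """Count samples per power-of-two bucket, ordered by bucket."""
    counts = Counter(log2_bucket(v) for v in samples_us)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    samples = [197, 502, 858]  # 502 is a hypothetical value
    for bucket, count in histogram(samples).items():
        print(f"{bucket:>6} |{'@' * count:<10} {count}")
```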

tcp-active-syn-ack-time

Measures the time from the SYN packet to the ACK packet on the client side of the TCP three-way handshake (thanks to tcpguide).

t@localhost systemtap-toolkit]# ./tcp-active-syn-ack-time -p 80 -t 5000
ecting tcp dport (80)...syn-ack time

t:80 min:417us, max:542us avg:460us, cnt:3
e |-------------------------------------------------- count
4 |                                                   0
8 |                                                   0
6 |@@                                                 2
2 |@                                                  1
4 |                                                   0
8 |                                                   0

tcp-retrans

Collects information about which TCP packets are being retransmitted.

t@localhost systemtap-toolkit]# ./tcp-retrans
ting tcp retransmission

.2.15:49896 -> 172.17.9.41:80 state:TCP_SYN_SENT rto:0 -> 1000 ms
.2.15:49896 -> 172.17.9.41:80 state:TCP_SYN_SENT rto:1000 -> 2000 ms
.2.15:49896 -> 172.17.9.41:80 state:TCP_SYN_SENT rto:2000 -> 4000 ms
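
The rto values in the output above double on every retransmission (1000, 2000, 4000 ms), which is TCP's exponential backoff of the retransmission timeout. A minimal Python sketch of that schedule follows. The function name rto_schedule is hypothetical, and the 120000 ms cap is an assumption based on the usual Linux maximum RTO, not something this tool reports.

```python
def rto_schedule(initial_ms, retransmits, cap_ms=120_000):
    """Return the timeout used before each retransmission, doubling every time.

    cap_ms is an assumed upper bound (120 s, the usual Linux maximum).
    """
    schedule = []
    rto = initial_ms
    for _ in range(retransmits):
        schedule.append(rto)
        rto = min(rto * 2, cap_ms)
    return schedule


if __name__ == "__main__":
    print(rto_schedule(1000, 3))
```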

who-open-file

Finds out which process is opening the specified file.

t@localhost systemtap-toolkit]# ./who-open-file -f 123 -t 10000
ecting who is opening filename 123

13740) is opening the filename: "123"
13741) is opening the filename: "123"

who-ctxswitch-process

Traces context switches for the specified process.

t@localhost systemtap-toolkit]# ./who-ctxswitch-process -p 6354
ecting who is context switch 6354
swapper/0       (    0)<R>           => nginx           ( 6354)<R>
nginx           ( 6354)<S>           => nginx           ( 6355)<R>
nginx           ( 6355)<D>           => nginx           ( 6354)<R>
nginx           ( 6354)<S>           => rcu_sched       (   10)<R>
nginx           ( 6355)<D>           => nginx           ( 6354)<R>

syscall-connect

Traces syscall.connect.

et(8062) is connecting to AF_INET@192.168.33.10:1800
et(8063) is connecting to AF_INET@192.168.33.10:1800
et(8064) is connecting to AF_INET@192.168.33.10:1800
et(8065) is connecting to AF_INET@192.168.33.10:1800
et(8066) is connecting to AF_INET@192.168.33.10:1800
et(8067) is connecting to AF_INET@192.168.33.10:1800
et(8068) is connecting to AF_INET@192.168.33.10:1800
et(8069) is connecting to AF_INET@192.168.33.10:1800
et(8070) is connecting to AF_INET@192.168.33.10:1800

sample-bt

This tool comes from agentzh and samples backtraces in user space and kernel space.

sample-bt -p 8736 -t 5 -u > a.bt
ING: Tracing 8736 (/opt/nginx/sbin/nginx) in user-space only...
ING: Missing unwind data for module, rerun with 'stap -d stap_df60590ce8827444bfebaf5ea938b5a_11577'
ING: Time's up. Quitting now...(it may take a while)
ING: Number of errors: 0, skipped probes: 24

watch-var

Monitors changes to a function parameter.

t@localhost systemtap-toolkit]# ./watch-var  -f syscall.open -v filename -p 25849
ING: Tracing vars syscall.open filename in 25849...
t[25849] kernel.function("SyS_open@fs/open.c:1036").call filename: "" => ""./test""

tcp-trace-packet

Like tcpdump, it traces TCP packets, but with more detail, including the TCP flags.

t@localhost systemtap-toolkit]# ./tcp-trace-packet
ING: tracking 0 tcp packet
067249998698 10.0.2.15:22 => 10.0.2.2:50627 len:92 SYN:0 ACK:1 FIN:0 RST:0 PSH:1 URG:0
067249998955 10.0.2.2:50627 <= 10.0.2.15:22 len:40 SYN:0 ACK:1 FIN:0 RST:0 PSH:0 URG:0
067250199252 10.0.2.15:22 => 10.0.2.2:50627 len:172 SYN:0 ACK:1 FIN:0 RST:0 PSH:1 URG:0
067250199559 10.0.2.2:50627 <= 10.0.2.15:22 len:40 SYN:0 ACK:1 FIN:0 RST:0 PSH:0 URG:0
067250399756 10.0.2.15:22 => 10.0.2.2:50627 len:100 SYN:0 ACK:1 FIN:0 RST:0 PSH:1 URG:0
067250399963 10.0.2.2:50627 <= 10.0.2.15:22 len:40 SYN:0 ACK:1 FIN:0 RST:0 PSH:0 URG:0
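
Each output line above has a fixed layout: a timestamp, two address:port pairs separated by a direction arrow, the packet length, and the TCP flags. A minimal Python sketch for turning such a line into a dictionary follows, for example to filter on flags. The names PACKET_RE and parse_packet are hypothetical, and the leading timestamp is kept as an opaque string because it is cut off in the sample.

```python
import re

PACKET_RE = re.compile(
    r"^(?P<ts>\S+) (?P<left_ip>\S+):(?P<left_port>\d+) (?P<dir>=>|<=) "
    r"(?P<right_ip>\S+):(?P<right_port>\d+) len:(?P<len>\d+) (?P<flags>.*)$"
)


def parse_packet(line):
    """Parse one tcp-trace-packet output line; return None if it does not match."""
    m = PACKET_RE.match(line.strip())
    if m is None:
        return None
    flags = {}
    for item in m.group("flags").split():
        name, _, value = item.partition(":")  # "PSH:1" -> ("PSH", ":", "1")
        flags[name] = int(value)
    return {
        "ts": m.group("ts"),
        "left": (m.group("left_ip"), int(m.group("left_port"))),
        "dir": m.group("dir"),
        "right": (m.group("right_ip"), int(m.group("right_port"))),
        "len": int(m.group("len")),
        "flags": flags,
    }


if __name__ == "__main__":
    sample = ("067249998698 10.0.2.15:22 => 10.0.2.2:50627 len:92 "
              "SYN:0 ACK:1 FIN:0 RST:0 PSH:1 URG:0")
    packet = parse_packet(sample)
    if packet and packet["flags"].get("PSH"):
        print("PSH packet of", packet["len"], "bytes from", packet["left"])
```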

ngx-req-watch

Traces nginx in userland; it can watch nginx requests in real time and filter them by a specified condition.

t@localhost systemtap-toolkit]# ./ngx-req-watch -p 5614
ING: watching /opt/tengine/sbin/nginx(8521 8522 8523 8524) requests
x(8523) GET URI:/123?a=123 HOST:127.0.0.1 STATUS:200 FROM 127.0.0.1 FD:16 RT: 0ms
x(8523) GET URI:/123?a=123 HOST:127.0.0.1 STATUS:200 FROM 127.0.0.1 FD:16 RT: 0ms
x(8523) GET URI:/123?a=123&b=123 HOST:127.0.0.1 STATUS:200 FROM 127.0.0.1 FD:16 RT: 0ms
x(8523) GET URI:/123?w HOST:127.0.0.1 STATUS:200 FROM 127.0.0.1 FD:16 RT: 0ms
x(8523) GET URI:/123?w HOST:test STATUS:200 FROM 127.0.0.1 FD:16 RT: 0ms
x(8523) GET URI:/123?w=a HOST:test STATUS:200 FROM 127.0.0.1 FD:16 RT: 0ms

stracelike

Like strace, but based on systemtap.

t@localhost systemtap-toolkit]# ./stracelike -p 4580 -t 20000
ING: stracing syscall
Oct 29 12:46:19 2016.094410  epoll_wait(16, 0x1e17b40, 512, 100) = 0 <0.100334>
Oct 29 12:46:19 2016.194756  epoll_wait(16, 0x1e17b40, 512, 100) = 0 <0.100227>
Oct 29 12:46:19 2016.295006  epoll_wait(16, 0x1e17b40, 512, 100) = 0 <0.101086>
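
The output above follows the strace layout: the call, its return value, and the elapsed time in angle brackets. A minimal Python sketch for totalling the time per syscall from such lines follows. The names CALL_RE and summarize are hypothetical, and the sketch assumes the bracketed value is in seconds, as it is in strace -T.

```python
import re
from collections import defaultdict

CALL_RE = re.compile(
    r"(?P<name>\w+)\((?P<args>.*)\) = (?P<ret>\S+) <(?P<secs>[\d.]+)>\s*$"
)


def summarize(lines):
    """Return {syscall name: (call count, total elapsed seconds)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = CALL_RE.search(line)
        if m is None:
            continue  # skip banners and anything that is not a syscall line
        entry = totals[m.group("name")]
        entry[0] += 1
        entry[1] += float(m.group("secs"))
    return {name: (count, secs) for name, (count, secs) in totals.items()}


if __name__ == "__main__":
    sample = [
        "Oct 29 12:46:19 2016.094410  epoll_wait(16, 0x1e17b40, 512, 100) = 0 <0.100334>",
        "Oct 29 12:46:19 2016.194756  epoll_wait(16, 0x1e17b40, 512, 100) = 0 <0.100227>",
    ]
    for name, (count, secs) in summarize(sample).items():
        print(f"{name}: {count} calls, {secs:.6f}s total")
```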

redis-watch-req

Traces redis in userland; it can watch redis requests in real time and filter them by a specified condition.

t@localhost systemtap-toolkit]# ./redis-watch-req -p 23261
ING: watching /usr/bin/redis-server(23261) requests
s-server(23261) RT:30(us) REQ: id:2 fd:5 ==> get a #-1 RES: #9
s-server(23261) RT:23(us) REQ: id:2 fd:5 ==> set a #12 RES: #5
s-server(23261) RT:16(us) REQ: id:2 fd:5 ==> get foo #-1 RES: #5

libcurl-watch-req

Traces userland; it can watch, and filter by a specified condition, the requests made by software based on libcurl, such as curl and php.

t@localhost systemtap-toolkit]# ./libcurl-watch-req
ING: Tracing libcurl (0) ...
(23759) URL:http://www.google.com RT:448(ms) RTCODE:0
(23767) URL:http://www.facebook.com/asdfasdf RT:596(ms) RTCODE:0
(23769) URL:https://www.facebook.com/asdfasdf RT:902(ms) RTCODE:0

pdomysql-watch-query

Traces userland; it can watch, and filter by a specified condition, the queries made through PHP's PDO MySQL driver.

t@localhost systemtap-toolkit]# ./pdomysql-watch-query -l /usr/lib64/php/modules/pdo_mysql.so

ing pdo-mysql (0)
fpm(12896) 172.17.10.196:3306@root: SELECT * from person RT:0(ms) RTCODE:1
fpm(12896) 172.17.10.196:3306@root: SELECT * from person RT:8(ms) RTCODE:1
fpm(12896)172.17.10.196:3306@root: SELECT sleep(5) RT:5012(ms) RTCODE:1

phpredis-watch-req

Traces userland; it can trace PHP redis requests.

t@localhost systemtap-toolkit]# ./phpredis-watch-req -l /usr/lib64/php/modules/redis.so

ing phpredis (/usr/lib64/php/modules/redis.so)

17226)<zim_Redis___construct[22us]>
17226)<zim_Redis_connect[113us]>
17226)<zim_Redis_get[157us]>:*2 $3 GET $3 key
17226)<zim_Redis_hGet[563us]>:*3 $4 HGET $3 key $6 ffffff
17226)<zim_Redis_set[617us]>:*3 $3 SET $3 key $4 abcd
17226)<zim_Redis___destruct[12us]>
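
The text after each function name in the output above looks like a Redis protocol (RESP) command printed with spaces in place of line breaks: *3 is the number of arguments, and each $N gives the byte length of the argument that follows. A minimal Python sketch for decoding such a line follows. The helper name decode_flat_resp is hypothetical, and the sketch assumes that no argument contains whitespace.

```python
def decode_flat_resp(text):
    """Decode a space-flattened RESP array such as '*3 $3 SET $3 key $4 abcd'."""
    tokens = text.split()
    if not tokens or not tokens[0].startswith("*"):
        raise ValueError("expected an array header such as '*3'")
    count = int(tokens[0][1:])
    if len(tokens) != 1 + 2 * count:
        raise ValueError("token count does not match the array header")
    args = []
    for i in range(count):
        header, value = tokens[1 + 2 * i], tokens[2 + 2 * i]
        # Each "$N" header must match the byte length of its argument.
        if not header.startswith("$") or int(header[1:]) != len(value.encode()):
            raise ValueError(f"bad length header {header!r} for {value!r}")
        args.append(value)
    return args


if __name__ == "__main__":
    print(decode_flat_resp("*3 $3 SET $3 key $4 abcd"))
```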

io-process-top

Traces I/O reads and writes from the point of view of each process (pid).

t@localhost systemtap-toolkit]# ./io-process-top  -t 1000
ING: Collecting IO Process Top 10 with interval of 1000ms
    Process Name          Read(KB)   Write(KB)
    redis-server(4510)            3           0
          stapio(28280)           2           0
 systemd-journal(442)             0           0
         systemd(1)               0           0
            sshd(19948)           0           0
    in:imjournal(595)             0           0

net-process-top

Traces network sends and receives from the point of view of each process (pid).

t@localhost systemtap-toolkit]# ./net-process-top -t 5000
ING: Collecting Net Process Top 10 with interval of 5000ms
         Process(    0)    dev   Send(PK)   Recv(PK)   Send(KB)   Recv(KB)
           nginx( 7266)     lo     446203          0     144471          0
             wrk(27496)     lo     156601          0      15599          0
       rcu_sched(   10)   eth0          0          1          0          0
            sshd( 6890)   eth0          1          0          0          0

nssdns-watch-question

Traces libnss_dns.so for DNS queries.

t@localhost systemtap-toolkit]# ./nssdns-watch-question -l /usr/lib64/libnss_dns.so. -t 100000
ING: Tracing libnss_dns(/usr/lib64/libnss_dns.so.2) for pid:0

(11786): www.google.com 57994us
(11788): www.facebook.com 57406us
(11790): www.github.com 4203477us

phpfpm-watch-req

Traces php-fpm requests.

phpfpm-watch-req -l /opt/php/sbin/php-fpm
ING: Tracing php-fpm for pid(0)
fpm(9665) GET /index.php?&123123=123&f=q (208us)
fpm(9665) GET /index.php?&123123=123&f=q (172us)
fpm(9665) GET /index.php?&123123=123&f=q (154us)
fpm(9665) GET /index.php?&123123=123&f=q (151us)

swoole-redis-watch

Traces the swoole-redis write and read subroutines.

./swoole-redis-watch -l /opt/php/lib/php/extensions/no-debug-non-zts-20131226/swoole.so -t 100000
ING: Tracing swoole.so(/opt/php/lib/php/extensions/no-debug-non-zts-20131226/swoole.so) for pid:0
25927) is writing for 10.200.175.90:6379 to size(41) *3





chment
00.175.90:6379 get reply: integer:710263830
29486) is writing for 10.200.175.90:6379 to size(41) *3





chment
00.175.90:6379 get reply: integer:709720993
