pinterest/secor

Name: secor

Owner: Pinterest

Description: Secor is a service implementing Kafka log persistence

Created: 2014-04-15 22:26:44.0

Updated: 2018-01-17 09:14:51.0

Pushed: 2018-01-17 17:49:47.0

Size: 1063

Language: Java

README

Pinterest Secor

Secor is a service that persists Kafka logs to Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, and OpenStack Swift.

Key features
Setup Guide
Get Secor code
git clone [git-repo-url] secor
cd secor
Customize configuration parameters

Edit src/main/config/*.properties files to specify parameters describing the environment. Those files contain comments describing the meaning of individual parameters.
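As a rough illustration of the properties format, the sketch below parses a small in-memory fragment with the standard `java.util.Properties` loader. The two keys are the ones this README mentions; the values, the `ConfigSketch` class, and the default of 0 for a missing interval are assumptions made for the example, not Secor's own configuration loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Secor: reads key=value text in .properties format.
final class ConfigSketch {
    // Parses .properties text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads monitoring.interval.seconds, assuming 0 when the key is absent.
    static int monitoringIntervalSeconds(Properties props) {
        return Integer.parseInt(props.getProperty("monitoring.interval.seconds", "0").trim());
    }

    public static void main(String[] args) {
        String text = "# hypothetical values for illustration\n"
                + "secor.message.parser.class=com.example.DailyPartitionParser\n"
                + "monitoring.interval.seconds=60\n";
        Properties props = parse(text);
        System.out.println(props.getProperty("secor.message.parser.class"));
        System.out.println(monitoringIntervalSeconds(props));
    }
}
```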

Create and install jars
# By default this will install the "release" (Kafka 0.10 profile)
mvn package
mkdir ${SECOR_INSTALL_DIR} # directory to place Secor binaries in.
tar -zxvf target/secor-0.1-SNAPSHOT-bin.tar.gz -C ${SECOR_INSTALL_DIR}

# To use the Kafka 0.8 client you should use the kafka-0.8-dev profile
mvn -Pkafka-0.8-dev package
Run tests (optional)
cd ${SECOR_INSTALL_DIR}
./scripts/run_tests.sh
Run Secor
cd ${SECOR_INSTALL_DIR}
java -ea -Dsecor_group=secor_backup \
  -Dlog4j.configuration=log4j.prod.properties \
  -Dconfig=secor.prod.backup.properties \
  -cp secor-0.1-SNAPSHOT.jar:lib/* \
  com.pinterest.secor.main.ConsumerMain
Output grouping

One of the convenience features of Secor is the ability to group messages and save them under common file prefixes. The partitioning is controlled by a message parser. Secor comes with the following parsers:

If none of the built-in parsers suits your use case, you can implement a custom one: extend MessageParser and tell Secor to use your parser by setting `secor.message.parser.class` in the properties file.
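To make the idea of a partitioning parser concrete, here is a minimal, self-contained sketch. `SketchMessage` and `SketchMessageParser` are hypothetical stand-ins written for this example; Secor's real `MessageParser` and message classes have their own constructors and signatures, so check the source before extending them. The `dt=` naming and the `topic/partition` prefix layout are also assumptions for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for a consumed Kafka message; Secor's real class differs.
final class SketchMessage {
    final String topic;
    final long timestampMillis;

    SketchMessage(String topic, long timestampMillis) {
        this.topic = topic;
        this.timestampMillis = timestampMillis;
    }
}

// Hypothetical stand-in for the parser base class that a custom parser extends.
abstract class SketchMessageParser {
    // Returns the partition components the message should be grouped under.
    abstract String[] extractPartitions(SketchMessage message);
}

// Groups messages by UTC calendar day, producing one component such as "dt=2014-05-01".
final class DailyPartitionParser extends SketchMessageParser {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    @Override
    String[] extractPartitions(SketchMessage message) {
        String day = DAY.format(Instant.ofEpochMilli(message.timestampMillis));
        return new String[] {"dt=" + day};
    }

    // Illustrative only: joins the topic and partition components into a common file prefix.
    static String commonPrefix(String topic, String[] partitions) {
        return topic + "/" + String.join("/", partitions);
    }
}
```

With a real parser, the class would be referenced by its fully qualified name in the properties file, for example `secor.message.parser.class=com.example.DailyPartitionParser` (a hypothetical package name).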

Output File Formats

Secor currently supports the following output formats:

Tools

Secor comes with a number of tools implementing interactions with the environment.

Log file printer

Log file printer displays the content of a log file.

java -ea -Dlog4j.configuration=log4j.prod.properties -Dconfig=secor.prod.backup.properties -cp "secor-0.1-SNAPSHOT.jar:lib/*" com.pinterest.secor.main.LogFilePrinterMain -f s3n://bucket/path
Log file verifier

Log file verifier checks the consistency of log files.

java -ea -Dlog4j.configuration=log4j.prod.properties -Dconfig=secor.prod.backup.properties -cp "secor-0.1-SNAPSHOT.jar:lib/*" com.pinterest.secor.main.LogFileVerifierMain -t topic -q
Partition finalizer

Partition finalizer writes _SUCCESS files to date partitions that are very unlikely to receive any new messages and (optionally) adds the corresponding dates to Hive through the Qubole API.

java -ea -Dlog4j.configuration=log4j.prod.properties -Dconfig=secor.prod.backup.properties -cp "secor-0.1-SNAPSHOT.jar:lib/*" com.pinterest.secor.main.PartitionFinalizerMain
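The finalization idea can be sketched as a simple cutoff rule: a date partition is treated as closed once it is older than some quiet period, and a `_SUCCESS` marker path is produced for it. The `FinalizerSketch` class, the `quietDays` threshold, and the `dt=` path layout below are assumptions made for illustration; how Secor actually decides that a partition is final is not described in this README.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Secor's finalizer: picks date partitions old enough to mark as done.
final class FinalizerSketch {
    // Returns _SUCCESS marker paths for partitions strictly older than (today - quietDays).
    static List<String> successMarkers(String topicPrefix, List<LocalDate> partitionDates,
                                       LocalDate today, int quietDays) {
        List<String> markers = new ArrayList<>();
        LocalDate cutoff = today.minusDays(quietDays);
        for (LocalDate date : partitionDates) {
            if (date.isBefore(cutoff)) {
                // LocalDate.toString() yields the ISO form yyyy-MM-dd.
                markers.add(topicPrefix + "/dt=" + date + "/_SUCCESS");
            }
        }
        return markers;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(LocalDate.of(2018, 1, 10), LocalDate.of(2018, 1, 16));
        System.out.println(successMarkers("s3n://bucket/path/topic", dates, LocalDate.of(2018, 1, 17), 2));
    }
}
```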
Progress monitor

Progress monitor exports offset consumption lags per topic partition to OpenTSDB / statsD. Lags track how far Secor is behind the producers.

java -ea -Dlog4j.configuration=log4j.prod.properties -Dconfig=secor.prod.backup.properties -cp "secor-0.1-SNAPSHOT.jar:lib/*" com.pinterest.secor.main.ProgressMonitorMain

Set monitoring.interval.seconds to a value larger than 0 to run in a loop, exporting stats every monitoring.interval.seconds seconds.
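A minimal sketch of the lag arithmetic and the run-once-or-loop behavior follows. It assumes lag is the latest produced offset minus the last committed offset for each topic partition, floored at zero; the `LagSketch` class, the string keys such as `logs-0`, and the export callback are hypothetical, and the real monitor reads offsets from Kafka and its own state rather than from in-memory maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Secor's ProgressMonitor.
final class LagSketch {
    // Lag for one topic partition: how many messages the consumer is behind, never negative.
    static long lag(long latestOffset, long committedOffset) {
        return Math.max(0L, latestOffset - committedOffset);
    }

    // Computes lag per topic partition; a partition with no committed offset is treated as 0.
    static Map<String, Long> lags(Map<String, Long> latest, Map<String, Long> committed) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : latest.entrySet()) {
            long done = committed.getOrDefault(entry.getKey(), 0L);
            result.put(entry.getKey(), lag(entry.getValue(), done));
        }
        return result;
    }

    // Runs the export once when intervalSeconds <= 0, otherwise repeats it every interval
    // until the thread is interrupted.
    static void run(Runnable exportOnce, int intervalSeconds) {
        do {
            exportOnce.run();
            if (intervalSeconds > 0) {
                try {
                    Thread.sleep(intervalSeconds * 1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        } while (intervalSeconds > 0);
    }
}
```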

Detailed design

Design details are available in DESIGN.md.

License

Secor is distributed under the Apache License, Version 2.0.

Maintainers
Contributors
Companies who use Secor
Help

If you have any questions or comments, you can reach us at secor-users@googlegroups.com


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.