lucidworks/fusion-log-indexer

Name: fusion-log-indexer

Owner: Lucidworks

Description: Watch a directory for logs and send each line to a Fusion pipeline as a PipelineDocument using grok for parsing.

Created: 2015-12-08 19:09:41

Updated: 2017-12-02 06:55:28

Pushed: 2017-11-16 07:43:26

Size: 1828

Language: Java

README

fusion-log-indexer

This project offers a number of tools designed to quickly and efficiently get logs into Fusion. It supports a pluggable parsing strategy (with implementations for Grok, DNS, JSON and NoOp) as well as a number of preconfigured Grok patterns similar to those available in Logstash and other engines.
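
For orientation, a pluggable parsing strategy of this sort usually boils down to a single-method interface that turns a raw log line into field/value pairs. The sketch below is purely illustrative and not taken from this repo; the interface, class, and method names are hypothetical, with the NoOp-style variant shown for flavor.

    import java.util.Collections;
    import java.util.Map;

    /** Hypothetical shape of a pluggable line parser; the interface in this repo may differ. */
    public interface LineParser {
        /** Turn one raw log line into field/value pairs, or return null if it cannot be parsed. */
        Map<String, Object> parseLine(String line);
    }

    /** A NoOp-style implementation: pass the raw line through as a single message field. */
    class NoOpLineParser implements LineParser {
        @Override
        public Map<String, Object> parseLine(String line) {
            return Collections.<String, Object>singletonMap("message_txt_en", line);
        }
    }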

Features

  1. Fast, multithreaded, lightweight client for installation on machines to be monitored
  2. Pluggable log parsing logic with support for a variety of formats, including Grok, JSON and DNS
  3. Ability to watch directories and automatically index new content
  4. Integration with Lucidworks Fusion

Getting Started

Prerequisites
  1. Maven (http://maven.apache.org)
  2. Java 1.7 or later
  3. Lucidworks Fusion

Building

After cloning the repository, do the following on the command line:

  1. mvn package (add -DskipTests if you want to skip the tests)

The output JAR file is written to the target directory.

Running
  1. To see all options: `java -jar ./target/fusion-log-indexer-1.0-exe.jar`

Basic Examples
  1. Watch and send logs from the old Lucidworks Search system, 500 at a time, to the my_collection collection using the default pipeline: `java -jar ./target/fusion-log-indexer-1.0-exe.jar -dir ~/projects/content/lucid/lucidfind/logs/ -fusion "http://localhost:8764/api/apollo/index-pipelines/my_collection-default/collections/my_collection/index" -fusionUser USER_HERE -fusionPass PASSWORD_HERE -senderThreads 4 -fusionBatchSize 500 --verbose -lineParserConfig sample-properties/lws-grok-parser.properties`

  2. Nagios example: `java -jar ./target/fusion-log-indexer-1.0-exe.jar -dir ~/projects/content/nagios/ -fusion "http://localhost:8764/api/apollo/index-pipelines/nagios-default/collections/nagios/index" -fusionUser USER -fusionPass PASSWORD -lineParserConfig sample-properties/nagios-grok-parser.properties`

Multi-line Parsing Example

Let's see how to handle parsing of a Solr log file that contains both single-line and multi-line log messages (such as stack traces). Specifically, we'll see how to parse the following snippet from a log generated by Solr 6.5.1:

  INFO  - 2017-06-01 13:58:13.153; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1496325489976} status=0 QTime=0
  INFO  - 2017-06-01 13:58:13.165; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/info/system params={wt=json&_=1496325489977} status=0 QTime=12
  INFO  - 2017-06-01 13:58:13.169; [   ] org.apache.solr.handler.admin.CollectionsHandler; Invoked Collection Action :list with params action=LIST&wt=json&_=1496325489977 and sendToOCPQueue=true
  INFO  - 2017-06-01 13:58:13.170; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/collections params={action=LIST&wt=json&_=1496325489977} status=0 QTime=0
  ERROR - 2017-06-01 13:58:23.840; [c:gettingstarted s:shard1 r:core_node1 x:gettingstarted_shard1_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: undefined field: "notafield"
    at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1239)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:438)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:405)
    at org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:803)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.request.SimpleFacets$3.execute(SimpleFacets.java:742)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:818)
    at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:329)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:273)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
  INFO  - 2017-06-01 13:58:23.842; [c:gettingstarted s:shard1 r:core_node1 x:gettingstarted_shard1_replica1] org.apache.solr.core.SolrCore; [gettingstarted_shard1_replica1]  webapp=/solr path=/select params={q=*:*&facet.field=notafield&indent=on&facet=on&wt=json&_=1496325493119} hits=32 status=400 QTime=61

A grok pattern from the resources/patterns/solr file for this log could be:

SOLR_651_LOG4J %{LOGLEVEL:level_s} - %{TIMESTAMP_ISO8601:logdate}; \[(?:%{DATA:mdc_s}| )\] %{DATA:category_s}; \[(?:%{DATA:core_s}| )\] %{JAVALOGMESSAGE:logmessage}

NOTE: You don't need to worry about multiple spaces as the parser collapses multiple whitespace characters down to a single space automatically.

Notice that the timestamp in the log has the format yyyy-MM-dd HH:mm:ss.SSS. Consequently, you'll need to set the following property in your log parser properties file:

dateFieldFormat=yyyy-MM-dd HH:mm:ss.SSS
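
To see how these pieces fit together, here is a small illustrative Java sketch (not code from this repo) that applies the pattern to the first example line and parses the logdate capture with the format above. It is written against the io.krakens fork of the java-grok library referenced in the Grokking Grok section below; the grok API actually bundled here may differ, and JAVALOGMESSAGE is registered explicitly in case the library's default pattern set does not define it.

    import io.krakens.grok.api.Grok;
    import io.krakens.grok.api.GrokCompiler;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Map;

    public class SolrGrokSketch {
        public static void main(String[] args) throws Exception {
            GrokCompiler compiler = GrokCompiler.newInstance();
            compiler.registerDefaultPatterns();              // LOGLEVEL, TIMESTAMP_ISO8601, DATA, ...
            compiler.register("JAVALOGMESSAGE", "(.*)");     // registered in case the defaults lack it
            compiler.register("SOLR_651_LOG4J",
                "%{LOGLEVEL:level_s} - %{TIMESTAMP_ISO8601:logdate}; "
              + "\\[(?:%{DATA:mdc_s}| )\\] %{DATA:category_s}; "
              + "\\[(?:%{DATA:core_s}| )\\] %{JAVALOGMESSAGE:logmessage}");
            Grok grok = compiler.compile("%{SOLR_651_LOG4J}");

            String line = "INFO  - 2017-06-01 13:58:13.153; [   ] "
                + "org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores "
                + "params={indexInfo=false&wt=json&_=1496325489976} status=0 QTime=0";

            // The indexer collapses runs of whitespace before matching (see the NOTE above);
            // do the same here so "INFO  - " matches the single space in the pattern.
            Map<String, Object> fields = grok.match(line.replaceAll("\\s+", " ")).capture();
            System.out.println(fields.get("level_s"));      // INFO
            System.out.println(fields.get("category_s"));   // org.apache.solr.servlet.HttpSolrCall

            // The logdate capture parses with the dateFieldFormat shown above.
            Date ts = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
                .parse((String) fields.get("logdate"));
            System.out.println(ts);
        }
    }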

Lastly, if you want to parse more fields from search requests, you can set the following property:

solrRequestGrokPattern=%{SOLR_6_REQUEST}

The final parser properties file you'll need to parse the example log above is:

grokPatternFile=patterns/grok-patterns
grokPattern=%{SOLR_651_LOG4J}
iso8601TimestampFieldName=timestamp_tdt
dateFieldName=logdate
dateFieldFormat=yyyy-MM-dd HH:mm:ss.SSS
logMessageFieldName=message_txt_en
solrRequestGrokPattern=%{SOLR_6_REQUEST}

To parse the example log, save the example log entries above into solr_example/solr.log and then run:

java -jar target/fusion-log-indexer-1.0-exe.jar -dir solr_example \
  -lineParserClass parsers.SolrLogParser -lineParserConfig solr_log_parser.properties \
  -parseOnly

Contributing

Please submit a pull request against the master branch with your changes.

Grokking Grok

For Grok, we are using the https://github.com/thekrakken/java-grok implementation, which is a little thin on documentation. However, there are some useful tools available for learning and working with Grok. Additionally, see the `src/main/resources/patterns` directory for examples ranging from Apache logs to MongoDB to Nagios.
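
Since that documentation is thin, the following minimal quick-start sketch may help. It assumes the io.krakens fork's GrokCompiler API and that the stock pattern set includes COMMONAPACHELOG (as in the Logstash base patterns); treat the names as illustrative rather than exact.

    import io.krakens.grok.api.Grok;
    import io.krakens.grok.api.GrokCompiler;
    import java.util.Map;

    public class GrokQuickStart {
        public static void main(String[] args) {
            GrokCompiler compiler = GrokCompiler.newInstance();
            // Load the library's bundled base patterns (IP, HTTPDATE, COMMONAPACHELOG, ...).
            compiler.registerDefaultPatterns();
            // Additional definitions (e.g. ones cribbed from src/main/resources/patterns)
            // can be layered on top with compiler.register(name, pattern) before compiling.
            Grok grok = compiler.compile("%{COMMONAPACHELOG}");

            String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
            Map<String, Object> fields = grok.match(line).capture();
            System.out.println(fields.get("clientip") + " " + fields.get("request")
                + " -> " + fields.get("response"));
        }
    }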

Useful Sites:

  1. http://grokconstructor.appspot.com/do/match
  2. Syntax: http://grokconstructor.appspot.com/RegularExpressionSyntax.txt
