fluent/fluent-plugin-grok-parser

Name: fluent-plugin-grok-parser

Owner: Fluentd: Unified Logging Layer

Description: Fluentd's Grok parser

Created: 2014-06-30 02:14:25.0

Updated: 2017-12-25 22:46:11.0

Pushed: 2017-07-03 02:22:04.0

Homepage: null

Size: 102

Language: Ruby


README

Grok Parser for Fluentd

This is a Fluentd plugin to enable Logstash's Grok-like parsing logic.

Requirements

| fluent-plugin-grok-parser | fluentd    | ruby   |
|---------------------------|------------|--------|
| >= 1.0.0                  | >= v0.14.0 | >= 2.1 |
| < 1.0.0                   | >= v0.12.0 | >= 1.9 |

What's Grok?

Grok is a macro to simplify and reuse regexes, originally developed by Jordan Sissel.

This is a partial implementation of Grok's grammar that should meet most needs.
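
To make the idea of Grok as a regex macro concrete, here is a minimal Ruby sketch of how %{NAME} and %{NAME:field} references might be expanded recursively into one regular expression with named captures. It is not the plugin's implementation: the three-entry pattern table (INT, WORD and the hypothetical PAIR) and the expand helper are assumptions made for illustration.

```ruby
# Minimal sketch of Grok-style macro expansion (illustrative, not the plugin's code).
# PATTERNS is a tiny hypothetical table; the plugin ships its own in patterns/*.
PATTERNS = {
  "INT"  => "(?:[+-]?(?:[0-9]+))",
  "WORD" => "\\b\\w+\\b",
  "PAIR" => "%{WORD:key}=%{INT:value}" # hypothetical composite pattern
}.freeze

# Matches %{NAME} or %{NAME:field}.
GROK_REF = /%\{(?<name>\w+)(?::(?<field>\w+))?\}/

# Recursively replaces every reference with its regex body.
# Named references become named groups; unnamed ones become non-capturing groups.
def expand(pattern, patterns = PATTERNS)
  pattern.gsub(GROK_REF) do
    name  = Regexp.last_match[:name]
    field = Regexp.last_match[:field]
    body  = expand(patterns.fetch(name), patterns)
    field ? "(?<#{field}>#{body})" : "(?:#{body})"
  end
end

regex = Regexp.new(expand("%{PAIR}"))
m = regex.match("retries=3")
puts m[:key]   # retries
puts m[:value] # 3
```

Because references nest (PAIR refers to WORD and INT), expansion has to be recursive; only references that carry a field name end up as captures.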

How It Works

You can use it wherever you used the format parameter to parse text. In the following example, it extracts the first IP address that matches in the log.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    grok_pattern %{IP:ip_address}
  </parse>
</source>

For Fluentd v0.12, use the following style:

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  format grok
  grok_pattern %{IP:ip_address}
</source>

To try multiple grok patterns in order and use the first one that matches, use the following syntax:

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type grok
    <grok>
      pattern %{COMBINEDAPACHELOG}
      time_format "%d/%b/%Y:%H:%M:%S %z"
    </grok>
    <grok>
      pattern %{IP:ip_address}
    </grok>
    <grok>
      pattern %{GREEDYDATA:message}
    </grok>
  </parse>
</source>

For Fluentd v0.12, use the following style:

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  format grok
  <grok>
    pattern %{COMBINEDAPACHELOG}
    time_format "%d/%b/%Y:%H:%M:%S %z"
  </grok>
  <grok>
    pattern %{IP:ip_address}
  </grok>
  <grok>
    pattern %{GREEDYDATA:message}
  </grok>
</source>

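
The first-match-wins behaviour of multiple <grok> sections can be sketched in Ruby as a loop over compiled patterns. The two regexes are hand-written, simplified stand-ins for the expanded %{IP:ip_address} and %{GREEDYDATA:message} patterns, and parse_first_match is a hypothetical helper, not part of the plugin's API.

```ruby
# Sketch of "first matching <grok> section wins" (illustrative only).
# Simplified stand-ins for expanded grok patterns, tried in this order:
CANDIDATES = [
  /(?<ip_address>\d{1,3}(?:\.\d{1,3}){3})/, # roughly %{IP:ip_address}
  /(?<message>.*)/m                         # roughly %{GREEDYDATA:message}
].freeze

# Returns the named captures of the first matching regex, or nil if none match.
def parse_first_match(text, candidates = CANDIDATES)
  candidates.each do |regex|
    m = regex.match(text)
    return m.named_captures if m
  end
  nil
end

p parse_first_match("client 10.0.0.7 connected")
p parse_first_match("no address here")
```

Putting a catch-all such as GREEDYDATA last means every line yields some record; placing it first would shadow the more specific patterns.
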
Multiline support

You can also parse multiline text with the multiline_grok parser.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
    multiline_start_regexp /^[^\s]/
  </parse>
</source>

For Fluentd v0.12, use the following style:

<source>
  @type tail
  path /path/to/log
  format multiline_grok
  grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
  multiline_start_regexp /^[^\s]/
  tag grokked_log
</source>

You can use multiple grok patterns to parse your data.

<source>
  @type tail
  path /path/to/log
  tag grokked_log
  <parse>
    @type multiline_grok
    <grok>
      pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
    </grok>
  </parse>
</source>

For Fluentd v0.12, use the following style:

<source>
  @type tail
  path /path/to/log
  format multiline_grok
  <grok>
    pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
  </grok>
  tag grokked_log
</source>

When no pattern matches, Fluentd keeps accumulating data in the buffer indefinitely, waiting for enough data to form a complete record.

Use this parser without multiline_start_regexp only when you know your data structure perfectly.
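
The role of multiline_start_regexp can be illustrated with a small Ruby sketch that groups lines into records: a line matching the start regexp opens a new record, and any other line is appended to the current one. This is a simplified model under that assumption; group_records is a hypothetical helper, and the real in_tail buffering and flushing logic is more involved.

```ruby
# Simplified model of multiline_start_regexp (illustrative only):
# a line matching START begins a new record; other lines are appended to the current one.
START = /^[^\s]/ # same expression as multiline_start_regexp in the examples above

def group_records(lines, start = START)
  lines.each_with_object([]) do |line, records|
    if records.empty? || start.match?(line)
      records << line.dup            # start a new record
    else
      records.last << "\n" << line   # continuation line
    end
  end
end

lines = [
  "10.0.0.1 GET /index",
  "  detail line belonging to the first record",
  "10.0.0.2 POST /login"
]
group_records(lines).each { |record| p record }
```

Each grouped record would then be matched against the grok pattern as a whole, which is why patterns such as the Rails example above can contain \n.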

Configurations

time_format

The format of the time field.

grok_pattern

The grok pattern. You cannot specify multiple grok patterns with this parameter; use <grok> sections instead.

custom_pattern_path

Path to the file or directory that contains custom grok patterns.

grok_failure_key

The key under which the grok failure reason is stored. Default is nil.

<source>
  @type dummy
  @label @dummy
  dummy [
    { "message1": "no grok pattern matched!", "prog": "foo" },
    { "message1": "/", "prog": "bar" }
  ]
  tag dummy.log
</source>

<label @dummy>
  <filter>
    @type parser
    key_name message1
    reserve_data true
    reserve_time true
    <parse>
      @type grok
      grok_failure_key grokfailure
      <grok>
        pattern %{PATH:path}
      </grok>
    </parse>
  </filter>
  <match dummy.log>
    @type stdout
  </match>
</label>

This generates the following events:

-11-28 13:07:08.009131727 +0900 dummy.log: {"message1":"no grok pattern matched!","prog":"foo","message":"no grok pattern matched!","grokfailure":"No grok pattern matched"}
-11-28 13:07:09.010400923 +0900 dummy.log: {"message1":"/","prog":"bar","path":"/"}
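
The following Ruby sketch mirrors the events above: a successful match yields only the named captures, while a failed match yields the original text under "message" plus the failure reason under the configured key. The PATH regex is a crude stand-in for %{PATH:path}, parse_with_failure_key is a hypothetical helper, and returning nil when no failure key is configured is an assumption. Merging with the original fields (reserve_data) is left out.

```ruby
# Sketch mirroring the events above (illustrative, not the plugin's code).
PATH_RE = %r{(?<path>(?:/[\w.-]*)+)} # crude stand-in for %{PATH:path}

def parse_with_failure_key(text, regex: PATH_RE, failure_key: "grokfailure")
  m = regex.match(text)
  return m.named_captures if m  # success: only the named captures
  return nil unless failure_key # assumption: no record without a failure key
  { "message" => text, failure_key => "No grok pattern matched" }
end

p parse_with_failure_key("/")
p parse_with_failure_key("no grok pattern matched!")
```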

grok/pattern

Section for grok patterns. You can use multiple grok patterns with multiple <grok> sections.

<grok>
  pattern %{IP:ipaddress}
</grok>

multiline_start_regexp

The regexp that matches the beginning of a multiline record. This is only for “multiline_grok”.

How to write Grok patterns

Grok patterns look like %{PATTERN_NAME:name}, where “:name” is optional. If “name” is provided, it becomes a named capture. For example, the grok pattern

%{IP} %{HOST:host}

matches

127.0.0.1 foo.example

but only extracts “foo.example”, as {“host”: “foo.example”}.
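
The %{IP} %{HOST:host} example can be reproduced in plain Ruby by expanding the two patterns by hand. The IP and HOST regexes below are simplified assumptions, not the definitions shipped in patterns/*; the point is that the unnamed IP part is matched but not extracted.

```ruby
# Hand-expanded, simplified stand-ins for the IP and HOST patterns (assumptions).
IP_RE   = '(?:\d{1,3}(?:\.\d{1,3}){3})'
HOST_RE = '(?:[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*)'

# "%{IP} %{HOST:host}": only the named reference becomes a capture group.
host_regex = Regexp.new("#{IP_RE} (?<host>#{HOST_RE})")
record = host_regex.match("127.0.0.1 foo.example").named_captures
p record # only the "host" key is present
```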

Please see patterns/* for the patterns that are supported out of the box.

How to add your own Grok pattern

You can add your own Grok patterns by creating your own Grok file and telling the plugin to read it. This is what the custom_pattern_path parameter is for.

<source>
  @type tail
  path /path/to/log
  <parse>
    @type grok
    grok_pattern %{MY_SUPER_PATTERN}
    custom_pattern_path /path/to/my_pattern
  </parse>
</source>

custom_pattern_path can be either a directory or a file. If it is a directory, the plugin reads all the files in it.
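
A pattern file is assumed here to use the same layout as the files in patterns/*: one pattern per line, with the pattern name, whitespace, and then the pattern body. Under that assumption, the Ruby sketch below shows how a single file or a whole directory could be read into a name-to-pattern table. load_patterns is a hypothetical helper, not the plugin's loader, and treating lines that start with "#" as comments is also an assumption.

```ruby
require "tmpdir"

# Sketch of reading custom patterns (illustrative; load_patterns is hypothetical).
# Each non-empty line is "NAME PATTERN"; lines starting with "#" are treated as comments.
def load_patterns(path)
  files = if File.directory?(path)
            Dir.glob(File.join(path, "*")).select { |f| File.file?(f) }.sort
          else
            [path]
          end
  files.each_with_object({}) do |file, table|
    File.foreach(file) do |raw|
      line = raw.strip
      next if line.empty? || line.start_with?("#")
      name, body = line.split(/\s+/, 2)
      table[name] = body
    end
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "my_pattern"), "# my patterns\nMY_SUPER_PATTERN %{IP:client} %{WORD:verb}\n")
  p load_patterns(dir)                          # whole directory
  p load_patterns(File.join(dir, "my_pattern")) # single file
end
```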

FAQs
1. How can I convert types of the matched patterns like Logstash's Grok?

Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

grok_pattern %{GROK_PATTERN:NAME:TYPE}...

e.g.,

grok_pattern %{INT:foo:integer}

Fields without an explicit type are parsed as the default string type.

The supported types are:

- string
- bool
- integer
- float
- time
- array

For the “time” and “array” types, there is an optional field after the type name. For the “time” type, you can specify a time format there, as you would in time_format.

For the “array” type, that field specifies the delimiter (the default is “,”). For example, if a field called “item_ids” contains the value “3,4,5”, types item_ids:array parses it as [“3”, “4”, “5”]. Alternatively, if the value is “Adam|Alice|Bob”, types item_ids:array:| parses it as [“Adam”, “Alice”, “Bob”].
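
The conversions described above can be sketched as a single Ruby method keyed on the type name. This is an illustration under assumptions, not the plugin's converter: convert is a hypothetical helper, and treating only the string "true" as true for bool is an assumption.

```ruby
require "time"

# Sketch of per-field type conversion (illustrative; convert is hypothetical).
# type is the TYPE part of %{PATTERN:NAME:TYPE}; extra is the optional field after it.
def convert(value, type, extra = nil)
  case type
  when nil, "string" then value
  when "integer"     then value.to_i
  when "float"       then value.to_f
  when "bool"        then value == "true" # assumption: only "true" is truthy
  when "time"        then extra ? Time.strptime(value, extra) : Time.parse(value)
  when "array"       then value.split(extra || ",") # delimiter defaults to ","
  else raise ArgumentError, "unsupported type: #{type}"
  end
end

p convert("42", "integer")
p convert("3,4,5", "array")
p convert("Adam|Alice|Bob", "array", "|")
```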

Here is a sample config using the Grok parser with in_tail and typed fields:

<source>
  @type tail
  path /path/to/log
  format grok
  grok_pattern %{INT:user_id:integer} paid %{NUMBER:paid_amount:float}
  tag payment
</source>

Notice

If you want to use this plugin with Fluentd v0.12.x or earlier, you can use this plugin version v1.0.0.

See also: Plugin Management | Fluentd

License

Apache 2.0 License


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.