wtsi-hgi/stag

Name: stag

Owner: Wellcome Trust Sanger Institute - Human Genetics Informatics

Description: Stream Aggregator

Created: 2016-07-18 16:10:34.0

Updated: 2016-08-04 15:48:10.0

Pushed: 2016-08-09 14:39:13.0

Size: 37

Language: Racket

README

Stream Aggregator

stag (stream aggregator) filters and aggregates an input stream (e.g., stdin) line by line, such that the bits that are important to you are picked out of, say, a generated log. This is done using a simple declarative language (think the bastard child of SQL and AWK).

Usage
stag STATEMENT
stag -f file.stag

Ordinarily, one would provide the stag statement as a command line argument. For the sake of readability and/or reproducibility, the statement can instead be loaded from a file using the -f option.

By default, stag reads from standard input and would probably be used as part of a Unix pipeline. It can read from other sources if they are specified in the stag statement. stag runs until it reaches EOF on its input (when reading from a terminal, Ctrl+D signals EOF) or until it is terminated.

When connected to a TTY, output is presented using a text-mode user interface (something akin to top). When writing to a non-TTY, or when an option specifies batch output, output would instead be flushed to stdout at a specified time interval (perhaps a default of 2s would be reasonable, to match that of watch).
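The choice between the two output modes can be sketched as follows. This is a minimal illustration in Python, not stag's implementation: the names choose_output_mode, force_batch and DEFAULT_INTERVAL_SECONDS are hypothetical, and the 2s interval is only the tentative default suggested above.

```python
import io
import sys

# Assumption: the tentative 2s default floated in the README, not a confirmed value.
DEFAULT_INTERVAL_SECONDS = 2.0


def choose_output_mode(stream, force_batch: bool = False) -> str:
    """Return "interactive" when writing to a TTY, otherwise "batch".

    force_batch stands in for the (unspecified) option that requests
    batch output even when the stream is a terminal.
    """
    if force_batch or not stream.isatty():
        return "batch"
    return "interactive"


def should_flush(last_flush: float, now: float,
                 interval: float = DEFAULT_INTERVAL_SECONDS) -> bool:
    """In batch mode, flush once at least `interval` seconds have elapsed."""
    return now - last_flush >= interval


if __name__ == "__main__":
    # Prints "interactive" in a terminal and "batch" when piped to a file.
    print(choose_output_mode(sys.stdout))
```

Keeping the decision in one small function means the rest of the program only has to ask which mode it is in, rather than probing the terminal itself.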

stag expects records to be EOL delimited (i.e., record-per-line), with no additional “decoration” around records or fields. If that is not the case, sed could be interposed to munge the input into the correct format (for example, if parsing a CSV input, sed could strip leading and trailing quotes, if any, and stag's field separator could be set to /"?,"?/).
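To see what that separator does to a record, here is a small Python sketch of the same idea: strip the outer quotes (the sed step), then split on the pattern "?,"? (the field separator). The function name split_csv_record is hypothetical, and the sketch only handles simple CSV; a comma inside a quoted field would be split incorrectly.

```python
import re

# The separator from the README: a comma with an optional double quote on either side.
FIELD_SEPARATOR = re.compile(r'"?,"?')


def split_csv_record(line: str) -> list:
    """Split one simply-quoted CSV record into its fields."""
    # Drop the line ending, since records are EOL delimited.
    record = line.rstrip("\r\n")
    # Strip a leading and a trailing quote, if any (the job given to sed).
    record = re.sub(r'^"|"$', "", record)
    # Split on the field separator.
    return FIELD_SEPARATOR.split(record)


if __name__ == "__main__":
    print(split_csv_record('"2016-08-04 15:48:10","10.0.0.7","GET /index.html"\n'))
```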

For a full description of the stag language, please refer to the formal specification. Otherwise, some illustrative examples follow:

Examples

Let's say your input looks like this:

[Timestamp]  [IP Address]  [Some message]

Then the following:

stag '$2, max($1) as "Latest" when not $2 = "127.0.0.1"'

Would show the latest hit timestamp, by IP address, for any non-local connection. Or, say, if you wanted to see the total hit count bucketed by hour:

stag 'extract_hour($1) as "Hour", count($0) as "Hits"'
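For readers who want to check their understanding of these two statements, the intended results can be written out in plain Python. This is a sketch of the semantics described above, not of how stag evaluates them. It assumes records have already been split into (timestamp, IP address, message) tuples and that timestamps are strings of the form "YYYY-MM-DD HH:MM:SS", so that string comparison orders them correctly. The function names are hypothetical.

```python
from collections import Counter


def latest_by_ip(records, local="127.0.0.1"):
    """Latest timestamp per IP address, skipping the local address."""
    latest = {}
    for timestamp, ip, _message in records:
        if ip == local:
            continue  # the "when not $2 = ..." filter
        if ip not in latest or timestamp > latest[ip]:
            latest[ip] = timestamp  # the max($1) aggregate, grouped by $2
    return latest


def hits_by_hour(records):
    """Number of records per hour of the day."""
    hits = Counter()
    for timestamp, _ip, _message in records:
        hour = timestamp[11:13]  # assumed timestamp layout, see above
        hits[hour] += 1
    return dict(hits)


if __name__ == "__main__":
    sample = [
        ("2016-08-04 15:48:10", "10.0.0.7", "first"),
        ("2016-08-04 16:01:00", "10.0.0.7", "second"),
        ("2016-08-04 16:05:00", "127.0.0.1", "local"),
        ("2016-08-04 15:50:00", "10.0.0.9", "third"),
    ]
    print(latest_by_ip(sample))
    print(hits_by_hour(sample))
```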

Alternatively, if you have some text file named foo.txt, the following will have the same result as wc -l:

stag 'count($0) from "foo.txt"'

[Think of more/better examples!… Better yet, acquire potential use cases and see if the language is sufficiently expressive to fulfil their requirements…]

License

Copyright (c) 2016 Genome Research Ltd.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

