fsprojects/Zander

Name: Zander

Owner: F# Community Project Incubation Space

Description: Regular expression types for matrix information, i.e. parse structured blocks of information from CSV or Excel files (or similar 2D matrices).

Created: 2015-09-17 15:25:31.0

Updated: 2018-03-06 11:56:06.0

Pushed: 2018-03-06 08:34:10.0

Homepage:

Size: 136

Language: F#

README

Zander

Named after the fish: Zander. It's a small library that eases parsing structured blocks of information within a 2-dimensional matrix. Typically you get this sort of information from report generators. You might still want to extract this information programmatically, thus the need for the fish.

What problem does this library solve?

It helps when you have data in a structured format that is made up of different blocks of information. A very simple example is the following:

                                        Report Title    16/09/15 16:17  Page: 1
Company AB
Some text
that goes on and explains the report
  Id    Value  Type  Attribute 1  Attribute 2
  1244  25     A
  1244  25     B     255          155
  1244  25     C
  1250  25     B     255          100
  1250  25     C
                                        Report Title    16/09/15 16:17  Page: 2
Company AB
Some text
that goes on and explains the report
  Id    Value  Type  Attribute 1  Attribute 2
  1251  25     A     255
  1251  25     B                  130
  1251  25     C
  1260  25     A
  1260  25     B     255          15
  1260  25     C

But the structure of the block layout might change from “page” to “page”.

How do you match?
Match columns
Match rows

To match rows, give each row specification a name by appending ` : title` to it. If the row specification should match several consecutive rows with the same format, add a '+' after the name: ` : title+`.

How does it look?

How do you use this library to extract the information above? You use the parser builder:

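using Zander;

// Each line of the specification describes one kind of row; the name after ':' labels it.
// arrayOfArrays is the 2-dimensional matrix of cells to match against (not defined here).
var parsed = new BlockEx( @" _          _ _ _ _ _ ""Report Title"" _  _  _  @Time @Page : report_title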
                            ""Company AB"" _ _ _ _ _ _                _ _ _ _  _          : company
                                @Text      _ _ _ _ _ _                _ _ _ _  _          : text+
                              _         Id _  Value  Type _ _ ""Attribute 1"" _ ""Attribute 2"" _  _ : header
                              _        @Id _ @Value @Type _ _ (@Attribute1|_) _ (@Attribute2|_) _  _ : row+
                ")
            .Matches(arrayOfArrays);

This will give you structured information that will be easy to consume.
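
The `arrayOfArrays` argument is not defined in the snippet above. As a minimal sketch, assuming `Matches` accepts a `string[][]` with one inner array per row (an assumption, not confirmed by this README), a semicolon-separated export could be turned into such a matrix roughly as follows. `CsvToMatrix`, `ToArrayOfArrays` and the default separator are hypothetical names chosen for illustration and are not part of Zander. The naive split does not handle quoted fields.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Zander.
public static class CsvToMatrix
{
    // Splits the text into lines and each line into cells.
    // Assumption: cells never contain the separator or a line break.
    public static string[][] ToArrayOfArrays(string text, char separator = ';')
    {
        return text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Where(line => line.Length > 0)         // skip empty lines, e.g. after a trailing newline
            .Select(line => line.Split(separator))  // empty cells are kept as ""
            .ToArray();
    }
}
```

Empty cells are kept as empty strings, so every row keeps its column positions.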

