apertium/streamparser

Name: streamparser

Owner: Apertium

Description: Python library to parse Apertium stream format

Created: 2014-12-15 23:27:22.0

Updated: 2018-05-13 22:38:33.0

Pushed: 2018-05-13 22:39:45.0

Homepage:

Size: 78

Language: Python

README

Apertium Streamparser

Python 3 library to parse Apertium stream format, generating LexicalUnits.
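In the stream format, each lexical unit sits between ^ and $. The surface form comes first, followed by one analysis per reading, separated by /. Each analysis is a baseform followed by <tags>, and + joins the subreadings of a multiword analysis (as in dímelo below). The following is a minimal sketch of that structure, not streamparser's own implementation. The function name parse_lu_body is hypothetical, SReading here is a local stand-in that mirrors the output shown below, and backslash escapes are deliberately ignored.

```python
import re
from typing import List, NamedTuple, Tuple


class SReading(NamedTuple):
    """Local stand-in for the reading tuples shown in the output below."""
    baseform: str
    tags: List[str]


TAG = re.compile(r'<([^<>]+)>')


def parse_lu_body(body: str) -> Tuple[str, List[List[SReading]]]:
    """Split the text between '^' and '$' into a wordform and its readings.

    Hypothetical helper: backslash escapes are not handled, so an escaped
    '/', '+' or '<' inside a form would be misread.
    """
    wordform, *analyses = body.split('/')
    readings = []
    for analysis in analyses:
        subreadings = []
        # '+' joins the parts of a multiword analysis, e.g. verb + clitics
        for part in analysis.split('+'):
            cut = part.find('<')
            # the baseform is everything before the first tag
            baseform = part if cut == -1 else part[:cut]
            subreadings.append(SReading(baseform, TAG.findall(part)))
        readings.append(subreadings)
    return wordform, readings


wordform, readings = parse_lu_body('vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>')
print(wordform, readings)
# vino [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]
```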

Installation

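Streamparser is available through PyPI. Installing it provides the apertium-streamparser command, which reads stream format from standard input: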

$ pip install apertium-streamparser
$ apertium-streamparser
^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
[[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]

Installing through PyPI also installs the streamparser Python module used in the library examples below.

Usage
As a library
With string input
from streamparser import parse
# a raw string keeps the backslash escapes of the stream format intact
lexical_units = parse(r'^hypercholesterolemia/*hypercholesterolemia$\[\]\^\$[^ignoreme/yesreally$]^a\/s/a\/s<n><nt>$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$.eefe^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$')
for lexical_unit in lexical_units:
    print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
hypercholesterolemia (<class 'streamparser.unknown'>) → [[SReading(baseform='*hypercholesterolemia', tags=[])]]
a\/s (<class 'streamparser.known'>) → [[SReading(baseform='a\\/s', tags=['n', 'nt'])]]
vino (<class 'streamparser.known'>) → [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]
dímelo (<class 'streamparser.known'>) → [[SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'nt'])], [SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'm', 'sg'])]]
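The example input above also exercises escaping and superblanks: \[, \^ and \$ are literal characters, and everything inside an unescaped [...] block is formatting, which is why ^ignoreme/yesreally$ produces no lexical unit. A simplified tokenizer that extracts the raw text of each unit under those two rules could look like the sketch below. The name iter_lu_bodies is hypothetical and this is not the library's parser.

```python
from typing import Iterable, Iterator


def iter_lu_bodies(chars: Iterable[str]) -> Iterator[str]:
    """Yield the raw text between each unescaped '^' and '$' (hypothetical helper).

    Backslash escapes inside a unit are kept verbatim, and anything inside
    an unescaped '[...]' superblank is skipped.
    """
    in_lu = in_blank = escaped = False
    buf = []
    for ch in chars:
        if escaped:
            # the character after a backslash is always literal
            if in_lu:
                buf.append(ch)
            escaped = False
        elif ch == '\\':
            if in_lu:
                buf.append(ch)
            escaped = True
        elif in_blank:
            if ch == ']':
                in_blank = False
        elif in_lu:
            if ch == '$':
                yield ''.join(buf)
                buf.clear()
                in_lu = False
            else:
                buf.append(ch)
        elif ch == '[':
            in_blank = True
        elif ch == '^':
            in_lu = True


stream = r'^vino/vino<n><m><sg>$\^not\$ [^skipped/x$]^a\/s/a\/s<n><nt>$'
print(list(iter_lu_bodies(stream)))
# ['vino/vino<n><m><sg>', 'a\\/s/a\\/s<n><nt>']
```

Because it accepts any iterable of characters, the same generator can also be fed a file object one character at a time with iter(lambda: f.read(1), '').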
With file input
import os.path
from streamparser import parse_file
# open() does not expand '~', so expand the home directory explicitly
path = os.path.expanduser('~/Downloads/analyzed.txt')
with open(path, encoding='utf-8') as f:
    # the lexical units are consumed while the file is still open
    for lexical_unit in parse_file(f):
        print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
Høgre (<class 'streamparser.known'>) → [[SReading(baseform='Høgre', tags=['np'])], [SReading(baseform='høgre', tags=['n', 'nt', 'sp'])], [SReading(baseform='høg', tags=['un', 'sint', 'sp', 'comp', 'adj'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['sg', 'nt', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['mf', 'sg', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'ind', 'pl', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'def', 'sp', 'posi', 'adj'])]]
kolonne (<class 'streamparser.known'>) → [[SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])], [SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])]]
Grunnprinsipp (<class 'streamparser.known'>) → [[SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])]]
7 (<class 'streamparser.known'>) → [[SReading(baseform='7', tags=['qnt', 'pl', 'det'])]]
px (<class 'streamparser.unknown'>) → []
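Both examples report each unit's knownness. Apertium marks an analysis that starts with * as unknown to the analyser, and in later pipeline stages uses @ and # for words that the bilingual dictionary or the generator did not cover. The sketch below classifies units this way and counts unknown wordforms in a corpus. The helper names knownness and count_unknown are hypothetical, it returns plain string labels rather than streamparser's classes, and treating a unit with no analyses as unknown is an assumption.

```python
from collections import Counter
from typing import Iterable, List

# Leading markers Apertium puts on analyses it could not produce normally.
MARKERS = {'*': 'unknown', '@': 'bilingual-unknown', '#': 'generation-unknown'}


def knownness(analyses: List[str]) -> str:
    """Classify a lexical unit from its analyses (hypothetical helper)."""
    if not analyses:
        # assumption: a unit with no analyses at all counts as unknown
        return 'unknown'
    return MARKERS.get(analyses[0][:1], 'known')


def count_unknown(bodies: Iterable[str]) -> Counter:
    """Count wordforms whose analysis is marked unknown ('*').

    Simplified: splits on every '/', so an escaped '\\/' is not handled.
    """
    counts: Counter = Counter()
    for body in bodies:
        wordform, *analyses = body.split('/')
        if knownness(analyses) == 'unknown':
            counts[wordform] += 1
    return counts


print(count_unknown(['hypercholesterolemia/*hypercholesterolemia',
                     'vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>',
                     'px/*px', 'px/*px']))
# Counter({'px': 2, 'hypercholesterolemia': 1})
```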
From the terminal
With standard input
$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin | python3 streamparser.py
[[SReading(baseform='Høgre', tags=['np'])],
 [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
 [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
 [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
 [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]

With file input in the terminal
$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin > analyzed.txt
$ python3 streamparser.py analyzed.txt
[[SReading(baseform='Høgre', tags=['np'])],
 [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
 [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
 [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
 [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
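Both terminal examples run the same script, once on a pipe and once on a file name. One way to support both is an optional positional argument that falls back to standard input, reading in fixed-size chunks so that large corpora need not fit in memory. This is a sketch using a hypothetical iter_input_chunks helper, not streamparser.py's actual argument handling.

```python
import argparse
import sys
from typing import Iterator, List, Optional, TextIO


def iter_input_chunks(argv: Optional[List[str]] = None,
                      stdin: Optional[TextIO] = None,
                      size: int = 65536) -> Iterator[str]:
    """Yield input text in chunks, from a file argument or standard input.

    Hypothetical helper: with no file argument it reads from `stdin`
    (defaulting to sys.stdin), otherwise it opens the named file as UTF-8.
    """
    parser = argparse.ArgumentParser(description='Read Apertium stream format')
    parser.add_argument('file', nargs='?',
                        help='analysed text; standard input if omitted')
    args = parser.parse_args(argv)
    if args.file is None:
        stream = stdin if stdin is not None else sys.stdin
        # read() returns '' at end of input, which stops iter()
        yield from iter(lambda: stream.read(size), '')
    else:
        with open(args.file, encoding='utf-8') as f:
            yield from iter(lambda: f.read(size), '')
```

A unit can straddle two chunks, so a consumer should tokenize the concatenated character stream (for example with itertools.chain.from_iterable over the chunks) rather than each chunk on its own.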

Contributing

Streamparser uses Travis CI for continuous integration. To run the same checks locally, use make test. Install the development requirements, such as linters, with pip install -r requirements.txt.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.