twosigma/lancaster

Name: lancaster

Owner: Two Sigma

Description: A python extension wrapper for avro-c

Created: 2016-07-15 15:04:52.0

Updated: 2016-09-14 05:19:27.0

Pushed: 2016-09-04 01:05:27.0

Homepage: null

Size: 62

Language: Python

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

lancaster

A python extension wrapper for avro-c.

Currently only supports reading a stream of avro serialized data. Does not support writing, nor the avro container format. For more details on what lancaster is willing to read, see the lancaster spec.

See also the Avro project page.

Usage
schema = '{ ... }'
with open('data.avro', 'rb') as f:
    data = list(lancaster.read_stream(schema, f))

lancaster.read_stream() accepts a json string describing the schema, and a stream to read from, and returns a generator which will produce python versions of the avro data (dicts, lists, ints, strings).

Installation

A conda package is provided at anaconda.org. This depends on conda packages providing the C libraries required, libsnappy, jansson, and libavro-c.

conda create -n lancaster -c leif python lancaster

You can also install using just pip or setuptools, assuming you have avro-c and libsnappy installed on your system, which you can probably get from your OS package manager. On debian and ubuntu systems, you can install libavro-dev and libsnappy-dev.

pip install lancaster

or

git clone https://github.com/twosigma/lancaster
cd lancaster
python setup.py install
Caveats

Recursive structures (links) are not supported. Writing anything, and the avro container file format, are also not supported. Happy to accept pull requests but I don't need those features personally yet.

License

Copyright 2016 Two Sigma Open Source, LLC. MIT licensed.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.