twitter/torch-thrift

Name: torch-thrift

Owner: Twitter, Inc.

Description: A Thrift codec for Torch

Created: 2016-01-14 01:49:03.0

Updated: 2017-10-03 18:52:45.0

Pushed: 2016-03-07 23:36:38.0

Homepage:

Size: 23

Language: C

README

Thrift

A codec-based Thrift library for Torch. It supports very fast deserialization of arbitrary Thrift binary data to Lua native types, and serialization of Lua native types back into Thrift binary based on a provided schema.

Reading

Thrift binary data is self-describing. If you only want to read it and convert it to Lua native types, no schema is required when creating the codec.

local thrift = require 'libthrift'
local codec = thrift.codec()
local binary = io.open('thrift_data.bin', 'r'):read('*all')
local result = codec:read(binary)
print(result)
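
To illustrate why no schema is needed for reading, here is a minimal pure-Lua sketch of how a struct is laid out in the Thrift binary protocol: each field starts with a one-byte type code and a two-byte big-endian field id, and a zero type code ends the struct. This is not how libthrift is implemented. The helper names readUInt and readStruct are hypothetical, field ids are assumed to be positive, and only bool and i32 fields are handled.

```lua
-- Thrift binary protocol type codes used in this sketch.
local T_STOP, T_BOOL, T_I32 = 0, 2, 8

-- Read a big-endian unsigned integer of n bytes starting at pos.
-- Returns the value and the position just past it.
local function readUInt(data, pos, n)
   local value = 0
   for i = 0, n - 1 do
      value = value * 256 + data:byte(pos + i)
   end
   return value, pos + n
end

-- Decode a struct that contains only bool and i32 fields.
-- Returns a table keyed by field id.
local function readStruct(data)
   local result, pos = {}, 1
   while true do
      local ttype = data:byte(pos)
      pos = pos + 1
      if ttype == T_STOP then break end
      local id
      id, pos = readUInt(data, pos, 2)
      if ttype == T_BOOL then
         result[id] = data:byte(pos) ~= 0
         pos = pos + 1
      elseif ttype == T_I32 then
         local raw
         raw, pos = readUInt(data, pos, 4)
         -- Convert the unsigned value to a two's complement signed one.
         if raw >= 2^31 then raw = raw - 2^32 end
         result[id] = raw
      else
         error('type code not handled in this sketch: ' .. tostring(ttype))
      end
   end
   return result
end

-- Hand-built bytes: field 1 is the i32 42, field 2 is the bool true, then STOP.
local binary = string.char(
   8, 0, 1, 0, 0, 0, 42,
   2, 0, 2, 1,
   0)
local decoded = readStruct(binary)
print(decoded[1], decoded[2])
```

Because the type code and field id travel with every value, a reader can rebuild the structure without a schema. It only lacks the field names, which is the gap a schema fills.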

You can also read Thrift using a schema, which allows for nicer naming of fields.

local thrift = require 'libthrift'
local codec = thrift.codec({
   ttype = "struct",
   fields = {
      [1] = { ttype = "i32", name = "an_int" },
      [2] = { ttype = "bool", name = "someBoolean" },
      [3] = { ttype = "list", value = "double", name = "vector" },
   }
})
local binary = io.open('thrift_data.bin', 'r'):read('*all')
local result = codec:read(binary)
print(result)

It is possible to read directly from a ByteTensor instead of a string using the readTensor function.

Writing

Writing Thrift binary requires a schema because there is no 1:1 mapping between Lua and Thrift types. We support all Thrift types and they can be nested to any depth.

local thrift = require 'libthrift'
local codec = thrift.codec({
   ttype = "struct",
   fields = {
      [1] = "i32",
      [2] = "bool",
      [3] = { ttype = "list", value = "double" },
   }
})
local binary = codec:write({
   42,
   true,
   { 3.14, 13.13, 543.21 },
})
io.open('thrift_data.bin', 'w'):write(binary)
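
The reason a schema is required for writing can be seen in plain Lua, without the library. The short sketch below only inspects Lua's own type names; the variable names asList and asStruct are illustrative.

```lua
-- Every Lua number reports the same type, so the value 42 alone does not
-- say whether it should be written as byte, i16, i32, i64 or double.
print(type(42), type(3.14))            -- number  number

-- A list and a struct with named fields are both plain Lua tables.
local asList = { 3.14, 13.13, 543.21 }
local asStruct = { an_int = 42, someBoolean = true }
print(type(asList), type(asStruct))    -- table   table
```

Since several Thrift types collapse onto one Lua type, the codec needs the schema to pick the wire type for each value.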

Just like reading, you can supply field names for better readability.

local thrift = require 'libthrift'
local codec = thrift.codec({
   ttype = "struct",
   fields = {
      [1] = { ttype = "i32", name = "an_int" },
      [2] = { ttype = "bool", name = "someBoolean" },
      [3] = { ttype = "list", value = "double", name = "vector" },
   }
})
local binary = codec:write({
   an_int = 42,
   someBoolean = true,
   vector = { 3.14, 13.13, 543.21 },
})
io.open('thrift_data.bin', 'w'):write(binary)

It is possible to write directly to a ByteTensor instead of a string using the writeTensor function.

Codec

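The schema table passed into a codec during creation has a simple format. Each entry is either a type name string such as "i32", or a table whose ttype key names the type. Container tables add key and value entries, struct tables add a fields table indexed by field id, and a field table may carry an optional name. The examples in this README use the types bool, i16, i32, i64, double, string, list, set, map and struct.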

A more complicated schema can be found below.

local descA = {
   ttype = "struct",
   fields = {
      [1] = "i32",
      [2] = "bool",
      [3] = { ttype = "list", value = "double" },
   }
}
local desc = {
   ttype = "struct",
   fields = {
      [1] = { ttype = "map", key = "i32", value = "i32" },
      [2] = { ttype = "map", key = "i64", value = { ttype = "set", value = "string" } },
      [3] = descA,
      [4] = { ttype = "list", value = descA },
      [5] = { ttype = "set", value = descA },
      [7] = { ttype = "map", key = "i16", value = descA },
   }
}

It corresponds to this Thrift file.

struct A {
   1: i32 x
   2: bool y
   3: list<double> z
}

struct B {
   1: map<i32, i32> a
   2: map<i64, set<string>> b
   3: A c
   4: list<A> d
   5: set<A> e
   7: map<i16, A> f
}
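
The correspondence between a schema table and Thrift type syntax can be sketched with a small pure-Lua helper that renders a schema entry as text. The function typeName is hypothetical and not part of libthrift. It relies only on the ttype, key, value and fields entries shown in this README, and it prints nested structs as "struct" because the schema carries no struct names.

```lua
-- Render one schema entry as Thrift type text.
local function typeName(desc)
   if type(desc) == 'string' then
      return desc                        -- plain type name such as "i32"
   end
   local t = desc.ttype
   if t == 'list' or t == 'set' then
      return t .. '<' .. typeName(desc.value) .. '>'
   elseif t == 'map' then
      return 'map<' .. typeName(desc.key) .. ', ' .. typeName(desc.value) .. '>'
   end
   return t                              -- "struct" or a primitive given as a table
end

local descA = {
   ttype = "struct",
   fields = {
      [1] = "i32",
      [2] = "bool",
      [3] = { ttype = "list", value = "double" },
   }
}
local desc = {
   ttype = "struct",
   fields = {
      [1] = { ttype = "map", key = "i32", value = "i32" },
      [2] = { ttype = "map", key = "i64", value = { ttype = "set", value = "string" } },
      [4] = { ttype = "list", value = descA },
   }
}

-- Field ids may have gaps, so collect and sort them before printing.
local ids = {}
for id in pairs(desc.fields) do ids[#ids + 1] = id end
table.sort(ids)
for _, id in ipairs(ids) do
   print(id .. ': ' .. typeName(desc.fields[id]))
end
```

Run as written, this prints one line per field: "1: map<i32, i32>", "2: map<i64, set<string>>" and "4: list<struct>".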

Lua and 64 bit integers

Lua 5.1 and earlier use doubles as their internal number format. A double represents integers exactly only up to 2^53, so the full range of i64 values cannot be represented natively in the Lua VM. The default behavior is to throw an error when reading or writing any value that would be out of range for either Thrift or Lua. That works for most cases. If you need the full range of i64, you can tell the codec to turn i64 values into strings or LongTensors (of size 1) on read, and to accept them on write.

local codec1 = thrift.codec({ i64string = true })  -- i64 to strings
local codec2 = thrift.codec({ i64tensor = true })  -- i64 to LongTensors
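
The precision limit behind these options can be checked in plain Lua, independent of the library. This small sketch compares doubles around 2^53; the variable name limit is illustrative.

```lua
-- 2^53 is the point beyond which doubles can no longer hold every integer.
local limit = 2^53

print(limit - 1 == limit)   -- false: integers below 2^53 are still exact
print(limit + 1 == limit)   -- true: 2^53 + 1 rounds back to 2^53
```

An i64 can hold values up to 2^63 - 1, far past this limit, which is why such values need a string or LongTensor to survive a round trip unchanged.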

Torch Tensors

Most of the time you want to map Thrift lists and sets directly to Torch Tensors. This can happen automatically for you by setting the tensors option to true. The following mapping will occur.

local codec = thrift.codec({ tensors = true })
