LLNL/spark-hdf5

Name: spark-hdf5

Owner: Lawrence Livermore National Laboratory

Description: A plugin to enable Apache Spark to read HDF5 files

Created: 2016-08-03 18:00:01

Updated: 2018-03-10 16:36:23

Pushed: 2016-11-17 20:47:27

Homepage:

Size: 9290

Language: Scala


README

Spark-HDF5

Progress

The plugin can read single-dimensional arrays from HDF5 files.

The following types are supported:

Setup

If you are using the sbt-spark-package plugin, the easiest way to use the package is to require it from the Spark Packages website:

```
spDependencies += "LLNL/spark-hdf5:0.0.4"
```

Otherwise, download the latest release jar and include it on your classpath.

Usage
```scala
import gov.llnl.spark.hdf._

val df = sqlContext.read.hdf5("path/to/file.h5", "/dataset")
df.show
```
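Once loaded, the dataset behaves like any other Spark DataFrame. A brief sketch of follow-on queries (the column name `value` and the file/dataset paths are assumptions for illustration; check the actual schema with `printSchema` first):

```scala
import gov.llnl.spark.hdf._

// Load a one-dimensional dataset (example path and dataset name)
val df = sqlContext.read.hdf5("path/to/file.h5", "/dataset")

// Inspect the inferred schema before relying on column names
df.printSchema()

// Standard DataFrame operations apply; "value" is an assumed column name
val positives = df.filter(df("value") > 0).count()

// The DataFrame can also be registered for Spark SQL queries
df.registerTempTable("hdf5_data")
sqlContext.sql("SELECT COUNT(*) FROM hdf5_data").show()
```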

You can start a Spark REPL with the `console` target:

```
sbt console
```

This will fetch all of the dependencies, set up a local Spark instance, and start a Spark REPL with the plugin loaded.

Options

The following options can be set:

Key        | Default | Description
-----------|---------|------------
extension  | h5      | The file extension of data
chunk size | 10000   | The maximum number of elements to be read in a single scan
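These options can be passed through the standard Spark `DataFrameReader.option` call. A minimal sketch, assuming the option keys match the table above exactly (including the space in `chunk size`) and using example paths:

```scala
import gov.llnl.spark.hdf._

// Read files with a .he5 extension, scanning 50000 elements at a time
// (keys "extension" and "chunk size" are taken from the options table)
val df = sqlContext.read
  .option("extension", "he5")
  .option("chunk size", "50000")
  .hdf5("path/to/file.he5", "/dataset")
```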

Testing

The plugin includes a test suite, which can be run through SBT:

```
sbt test
```

Roadmap
Release

This code was developed at Lawrence Livermore National Laboratory (LLNL) and is available under the Apache 2.0 license (LLNL-CODE-699384).


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.