Name: spark-hdf5
Owner: Lawrence Livermore National Laboratory
Description: A plugin to enable Apache Spark to read HDF5 files
Created: 2016-08-03 18:00:01.0
Updated: 2018-03-10 16:36:23.0
Pushed: 2016-11-17 20:47:27.0
Size: 9290
Language: Scala
The plugin can read single-dimensional arrays from HDF5 files.
The following types are supported:
If you are using the sbt-spark-package plugin, the easiest way to use the package is to require it from the Spark Packages website:

```
spDependencies += "LLNL/spark-hdf5:0.0.4"
```
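For context, a minimal `build.sbt` using the sbt-spark-package plugin might look like the sketch below; the package name and Spark version are hypothetical placeholders, not values taken from this project:

```scala
// Minimal build.sbt sketch, assuming the sbt-spark-package plugin is enabled.
// spName and sparkVersion are placeholders for your own project.
spName := "your-org/your-app"
sparkVersion := "2.0.0"
sparkComponents += "sql"
spDependencies += "LLNL/spark-hdf5:0.0.4"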
Otherwise, download the latest release jar and include it on your classpath.
```scala
import gov.llnl.spark.hdf._

val df = sqlContext.read.hdf5("path/to/file.h5", "/dataset")
df.show
```
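Once loaded, the result is an ordinary Spark DataFrame, so the usual DataFrame operations apply. A brief sketch (column names depend on the dataset, so none are assumed here):

```scala
import gov.llnl.spark.hdf._

val df = sqlContext.read.hdf5("path/to/file.h5", "/dataset")
df.printSchema()        // inspect the schema inferred from the HDF5 dataset
println(df.count())     // total number of elements read
df.cache()              // keep the data in memory for repeated queries
```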
You can start a spark repl with the console target:
```
sbt console
```
This will fetch all of the dependencies, set up a local Spark instance, and start a Spark repl with the plugin loaded.
The following options can be set:
Key | Default | Description
----|---------|------------
`extension` | `h5` | The file extension of the data files
`chunk size` | `10000` | The maximum number of elements to be read in a single scan
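These options can presumably be supplied through Spark's standard `DataFrameReader.option` mechanism; the sketch below assumes the plugin picks them up that way (check the plugin source to confirm), and the paths are placeholders:

```scala
// Sketch: passing reader options, assuming they flow through DataFrameReader.option.
val df = sqlContext.read
  .option("extension", "he5")     // look for .he5 files instead of the default .h5
  .option("chunk size", "5000")   // cap each scan at 5000 elements
  .hdf5("path/to/data", "/dataset")
```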
The plugin includes a test suite, which can be run through SBT:

```
sbt test
```
This code was developed at Lawrence Livermore National Laboratory (LLNL) and is available under the Apache 2.0 license (LLNL-CODE-699384).