Name: gcs-tools
Owner: Spotify
Description: GCS support for avro-tools, parquet-tools and protobuf
Created: 2016-09-18 22:21:46.0
Updated: 2018-05-11 19:47:05.0
Pushed: 2018-01-25 05:36:08.0
Size: 34
Language: Java
A lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools, and proto-tools (for Scio's Protobuf-in-Avro files), so that they can be used from regular workstations or laptops outside of a Google Compute Engine (GCE) instance.
It uses your existing OAuth2 credentials and allows authentication via a browser.
You can install the tools via our Homebrew tap on Mac.
brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-tools gcs-proto-tools
avro-tools tojson <GCS_PATH>
parquet-tools cat <GCS_PATH>
proto-tools tojson <GCS_PATH>
Or build them yourself.
sbt assembly
java -jar avro-tools/target/scala-2.11/avro-tools-1.8.1.jar tojson <GCS_PATH>
java -jar parquet-tools/target/scala-2.11/parquet-tools-1.8.1.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.11/proto-tools-3.1.0.jar cat <GCS_PATH>
To make avro-tools and parquet-tools work with GCS we need:
The GCS connector won't pick up your local gcloud configuration; it expects its settings in core-site.xml instead.
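As a rough illustration of the kind of settings the connector reads from core-site.xml, here is a minimal sketch. It is not the file shipped with gcs-tools. It assumes the standard GCS connector class names for the `gs://` scheme, and the project ID value `my-gcp-project` is a hypothetical placeholder. Authentication-related properties are left out on purpose, since their names differ between connector versions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical minimal core-site.xml; not the file shipped with gcs-tools. -->
<configuration>
  <!-- Maps the gs:// scheme to the GCS connector's FileSystem implementation. -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <!-- Same mapping for Hadoop's AbstractFileSystem (FileContext) API. -->
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>
  <!-- Placeholder value (assumption): replace with your own Google Cloud project ID. -->
  <property>
    <name>fs.gs.project.id</name>
    <value>my-gcp-project</value>
  </property>
</configuration>
```

Hadoop loads core-site.xml from the classpath, so a file like this only takes effect if it is on the classpath of the tool being run.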