spotify/gcs-tools

Name: gcs-tools

Owner: Spotify

Description: GCS support for avro-tools, parquet-tools and protobuf

Created: 2016-09-18 22:21:46.0

Updated: 2018-05-11 19:47:05.0

Pushed: 2018-01-25 05:36:08.0

Homepage:

Size: 34

Language: Java

README

GCS Tools

Raison d'être:

Lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools, and proto-tools (for Scio's Protobuf-in-Avro files), so that they can be used from regular workstations or laptops outside of a Google Compute Engine (GCE) instance.

It uses your existing OAuth2 credentials and allows authentication via a browser.

Usage:

You can install the tools via our Homebrew tap on Mac.

brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-tools gcs-proto-tools
avro-tools tojson <GCS_PATH>
parquet-tools cat <GCS_PATH>
proto-tools tojson <GCS_PATH>

Or build them yourself.

sbt assembly
java -jar avro-tools/target/scala-2.11/avro-tools-1.8.1.jar tojson <GCS_PATH>
java -jar parquet-tools/target/scala-2.11/parquet-tools-1.8.1.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.11/proto-tools-3.1.0.jar cat <GCS_PATH>

How it works:

To make avro-tools and parquet-tools work with GCS we need:

The GCS connector won't pick up your local gcloud configuration; instead, it expects its settings in core-site.xml.
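
To illustrate the kind of settings involved, here is a minimal Java sketch that renders a `core-site.xml` registering the GCS connector for the `gs://` scheme. It assumes the standard GCS connector keys `fs.gs.impl`, `fs.AbstractFileSystem.gs.impl` and `fs.gs.project.id`. The class name `CoreSiteSketch` and the default project ID `my-gcp-project` are hypothetical, and OAuth2 properties are left out because their key names vary between connector versions. It is not taken from gcs-tools itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: renders a minimal Hadoop core-site.xml that maps the
 * gs:// scheme to the GCS connector. Only the filesystem implementation keys
 * and the project ID are set; OAuth2 settings are deliberately omitted.
 */
public class CoreSiteSketch {

    // Escape XML special characters ('&' first) so any value stays well-formed.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Render each key/value pair as a Hadoop <property> element.
    static String renderCoreSite(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n");
            sb.append("    <name>").append(escapeXml(e.getKey())).append("</name>\n");
            sb.append("    <value>").append(escapeXml(e.getValue())).append("</value>\n");
            sb.append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    static Map<String, String> gcsConnectorProps(String projectId) {
        // LinkedHashMap keeps the properties in insertion order in the output.
        Map<String, String> props = new LinkedHashMap<>();
        // Register the GCS connector's FileSystem classes for the gs:// scheme.
        props.put("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        props.put("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");
        // GCP project the connector should use; supplied by the caller.
        props.put("fs.gs.project.id", projectId);
        return props;
    }

    public static void main(String[] args) {
        // "my-gcp-project" is a hypothetical placeholder, not a real project.
        String projectId = args.length > 0 ? args[0] : "my-gcp-project";
        System.out.print(renderCoreSite(gcsConnectorProps(projectId)));
    }
}
```

For example, `javac CoreSiteSketch.java && java CoreSiteSketch your-project-id > core-site.xml` writes the file; Hadoop's `Configuration` loads `core-site.xml` as a classpath resource, so its directory needs to be on the tools' classpath.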


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.