hortonworks/hive-json

Name: hive-json

Owner: Hortonworks Inc

Description: A rough prototype of a tool for discovering Apache Hive schemas from JSON documents.

Created: 2013-03-12 15:52:05.0

Updated: 2018-01-06 16:24:41.0

Pushed: 2017-02-20 22:48:17.0

Size: 35

Language: Java

README

Hive JSON Schema Finder

This project is a rough prototype that I've written to analyze large collections of JSON documents and discover their Apache Hive schema. I've used it to analyze githubarchive.org's log data.

To build the project, use Maven (3.0.x) from http://maven.apache.org/.

Building the jar:

% mvn package

Run the program:

% bin/find-json-schema *.json.gz

I've uploaded the discovered schema for githubarchive.org to https://gist.github.com/omalley/5125691.
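
To give a feel for what schema discovery involves, here is a minimal sketch in Java. It is not the code in this repository: the class name HiveSchemaSketch and the methods inferType, merge, and discover are hypothetical, and the sketch assumes the JSON has already been parsed into plain Java values (Map, List, String, Number, Boolean) instead of reading .json.gz files. It also simplifies conflict handling by falling back to string whenever two observed types cannot be reconciled, and it uses "void" as an internal placeholder for values that have only been seen as null or as empty arrays.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HiveSchemaSketch {

    /** Infers a Hive type name for one already-parsed JSON value. */
    static String inferType(Object value) {
        if (value == null) return "void";            // no information yet
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "bigint";
        if (value instanceof Number) return "double";
        if (value instanceof String) return "string";
        if (value instanceof List) {
            // The element type is the merge of every element's type.
            String element = "void";
            for (Object item : (List<?>) value) {
                element = merge(element, inferType(item));
            }
            return "array<" + element + ">";
        }
        if (value instanceof Map) {
            // Sort field names so equal structs always render identically.
            Map<String, String> fields = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                fields.put(String.valueOf(e.getKey()), inferType(e.getValue()));
            }
            StringBuilder sb = new StringBuilder("struct<");
            boolean first = true;
            for (Map.Entry<String, String> f : fields.entrySet()) {
                if (!first) sb.append(',');
                sb.append(f.getKey()).append(':').append(f.getValue());
                first = false;
            }
            return sb.append('>').toString();
        }
        throw new IllegalArgumentException("Unsupported value: " + value.getClass());
    }

    /** Combines two observed types into one type that can hold both. */
    static String merge(String a, String b) {
        if (a.equals(b)) return a;
        if (a.equals("void")) return b;
        if (b.equals("void")) return a;
        if (a.startsWith("array<") && b.startsWith("array<")) {
            String left = a.substring(6, a.length() - 1);
            String right = b.substring(6, b.length() - 1);
            return "array<" + merge(left, right) + ">";
        }
        boolean aNumeric = a.equals("bigint") || a.equals("double");
        boolean bNumeric = b.equals("bigint") || b.equals("double");
        if (aNumeric && bNumeric) return "double";   // widen integers to double
        return "string";                             // simplification: give up on conflicts
    }

    /** Merges the top-level fields of many documents into one column map. */
    static Map<String, String> discover(List<? extends Map<String, ?>> documents) {
        Map<String, String> columns = new TreeMap<>();
        for (Map<String, ?> doc : documents) {
            for (Map.Entry<String, ?> e : doc.entrySet()) {
                columns.merge(e.getKey(), inferType(e.getValue()), HiveSchemaSketch::merge);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.of("id", 1, "actor", "alice", "tags", List.of("a", "b")),
            Map.of("id", 2.5, "public", true, "tags", List.of()));
        // prints {actor=string, id=double, public=boolean, tags=array<string>}
        System.out.println(discover(docs));
    }
}
```

Representing types as strings keeps the sketch short, but it means two structs with different fields collapse to string instead of being merged field by field. A tool meant for real data would more likely keep a tree of type objects and merge structs recursively.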


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.