racker/gizer

Name: gizer

Owner: racker

Description: ETL engine for MongoDB, prerequisite for racker/caspian-data-access

Created: 2016-04-05 19:29:42.0

Updated: 2017-02-20 20:53:08.0

Pushed: 2017-06-28 18:15:47.0

Homepage:

Size: 605

Language: Python

README

Intro

The application requires two connections to PostgreSQL instances: one is used for caching and the other is the actual target instance. PostgreSQL 9.3 or later is required. This requirement comes from the initial database synchronization (init load).

The solution is divided into 3 phases:

Environment
Config file.

Connection settings, etc. See sample-config.ini for inspiration.
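
As a rough illustration of a config holding the two PostgreSQL connections, the sketch below parses an INI document with Python's standard `configparser`. The section names (`psql-cache`, `psql-target`), the keys, and the helper `load_psql_settings` are hypothetical and are not taken from gizer; sample-config.ini remains the reference for the real layout.

```python
import configparser

# Hypothetical section and key names, for illustration only.
# The real layout is defined by sample-config.ini.
SAMPLE = """
[psql-cache]
host = localhost
port = 5432
dbname = gizer_cache
user = gizer

[psql-target]
host = db.example.org
port = 5432
dbname = warehouse
user = gizer
"""


def load_psql_settings(text):
    """Return connection settings for the cache and target instances."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = {}
    for role in ("cache", "target"):
        section = parser["psql-" + role]
        settings[role] = {
            "host": section["host"],
            "port": section.getint("port"),
            "dbname": section["dbname"],
            "user": section["user"],
        }
    return settings


settings = load_psql_settings(SAMPLE)
```

Keeping the cache and target settings in separate sections makes it harder to point both roles at the same instance by accident.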

Tools.
Command line examples

Acquire schema

Get the latest oplog timestamp before running init load

Run init load part 1 of 2

Run init load part 2 of 2

When init load finishes, save the completion status (ok/error)

Verify ETL status; if the exit code is 1, run init load

Run the following command every time to update the PostgreSQL database with MongoDB data
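
The control flow described above (check the status, fall back to init load when the exit code is 1, then run the regular update) could be wrapped in a small driver. This is only a sketch under assumptions: `run_sync` is a hypothetical helper, the actual gizer commands are passed in by the caller rather than hard-coded, and treating every other status code as "no init load needed" is a guess.

```python
import subprocess
import sys


def run_sync(status_cmd, init_load_cmds, update_cmd, run=subprocess.call):
    """Run init load only if the status check exits with 1, then update.

    Each command is an argument list as accepted by subprocess.call.
    Returns the commands executed after the status check, in order.
    """
    executed = []
    if run(status_cmd) == 1:
        # Status check reported that an init load is required.
        for cmd in init_load_cmds:
            executed.append(cmd)
            code = run(cmd)
            if code != 0:
                raise RuntimeError("init load step failed with exit code %d" % code)
    executed.append(update_cmd)
    code = run(update_cmd)
    if code != 0:
        raise RuntimeError("update failed with exit code %d" % code)
    return executed


# Stand-in commands that only set an exit code; replace with the real tools.
ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
needs_init = [sys.executable, "-c", "import sys; sys.exit(1)"]

steps_when_synced = run_sync(ok, [ok, ok], ok)
steps_when_stale = run_sync(needs_init, [ok, ok], ok)
```

Stopping at the first failed init load step avoids running the incremental update against a partially loaded target.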

Known issues.
Schema items' types should be strictly defined. Incorrectly defined types may lead to errors.
Fields which are not in the schema or which have different types will not be loaded into the relational model.
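
To make the second issue concrete, the sketch below shows the kind of filtering it implies: a field survives only if it is declared in the schema and its value has the declared type. The schema representation (a mapping from field name to Python type) and the function `filter_by_schema` are hypothetical and do not reflect gizer's actual schema format or code.

```python
def filter_by_schema(document, schema):
    """Keep only fields declared in the schema whose values match the declared type."""
    kept = {}
    for name, value in document.items():
        expected = schema.get(name)
        # Exact type match, so a bool is not accepted where an int is declared.
        if expected is not None and type(value) is expected:
            kept[name] = value
    return kept


schema = {"name": str, "age": int, "active": bool}
doc = {"name": "Ann", "age": "42", "active": True, "nickname": "A"}

# "age" has the wrong type and "nickname" is not in the schema, so both are dropped.
loaded = filter_by_schema(doc, schema)
```

A mistyped schema entry therefore drops data silently, which is why the types need to be defined strictly.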
