hasadna/migdar-data-pipelines

Name: migdar-data-pipelines

Owner: The Public Knowledge Workshop

Description: collecting data about migdar

Created: 2018-02-28 08:15:46.0

Updated: 2018-02-28 08:36:22.0

Pushed: 2018-04-08 07:52:49.0

Homepage: null

Size: 14

Language: Python

README

Migdar Data Pipelines

Running the pipelines locally

Most pipelines can be run locally with minimal infrastructure dependencies.

Install some dependencies (the following works for the latest version of Ubuntu):

sudo apt-get install -y python3.6 python3-pip python3.6-dev libleveldb-dev libleveldb1v5
sudo pip3 install pipenv

Install the pipeline dependencies

pipenv install

Activate the virtualenv

pipenv shell

List the available pipelines

dpp

Run a pipeline

dpp run <PIPELINE_ID>
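
When several pipelines need to run in sequence, the run step above can be scripted. The following is a minimal sketch, assuming the run command is `dpp run <PIPELINE_ID>` (the datapackage-pipelines CLI) and that the script is started from inside the pipenv shell. The names `dpp_run_command` and `run_pipelines` are hypothetical helpers introduced for illustration and are not part of this repository.

```python
import subprocess


def dpp_run_command(pipeline_id):
    """Build the argument list for running one pipeline (assumes the `dpp` CLI)."""
    if not pipeline_id or not pipeline_id.strip():
        raise ValueError("pipeline_id must be a non-empty string")
    return ["dpp", "run", pipeline_id.strip()]


def run_pipelines(pipeline_ids, runner=subprocess.run):
    """Run pipelines one after another, stopping at the first failure.

    `runner` is injectable so the function can be exercised without `dpp`
    installed. Returns the pipeline IDs that finished with exit code 0.
    """
    completed = []
    for pipeline_id in pipeline_ids:
        result = runner(dpp_run_command(pipeline_id))
        if result.returncode != 0:
            break  # do not start later pipelines after a failure
        completed.append(pipeline_id)
    return completed
```

Calling `run_pipelines` with a list of IDs taken from the pipeline listing runs them in order and returns the ones that completed, so a failed pipeline can be identified as the first ID missing from the result.
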

Running using docker

docker pull orihoch/knesset-data-pipelines
docker run -it --entrypoint bash -v `pwd`:/pipelines orihoch/knesset-data-pipelines

Continue with the "Running the pipelines locally" section above.

You can usually fix permission problems on the files by running the following inside the docker container:

chown -R 1000:1000 .

Syncing data from Google Storage

gsutil -m rsync -r gs://knesset-data-pipelines/hasadna-migdar-data/ori-sync-data ./data

Syncing the data to Google Storage

Replace <YOUR_NAME> with your name to prevent overwriting each other's data.

gsutil -m rsync -r ./data gs://knesset-data-pipelines/hasadna-migdar-data/<YOUR_NAME>-sync-data
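
The per-user destination can also be built programmatically, so that the `<YOUR_NAME>` placeholder is never uploaded by mistake. This is a minimal sketch, assuming the sync command is `gsutil -m rsync -r`. The helper name `build_upload_command` and its rule for valid names are hypothetical and not part of this repository.

```python
import re

# Bucket prefix taken from the sync commands in this README.
BUCKET_PREFIX = "gs://knesset-data-pipelines/hasadna-migdar-data"


def build_upload_command(your_name, local_dir="./data"):
    """Return the gsutil argument list that syncs local_dir to a per-user folder."""
    # Restricting names to letters, digits, "-" and "_" is an assumption made
    # here; it also rejects the literal "<YOUR_NAME>" placeholder.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", your_name):
        raise ValueError("invalid name: %r" % (your_name,))
    destination = "%s/%s-sync-data" % (BUCKET_PREFIX, your_name)
    return ["gsutil", "-m", "rsync", "-r", local_dir, destination]
```

The returned list can be passed to `subprocess.run(..., check=True)` to perform the upload. Building the command as a list avoids shell quoting issues.
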
