hurwitzlab/imicrobe-data-loaders

Name: imicrobe-data-loaders

Owner: Hurwitz Lab

Description: Scripts to load UProC results into the iMicrobe database.

Created: 2017-10-25 17:48:07.0

Updated: 2017-11-03 14:24:19.0

Pushed: 2018-01-08 14:57:40.0

Homepage: null

Size: 80

Language: Python

README

iMicrobe Data Loaders (now with muSCOPE)

Scripts to load various data into the iMicrobe and muSCOPE databases.

camera_envo

Load CAMERA metadata.

loader/imicrobe/uproc

Scripts to load UProC results into the iMicrobe database.

imicrobe-load-uproc-results

REPLACED BY loader/imicrobe/uproc. Older scripts to load UProC results into the iMicrobe database.

Requirements

These scripts require iRODS iCommands, make, a Python 3.6+ interpreter, and ORM classes generated by orminator.
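
A quick pre-flight check can catch a missing requirement before a long-running step starts. The sketch below is a hypothetical helper, not part of this repository. It assumes the external programs needed are ils and iget (iRODS iCommands), make, and parallel (GNU parallel, which several make targets below rely on). The orminator write_models command is installed later by pip, so it is not checked here.

```python
import shutil
import sys

# Assumed list of external programs; adjust it to your setup.
REQUIRED_PROGRAMS = ("ils", "iget", "make", "parallel")


def missing_requirements(programs=REQUIRED_PROGRAMS, version=None):
    """Return a list of problems; an empty list means everything was found."""
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    problems = []
    if (major, minor) < (3, 6):
        problems.append("Python 3.6+ required, found %d.%d" % (major, minor))
    for program in programs:
        # shutil.which returns None when the program is not on PATH.
        if shutil.which(program) is None:
            problems.append("%s not found on PATH" % program)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("missing:", problem)
```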

Installation

Use a virtual environment!

$ python3.6 -m venv ~/imdl
$ source ~/imdl/bin/activate
(imdl) $ git clone https://github.com/hurwitzlab/imicrobe-data-loaders.git
(imdl) $ cd imicrobe-data-loaders
(imdl) $ pip install -r requirements.txt
(imdl) $ write_models -o imicrobe/models.py -u mysql+pymysql://imicrobe:<password>@localhost/imicrobe
(imdl) $ write_models -o muscope/models.py -u mysql+pymysql://imicrobe:<password>@localhost/muscope2
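
The <password> placeholder above sits inside a URL, so a password containing characters such as @, : or / can break it. SQLAlchemy's documentation recommends percent-encoding such passwords with urllib.parse.quote_plus. The sketch below assumes write_models parses its -u value the same way, since it takes a SQLAlchemy-style mysql+pymysql URL. The mysql_url helper is hypothetical.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, database):
    """Build a mysql+pymysql URL with the user and password percent-encoded."""
    # quote_plus escapes reserved characters such as '@', ':' and '/'.
    return "mysql+pymysql://%s:%s@%s/%s" % (
        quote_plus(user), quote_plus(password), host, database)
```

For example, mysql_url("imicrobe", "p@ss", "localhost", "imicrobe") returns mysql+pymysql://imicrobe:p%40ss@localhost/imicrobe. That value can then be passed to write_models -u, quoted for the shell.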

Usage: Load UProC results

First, copy the UProC results files from TACC to /iplant/home/shared/imicrobe storage. On a TACC system such as Stampede2, run copy_uproc_results_to_iplant_imicrobe.py:

(imdl) $ python loader/imicrobe/uproc/copy_uproc_results_to_iplant_imicrobe.py

Next, load the UProC results from /iplant/home/shared/imicrobe into the iMicrobe database on myo. The new tables will be created if they do not exist.

(imdl) $ python loader/imicrobe/uproc/load.py
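
The "created if they do not exist" behaviour can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL database on myo, and the table and column names are hypothetical; the real schema comes from the ORM classes generated by write_models. Calling ensure_tables a second time leaves existing rows alone.

```python
import sqlite3

# Hypothetical column layout; the real tables are defined by the generated ORM classes.
SCHEMA = (
    """CREATE TABLE IF NOT EXISTS uproc (
           uproc_id INTEGER PRIMARY KEY,
           accession TEXT UNIQUE NOT NULL,
           identifier TEXT,
           name TEXT)""",
    """CREATE TABLE IF NOT EXISTS sample_to_uproc (
           sample_to_uproc_id INTEGER PRIMARY KEY,
           sample_id INTEGER NOT NULL,
           uproc_id INTEGER NOT NULL REFERENCES uproc (uproc_id),
           read_count INTEGER NOT NULL)""",
)


def ensure_tables(connection):
    """Create the tables only if they are missing; existing rows are kept."""
    with connection:  # commits on success, rolls back on error
        for statement in SCHEMA:
            connection.execute(statement)
```

With the SQLAlchemy ORM the same effect usually comes from MetaData.create_all(), which checks for existing tables by default (checkfirst=True).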

Usage: Load UProC Pfam results

Execute make ils-imicrobe-projects to create a file list of the iRODS iMicrobe project directories. By default the list will be written to the data directory. This step took 83 minutes on a laptop and 4 minutes on myo.
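
ils marks each sub-collection in its listing with a leading "C-", so a list of project directories can be pulled out of its output. This is a minimal sketch of that idea, not the make target's implementation. It assumes the projects sit directly under one collection, for example /iplant/home/shared/imicrobe/projects (a hypothetical path), and the function names are hypothetical.

```python
import subprocess


def parse_subcollections(ils_output):
    """Return the sub-collection paths listed in ils output.

    ils prints sub-collections as '  C- /full/path'; data objects appear
    as bare names and are skipped.
    """
    paths = []
    for line in ils_output.splitlines():
        entry = line.strip()
        if entry.startswith("C- "):
            paths.append(entry[len("C- "):])
    return paths


def write_project_list(collection, filename):
    """Run ils on an iRODS collection and write one sub-collection per line."""
    result = subprocess.run(["ils", collection], check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    paths = parse_subcollections(result.stdout)
    with open(filename, "w") as f:
        f.writelines(path + "\n" for path in paths)
    return paths
```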

Execute make write-download-command-file to create a file of iget commands suitable for GNU parallel.
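
A command file for GNU parallel is simply one shell command per line. The sketch below assumes each project collection is downloaded whole with iget -r into a single local directory; which files the real make target fetches, and the paths used here, are assumptions. shlex.quote protects paths that contain spaces.

```python
import shlex


def iget_commands(irods_paths, local_dir):
    """Return one 'iget -r' command per iRODS collection (hypothetical layout)."""
    return ["iget -r %s %s" % (shlex.quote(path), shlex.quote(local_dir))
            for path in irods_paths]


def write_command_file(commands, filename):
    """Write commands one per line, the format GNU parallel reads on stdin."""
    with open(filename, "w") as f:
        for command in commands:
            f.write(command + "\n")
```

GNU parallel runs each line of such a file as a separate job, for example parallel -j 100 < data/iget-commands.txt (file name hypothetical).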

Execute make parallel-iget-uproc-results to download the UProC results with GNU parallel. This should take less than an hour. Try -j 100 for fun.
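
On a machine without GNU parallel, the same fan-out can be approximated in Python. This is a substitute technique, not what the make target runs: a thread pool whose max_workers plays the role of -j, with each thread waiting on one shell subprocess. The function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run one shell command and return (command, exit status)."""
    return command, subprocess.run(command, shell=True).returncode


def run_in_parallel(commands, jobs=100):
    """Run shell commands with at most `jobs` at once, results in input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_command, commands))


def read_command_file(filename):
    """Read one command per line, skipping blank lines."""
    with open(filename) as f:
        return [line.strip() for line in f if line.strip()]
```

Because run_in_parallel returns each command with its exit status, failed downloads can be collected and retried.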

Execute make download-pfam-data to get the necessary Pfam files.

Execute python load_pfam_table.py to load Pfam annotations into the uproc table. This will first drop the uproc table and delete all rows from the sample_to_uproc table.
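
The drop-and-reload behaviour described above can be sketched with sqlite3 standing in for MySQL. The sketch assumes the Pfam annotations come from a Pfam-A.clans.tsv-style file (tab-separated: Pfam accession, clan accession, clan ID, Pfam ID, description). The file actually fetched by make download-pfam-data and the column names are assumptions.

```python
import sqlite3

UPROC_DDL = """CREATE TABLE uproc (
    uproc_id INTEGER PRIMARY KEY,
    accession TEXT UNIQUE NOT NULL,
    identifier TEXT,
    name TEXT)"""
# Hypothetical schema, created here only so the sketch runs on an empty database.
SAMPLE_TO_UPROC_DDL = """CREATE TABLE IF NOT EXISTS sample_to_uproc (
    sample_to_uproc_id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL,
    uproc_id INTEGER NOT NULL,
    read_count INTEGER NOT NULL)"""


def read_pfam_annotations(text):
    """Return (accession, identifier, description) from clans.tsv-style lines."""
    annotations = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 5:
            annotations.append((fields[0], fields[3], fields[4]))
    return annotations


def reload_uproc_table(connection, annotations):
    """Clear sample_to_uproc, drop and recreate uproc, then insert annotations."""
    with connection:
        connection.execute(SAMPLE_TO_UPROC_DDL)
        connection.execute("DELETE FROM sample_to_uproc")
        connection.execute("DROP TABLE IF EXISTS uproc")
        connection.execute(UPROC_DDL)
        connection.executemany(
            "INSERT INTO uproc (accession, identifier, name) VALUES (?, ?, ?)",
            annotations)
```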

Run make write-load-sample-to-uproc-command-file to create a file of commands for GNU Parallel. This will also drop and create the sample_to_uproc table.

Run make parallel-load-sample-to-uproc to load the sample_to_uproc table.
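
Each job in that command file loads one sample. This is a minimal sketch of one such job. It assumes the UProC results are in UProC's family-count form (one family,count line per protein family) and that the sample ID is already known. The function names, column names and sqlite3 stand-in are hypothetical.

```python
import sqlite3

# Hypothetical schema so the sketch can run against an empty database.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS uproc ("
    " uproc_id INTEGER PRIMARY KEY, accession TEXT UNIQUE NOT NULL)",
    "CREATE TABLE IF NOT EXISTS sample_to_uproc ("
    " sample_to_uproc_id INTEGER PRIMARY KEY, sample_id INTEGER NOT NULL,"
    " uproc_id INTEGER NOT NULL, read_count INTEGER NOT NULL)",
)


def create_tables(connection):
    with connection:
        for statement in SCHEMA:
            connection.execute(statement)


def parse_counts(text):
    """Parse 'family,count' lines into {family: count}, summing duplicates."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            family, count = line.rsplit(",", 1)
            counts[family] = counts.get(family, 0) + int(count)
    return counts


def load_sample_counts(connection, sample_id, counts):
    """Insert one sample_to_uproc row per family known to the uproc table."""
    inserted = 0
    with connection:
        for family, count in sorted(counts.items()):
            row = connection.execute(
                "SELECT uproc_id FROM uproc WHERE accession = ?", (family,)).fetchone()
            if row is None:
                continue  # family not annotated in uproc; skipped
            connection.execute(
                "INSERT INTO sample_to_uproc (sample_id, uproc_id, read_count)"
                " VALUES (?, ?, ?)", (sample_id, row[0], count))
            inserted += 1
    return inserted
```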

Usage: Load UProC KEGG results

If the UProC KEGG results need to be copied from TACC to /iplant/, execute make copy-uproc-kegg-results-to-iplant on Stampede2.

If the UProC KEGG results have not been copied to myo, execute make ils-imicrobe-projects write-download-command-file parallel-iget-uproc-results.

If the UProC KEGG database tables do not exist yet, execute make drop-and-create-uproc-kegg-tables.
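
The conditional steps on myo can be scripted so that only the needed make targets run. This is a hypothetical helper, not part of the repository. The copy step is left out because it runs on Stampede2. GNU make, when run without -j, builds the goals on one command line in the order they are listed.

```python
import subprocess


def kegg_make_targets(results_on_myo, tables_exist):
    """Return the make targets needed on myo before loading UProC KEGG tables."""
    targets = []
    if not results_on_myo:
        targets += ["ils-imicrobe-projects",
                    "write-download-command-file",
                    "parallel-iget-uproc-results"]
    if not tables_exist:
        # Note: this target drops the KEGG tables before recreating them.
        targets.append("drop-and-create-uproc-kegg-tables")
    return targets


def run_make(targets):
    """Run the targets in one make invocation; raise if make fails."""
    if targets:
        subprocess.run(["make"] + targets, check=True)
```

For example, run_make(kegg_make_targets(results_on_myo=True, tables_exist=False)) runs only make drop-and-create-uproc-kegg-tables.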

To load the UProC KEGG database tables, execute make write-load-uproc-kegg-tables-command-file followed by make


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.