Name: imicrobe-data-loaders
Owner: Hurwitz Lab
Description: Scripts to load UProC results into the iMicrobe database.
Created: 2017-10-25 17:48:07.0
Updated: 2017-11-03 14:24:19.0
Pushed: 2018-01-08 14:57:40.0
Homepage: null
Size: 80
Language: Python
Scripts to load various data into the iMicrobe and muSCOPE databases.
Load CAMERA metadata.
Scripts to load UProC results into the iMicrobe database.
REPLACED BY loader/imicrobe/uproc Scripts to load UProC results into the iMicrobe database.
These scripts require iRODS iCommands, make, a Python 3.6+ interpreter, and ORM classes generated by orminator.
Use a virtual environment!
$ python3.6 -m venv ~/imdl
$ source ~/imdl/bin/activate
(imdl) $ git clone https://github.com/hurwitzlab/imicrobe-data-loaders.git
(imdl) $ cd imicrobe-data-loaders
(imdl) $ pip install -r requirements.txt
(imdl) $ write_models -o imicrobe/models.py -u mysql+pymysql://imicrobe:<password>@localhost/imicrobe
(imdl) $ write_models -o muscope/models.py -u mysql+pymysql://imicrobe:<password>@localhost/muscope2
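A password containing characters such as @ or / will break a URL of the form shown above unless it is percent-encoded first. The following is a minimal sketch of building such a URL safely with the standard library. The helper name build_db_url is hypothetical and is not part of this repository or of orminator.

```python
from urllib.parse import quote


def build_db_url(user, password, host, database, driver="mysql+pymysql"):
    """Return a database URL with the user and password percent-encoded.

    Hypothetical helper for illustration; not part of imicrobe-data-loaders.
    """
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return f"{driver}://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{database}"


if __name__ == "__main__":
    # The password here is a made-up example value.
    print(build_db_url("imicrobe", "p@ss/word", "localhost", "imicrobe"))
```

The resulting string can be passed as the -u argument in place of a hand-typed URL.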
First copy the UProC results files from TACC to /iplant/home/shared/imicrobe storage. On a TACC system such as Stampede2, run copy_uproc_results_to_iplant_imicrobe.py:
(imdl) $ python loader/imicrobe/uproc/copy_uproc_results_to_iplant_imicrobe.py
Next load the UProC results from /iplant/home/shared/imicrobe into the iMicrobe database on myo. The new tables will be created if they do not exist.
(imdl) $ python loader/imicrobe/uproc/load.py
Execute make ils-imicrobe-projects to create a file list of the iRODS iMicrobe project directories. By default the list will be written to the data directory. This step took 83 minutes on a laptop and 4 minutes on myo.
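To work with such a listing in Python, the collection headers and the indented entries need to be joined back into full paths. The sketch below assumes the listing has the usual recursive ils layout: a collection path ending in a colon, followed by indented data object names, with sub-collections marked by a leading C-. Whether the Makefile target writes exactly this layout is an assumption, and the function name and sample paths are hypothetical.

```python
def parse_ils_listing(text):
    """Return full data-object paths from recursive ils-style output.

    Assumed layout (not confirmed against this repository's Makefile):
        /zone/collection:
          data_object_name
          C- /zone/collection/sub_collection
    """
    paths = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("C- "):
            # Sub-collection marker; its contents appear under their own header.
            continue
        if line.startswith("/") and line.endswith(":"):
            # Collection header: remember it for the entries that follow.
            current = line[:-1]
            continue
        if current is not None:
            paths.append(f"{current}/{line}")
    return paths


if __name__ == "__main__":
    # Made-up sample listing for illustration only.
    sample = (
        "/iplant/home/shared/imicrobe/projects/1:\n"
        "  C- /iplant/home/shared/imicrobe/projects/1/samples\n"
        "/iplant/home/shared/imicrobe/projects/1/samples:\n"
        "  a.fa\n"
        "  a.fa.uproc\n"
    )
    for path in parse_ils_listing(sample):
        print(path)
```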
Execute make write-download-command-file to create a file of iget commands suitable for GNU parallel.
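A command file for GNU parallel is simply one shell command per line. As a rough sketch of how such a file could be generated from a list of iRODS paths, the function below emits one iget command per matching path and mirrors the iRODS directory layout locally. The function name, the .uproc suffix filter, and the local directory layout are all assumptions for illustration; the actual Makefile target may work differently.

```python
import posixpath
import shlex


def iget_commands(irods_paths, local_root, suffix=".uproc"):
    """Build one shell command per iRODS path that ends with `suffix`.

    Hypothetical helper: the suffix and local layout are assumptions.
    """
    commands = []
    for path in irods_paths:
        if not path.endswith(suffix):
            continue
        # Mirror the iRODS path under local_root so file names cannot collide.
        dest = posixpath.join(local_root, path.lstrip("/"))
        dest_dir = posixpath.dirname(dest)
        commands.append(
            f"mkdir -p {shlex.quote(dest_dir)} && "
            f"iget {shlex.quote(path)} {shlex.quote(dest)}"
        )
    return commands


if __name__ == "__main__":
    # Made-up paths for illustration only.
    for command in iget_commands(["/zone/p/a.uproc", "/zone/p/b.fa"], "data"):
        print(command)
```

Quoting each path with shlex.quote keeps the commands valid even when a path contains spaces.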
Execute make parallel-iget-uproc-results to download the UProC results in parallel. This should take less than an hour. Try -j 100 for fun.
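The -j option of GNU parallel caps how many commands run at once. For readers unfamiliar with it, here is a rough Python analog of that behavior using a thread pool. This is not what the Makefile does; it is a stand-in illustration only, and the function name run_commands is hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, jobs=4):
    """Run shell command lines with at most `jobs` in flight.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(command):
        # shell=True because each line is a complete shell command.
        return subprocess.run(command, shell=True).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    print(run_commands(["exit 0", "exit 3"], jobs=2))
```

Threads are sufficient here because each worker only waits on an external process.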
Execute make download-pfam-data to get the necessary Pfam files.
Execute python load_pfam_table.py to load Pfam annotations into the uproc table. This will first drop the uproc table and delete all rows from the sample_to_uproc table.
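The reset described above can be pictured with a small sketch. It uses the standard-library sqlite3 module as a stand-in for the MySQL database, and the column names are hypothetical; the real schemas are defined by the loader scripts. Clearing sample_to_uproc presumably matters because its rows would otherwise point at uproc rows that no longer exist.

```python
import sqlite3


def reset_uproc_tables(conn):
    """Drop and recreate uproc, then empty sample_to_uproc.

    The schemas below are hypothetical and only for illustration.
    """
    conn.execute("DROP TABLE IF EXISTS uproc")
    conn.execute(
        "CREATE TABLE uproc ("
        "uproc_id INTEGER PRIMARY KEY, accession TEXT UNIQUE, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample_to_uproc ("
        "sample_id INTEGER, uproc_id INTEGER, read_count INTEGER)"
    )
    # Rows here would reference uproc rows that were just dropped.
    conn.execute("DELETE FROM sample_to_uproc")
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    reset_uproc_tables(connection)
    print(connection.execute("SELECT COUNT(*) FROM uproc").fetchone()[0])
```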
Run make write-load-sample-to-uproc-command-file to create a file of commands for GNU Parallel. This will also drop and create the sample_to_uproc table.
Run make parallel-load-sample-to-uproc to load the sample_to_uproc table.
If UProC KEGG results need to be copied from TACC to /iplant/, execute make copy-uproc-kegg-results-to-iplant from Stampede2.
If the UProC KEGG results have not been copied to myo, execute make ils-imicrobe-projects write-download-command-file parallel-iget-uproc-results.
If the UProC KEGG database tables do not exist yet, execute make drop-and-create-uproc-kegg-tables.
To load the UProC KEGG database tables execute make write-load-uproc-kegg-tables-command-file
followed by make