Name: ega_script
Owner: ICGC DCC
Description: null
Created: 2017-07-25 23:08:24.0
Updated: 2017-07-27 19:11:42.0
Pushed: 2017-08-15 14:58:38.0
Homepage: null
Size: 62
Language: Python
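The tool is used to:
- generate the EGA-file-to-collab jobs
- report the `to_stage` files, i.e. the files which are to be staged to the Aspera server by EGA
- report the `to_remove` files, i.e. the files which can be removed from the Aspera server by EGA
To carry out these tasks, the tool gathers information from two different kinds of git repositories.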
Before you can run the tool, you need to configure it. The configuration file is located at `ega_script/conf/conf.yaml`. You may need to change the following two `base_path` settings, for `ega_audit` and `ega_job` respectively.
audit_base_path: "../ega-file-transfer"
job_base_path: ".."
The default configuration above assumes that:
- `ega-file-transfer-to-collab-jtracker` (http://142.1.177.124/jt-hub/ega-file-transfer-to-collab-jtracker) is cloned with `git clone` into the same folder as the tool script
- `ega-file-transfer` (https://github.com/icgc-dcc/ega-file-transfer) is also cloned with `git clone` into the same folder as the tool script
The parent folder then looks like this:
ega-file-transfer
ega-file-transfer-to-collab-2-jtracker
ega-file-transfer-to-collab-3-jtracker
ega-file-transfer-to-collab-4-jtracker
ega-file-transfer-to-collab-5-jtracker
ega-file-transfer-to-collab-jtracker
ega_script
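The relationship between the two base paths and the folder layout above can be sketched in Python. This is only an illustration, not the tool's own code: the helper `resolve_base_paths` and the example location `/work/ega_script` are hypothetical, and it assumes the relative paths are resolved against the `ega_script` folder.

```python
import os


def resolve_base_paths(conf, script_dir):
    """Resolve both base paths relative to the tool folder.

    Hypothetical helper: `conf` is a dict holding the two keys shown in
    conf.yaml, `script_dir` is the folder that contains main.py.
    """
    return {
        key: os.path.normpath(os.path.join(script_dir, conf[key]))
        for key in ("audit_base_path", "job_base_path")
    }


# Default values from conf.yaml; "/work/ega_script" is an assumed location.
conf = {"audit_base_path": "../ega-file-transfer", "job_base_path": ".."}
paths = resolve_base_paths(conf, "/work/ega_script")

# Directories that are missing here point to a layout or configuration problem.
missing = [p for p in paths.values() if not os.path.isdir(p)]
```

With the default values, `audit_base_path` points at the `ega-file-transfer` clone and `job_base_path` at the parent folder that holds the job repositories.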
The EGA auditing git repository is version controlled, so before you can run the tool to generate the jobs or report the `to_stage` or `to_remove` files, you also need to set the version of the EGA auditing reports in `conf/conf.yaml`, e.g.,
file_version: "v20170630"
Installing
Clone the source script of the tool:
git clone git@github.com:lindaxiang/ega_script.git
You can then run `./main.py -h` to see the usage of the tool:
usage: main.py [-h] [-c CONF] -t TASK [-p [PROJECT [PROJECT ...]]]
               [-s [SEQ_STRATEGY [SEQ_STRATEGY ...]]]

EGA-file-to-colllab job generator and auditor

optional arguments:
  -h, --help            show this help message and exit
  -c CONF, --setting CONF
                        Specify ega setting file
  -t TASK, --task TASK  Specify the task
  -p [PROJECT [PROJECT ...]], --project [PROJECT [PROJECT ...]]
                        Specify the project
  -s [SEQ_STRATEGY [SEQ_STRATEGY ...]], --seq_strategy [SEQ_STRATEGY [SEQ_STRATEGY ...]]
                        Specify the sequencing strategy
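For readers who want to see how such a command line can be declared, here is a minimal `argparse` sketch that accepts the same options as the usage text above. It is reconstructed from the help output only and is not taken from `main.py`; the `dest` names are assumptions chosen to reproduce the `CONF`, `TASK`, `PROJECT` and `SEQ_STRATEGY` placeholders.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented usage (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="EGA-file-to-colllab job generator and auditor",
    )
    parser.add_argument("-c", "--setting", dest="conf",
                        help="Specify ega setting file")
    # -t is the only option shown outside brackets in the usage, so it is required.
    parser.add_argument("-t", "--task", dest="task", required=True,
                        help="Specify the task")
    # nargs="*" accepts zero or more values after the flag.
    parser.add_argument("-p", "--project", dest="project", nargs="*",
                        help="Specify the project")
    parser.add_argument("-s", "--seq_strategy", dest="seq_strategy", nargs="*",
                        help="Specify the sequencing strategy")
    return parser


# Parse an example command line without touching sys.argv.
args = build_parser().parse_args(["-t", "job", "-p", "CLLE-ES", "-s", "RNA-Seq"])
```

Because `-p` and `-s` take lists, several projects or sequencing strategies can be given after a single flag.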
Running the tool to generate the jobs
For example, to generate jobs for `RNA-Seq` data of project `CLLE-ES`, do this:
cd ega_script
./main.py -t job -p CLLE-ES -s RNA-Seq
If no `project` is specified, the tool will generate the eligible jobs for all projects that have auditing reports available.
If no `seq_strategy` is specified, the tool will generate the eligible jobs for every `seq_strategy` included in the related auditing reports.
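The "no filter means everything" behaviour described above can be sketched as follows. This is a simplified illustration, not the tool's implementation: the function `select_records`, the record keys `project` and `seq_strategy`, and the sample values (including `WGS`) are all hypothetical.

```python
def select_records(records, projects=None, seq_strategies=None):
    """Keep the records that match the requested filters.

    An omitted or empty filter (None or []) places no restriction on
    that field, so calling with no filters returns every record.
    """
    return [
        r for r in records
        if (not projects or r["project"] in projects)
        and (not seq_strategies or r["seq_strategy"] in seq_strategies)
    ]


# Made-up audit records, for illustration only.
records = [
    {"project": "CLLE-ES", "seq_strategy": "RNA-Seq"},
    {"project": "CLLE-ES", "seq_strategy": "WGS"},
    {"project": "LICA-FR", "seq_strategy": "RNA-Seq"},
]

everything = select_records(records)
clle_rna = select_records(records, projects=["CLLE-ES"], seq_strategies=["RNA-Seq"])
```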
The generated jobs are located in `job_state.backlog` of one of the job repositories, which is defined in `conf/conf.yaml`. You can change the `job_folder` if needed:
job_folder: "ega-file-transfer-to-collab-jtracker/ega-file-transfer-to-collab.0.6.jtracker/job_state.backlog"
Running the tool to generate the `to_stage` files
In order to get the list of files which are to be staged to the Aspera server by EGA, do this:
cd ega_script
./main.py -t stage
You can specify the `project` and `seq_strategy` to restrict the list to files that belong to the given projects and sequencing strategies.
The tool will generate `to_stage_*.tsv` files under each project. For example:
ega_operation/
├── BRCA-KR
│   └── to_stage_run.tsv
├── CLLE-ES
│   └── to_stage_run.tsv
├── LICA-FR
│   └── to_stage_run.tsv
├── MALY-DE
│   └── to_stage_analysis.tsv
├── OV-AU
│   └── to_stage_analysis.tsv
├── PACA-AU
│   ├── to_stage_analysis.tsv
│   └── to_stage_run.tsv
├── PAEN-AU
│   └── to_stage_analysis.tsv
└── to_remove.tsv
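To get a quick overview of which projects have `to_stage_*.tsv` files after a run, a small script along these lines may help. It is a sketch and not part of the tool: the helper `find_to_stage_files` is hypothetical, and the path assumes the default `audit_base_path` with the `ega_operation` folder inside the `ega-file-transfer` clone.

```python
import glob
import os


def find_to_stage_files(ega_operation_dir):
    """Map each project folder name to its to_stage_*.tsv file names."""
    found = {}
    pattern = os.path.join(ega_operation_dir, "*", "to_stage_*.tsv")
    for path in sorted(glob.glob(pattern)):
        project = os.path.basename(os.path.dirname(path))
        found.setdefault(project, []).append(os.path.basename(path))
    return found


# Assumed location, relative to the ega_script folder; an empty dict is
# returned if the folder does not exist or holds no to_stage files.
stage_files = find_to_stage_files("../ega-file-transfer/ega_operation")
```

The top-level `to_remove.tsv` is not matched, because the pattern only looks inside the per-project folders.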
Running the tool to generate the `to_remove` files
In order to list all files which can be removed from the Aspera server by EGA, do this:
cd ega_script
./main.py -t remove
The tool will generate the `to_remove.tsv` file at: `ega-file-transfer/ega_operation/to_remove.tsv`
Log information
When generating the jobs or reporting the `to_stage` or `to_remove` files, the tool performs many QC checks based on the auditing reports. The QC results are written to the `*.log` files located in:
ega_script/log/
├── error.log
├── info.log
└── warn.log
Here are some sample log messages:
2017-07-25 15:12:45,689 - audit.stage - WARNING - LICA-FR::EGAF00000483937 has the same file_md5sum and encrypted_file_md5sum: set(['772febc5f8fea25a9b09e43dd51e43bd'])
2017-07-25 15:12:45,690 - audit.stage - WARNING - LICA-FR::EGAF00000483938 has the same file_md5sum and encrypted_file_md5sum: set(['170588f8a583c2d4fee882fdfcb6133b'])
2017-07-25 15:12:45,690 - audit.stage - WARNING - LICA-FR::EGAF00000483899 has the same file_md5sum and encrypted_file_md5sum: set(['83aed772452945dc994bcfad7edebc3a'])
2017-07-25 15:12:49,248 - audit.stage - WARNING - MALY-DE::EGAF00001592148 has the id inconsistent: ega_analysis_id in audit report version v20170630
2017-07-25 15:12:49,248 - audit.stage - WARNING - MALY-DE::EGAF00001592148 has the id inconsistent: file_name in audit report version v20170630
2017-07-25 15:12:49,248 - audit.stage - WARNING - MALY-DE::EGAF00001592148 has the id inconsistent: encrypted_file_md5sum in audit report version v20170630
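If you want to summarise the QC warnings, the log lines can be parsed with a regular expression. The sketch below is not part of the tool; it assumes every line follows the `timestamp - logger - LEVEL - message` layout of the samples above and that warning messages start with `PROJECT::FILE_ID`. The names `LOG_LINE`, `parse_log_line` and `count_warnings_by_project` are hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the sample messages: time - logger - LEVEL - message
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_warnings_by_project(lines):
    """Count WARNING messages per project, using the text before '::'."""
    counts = Counter()
    for line in lines:
        record = parse_log_line(line)
        if record and record["level"] == "WARNING" and "::" in record["message"]:
            counts[record["message"].split("::", 1)[0]] += 1
    return counts


# Two of the sample messages shown above.
sample = [
    "2017-07-25 15:12:45,689 - audit.stage - WARNING - LICA-FR::EGAF00000483937 "
    "has the same file_md5sum and encrypted_file_md5sum: "
    "set(['772febc5f8fea25a9b09e43dd51e43bd'])",
    "2017-07-25 15:12:49,248 - audit.stage - WARNING - MALY-DE::EGAF00001592148 "
    "has the id inconsistent: file_name in audit report version v20170630",
]
warning_counts = count_warnings_by_project(sample)
```

To run this on a real log, pass an open file such as `ega_script/log/warn.log` to `count_warnings_by_project`; lines that do not match the assumed layout are skipped.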
Authors
**Linda Xiang** - *Initial work*