Name: pipeline
Owner: NMDP/Be The Match Bioinformatics Research
Description: Consensus assembly and allele interpretation pipeline.
Created: 2014-09-11 16:44:40.0
Updated: 2017-02-03 02:10:52.0
Pushed: 2015-07-09 03:08:22.0
Size: 57351
Language: HTML
This is a prototype pipeline. Greater detail on how it works, why it works the way it does, and much, much more can be found in the wiki on the github page.
splitter is the master script. It figures out which files to work on and divides the workload into one work list per core on the machine, then starts process_fastq on each of those lists. Note that it isn't particularly clever about dividing up the workload: given 100 files and 4 cores, you get 4 lists of 25 files each, regardless of sample size. Still, some parallelization is better than none, right?
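A minimal sketch of that idea, under assumed names (DATA_DIR, the worklist.* files, the *.fastq glob, and the way process_fastq is invoked are illustrative, not the actual contents of splitter.bash):

```bash
#!/usr/bin/env bash
# Illustrative sketch only: round-robin the input files into one work list per
# core, then run one process_fastq worker per list.
DATA_DIR="$1"
CORES=$(nproc)   # GNU coreutils; stock OSX would need a different call

# One work list per core (worklist.aa, worklist.ab, ...), one input file per line.
ls "$DATA_DIR"/*.fastq | split -n r/"$CORES" - worklist.

for list in worklist.*; do
    ./process_fastq "$list" &   # one background worker per core
done
wait                            # block until every worker has finished
```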
To use the pipeline:

1) Clone the repository to your working directory.
2) Edit splitter.bash to include the correct path to your copy of the pipeline. Note: splitter.bash will not run until you set this value.
3) Ensure your data is in the format that splitter and process_fastq expect.
4) Run splitter, thusly:
   /path/to/splitter.bash /path/to/my/compliant/directory/of/data [ DEBUG ]
5) Check the result files in final/
6) Intermediate files are stored in intermediate/ and can safely be deleted after processing has completed.
This tool requires a fair number of other tools to be installed. The shell script checks for them, and will try to give you breadcrumbs as to how to solve any of the issues it finds (e.g. where to find bwa).
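As a sketch of what that check looks like (bwa is the only tool named here; any other names you add to the list are your own assumptions):

```bash
# Verify that required external tools are on PATH before doing any work.
for tool in bwa; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "ERROR: required tool '$tool' not found on PATH" >&2
        echo "       (see the wiki for hints on where to get it)" >&2
        exit 1
    fi
done
```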
If your machine/instance has less than 4GB of RAM, you'll want to modify the script so it uses chr6.fa as the reference genome instead of all_chr.fa. Make sure those files are indexed; this script does not do the indexing for you.
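For example, indexing the reference with bwa looks like this (the samtools line is an assumption; only add it if samtools is part of your workflow):

```bash
# Run once before starting the pipeline; the script will not do this for you.
bwa index chr6.fa        # writes chr6.fa.{amb,ann,bwt,pac,sa}
samtools faidx chr6.fa   # writes chr6.fa.fai (assumption: only if samtools is used downstream)
```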
This is a work in progress, but we welcome improvements and questions. Please feel free to file issues in github.
This has been primarily tested on Debian-based Linux systems (Ubuntu 14.04.01 LTS, to be exact). However, we do try to support other Unix-like operating systems, such as Apple OSX. If you run into issues on your distribution of choice, please file an issue on github. Be warned that OSX ships with some VERY old shell utilities, so support there is still a bit prototypical.
Elapsed times and STDOUT/STDERR messages are stored in files named timedata.[0-9]+, that is, timedata followed by a numeric string derived from the PID of the splitter.bash script.
All timedata files may safely be removed after run validation.
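For instance, once a run has been validated they can all be cleared out in one go (matching the timedata.[0-9]+ naming above):

```bash
rm -f timedata.[0-9]*   # remove every per-run timing/log file in the current directory
```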
This tool takes the output of the NGS pipeline and produces an HTML report of the results.
1) Running with no parameters
   /path/to/tool/ngs-validation-report
2) Running with only an experiment name
   /path/to/ngs-validation-report -x ex014
3) Running with an experiment name and an input directory
   /path/to/ngs-validation-report -x ex014 -d /path/to/directory/of/data
4) Running with the validated, expected, and observed files
   /path/to/ngs-validation-report -l ex00_validated.txt -e ex00_expected.hml -o ex00_observed.txt
rt/
  index.html
  results.html
  errors.html
  experiments.html
  help.html
  log.html
  subjects/
    ├── subject1.html
    ├── subject2.html
    └── ...
  css/
    ├── dashboard.css
    ├── bootstrap.min.css
    └── default.css
  js/
    ├── Chart.js
    ├── docs.min.js
    ├── ie-emulation-modes-warning.js
    ├── ie10-viewport-bug-workaround.js
    ├── ie8-responsive-file-warning.js
    ├── init.js
    ├── jquery.js
    ├── raphael.js
    └── bootstrap.min.js
  img/
    └── bethematch.jpeg