Name: pipeline

Owner: NMDP/Be The Match Bioinformatics Research

Description: Consensus assembly and allele interpretation pipeline.

Created: 2014-09-11 16:44:40.0

Updated: 2017-02-03 02:10:52.0

Pushed: 2015-07-09 03:08:22.0


Size: 57351

Language: HTML




This is a prototype pipeline. Greater detail on how it works, why it's supposed to work, and much more can be found in the wiki on the GitHub page.



splitter is the master script that figures out which files to work on and divides the workload into one work file per core on the machine. It then starts process_fastq on each of these work files. Note that it isn't particularly clever about dividing up the workload: if there are 100 input files and 4 cores, you get 4 work files of 25 lines each, regardless of sample size. Still, some parallelization is better than none, right?
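
As a rough illustration of the division described above, the sketch below deals the lines of a file list out into one chunk per core. It is not the code in splitter: the function name `split_worklist`, the chunk naming, and the round-robin order are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Splits a list of input files (one path per line) into N chunk files,
# dealing lines out round-robin: chunk 0 gets lines 1, N+1, 2N+1, ...
# Usage: split_worklist LIST_FILE CHUNK_COUNT OUTPUT_PREFIX
split_worklist() {
    local list="$1" chunks="$2" prefix="$3"
    awk -v n="$chunks" -v p="$prefix" \
        '{ print > (p "." ((NR - 1) % n)) }' "$list"
}

# Example (commented out; file names are placeholders):
#   cores=$(getconf _NPROCESSORS_ONLN)   # core count on Linux and OSX
#   split_worklist all_fastq.txt "$cores" worklist
```

With 100 lines and 4 chunks this gives 4 files of 25 lines each. Like the behavior described above, it balances the number of entries only and ignores how large each sample is.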

USING pipeline


This tool requires a fair number of other tools to be installed. The shell script checks for them, and will try to give you breadcrumbs as to how to solve any of the issues it finds (e.g. where to find bwa).
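
The kind of check described above can be sketched with `command -v`. This is only an illustration, not the check the shell script actually performs: the function name `missing_tools` is made up, and bwa is the only required tool named here, so pass whatever list applies to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Prints each command that cannot be found on PATH, one per line,
# and returns non-zero if at least one is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example (commented out):
#   missing_tools bwa || echo "install the tools listed above first" >&2
```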

If your machine/instance has less than 4 GB of RAM, you'll want to modify the script so it uses chr6.fa as the reference genome, instead of all_chr.fa. Make sure those files are indexed. This script does not do the indexing for you.
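
Before a run, both points above can be checked by hand. The sketch below makes two assumptions: that the index in question is the one written by `bwa index` (which creates `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files next to the FASTA), and that memory is given in kB as reported by `/proc/meminfo` on Linux. The function names are invented for this example, and the pipeline itself does not select the reference automatically.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the pipeline.

# Prints the reference to use for a given amount of RAM in kB:
# chr6.fa below 4 GB (4194304 kB), all_chr.fa otherwise.
pick_reference() {
    local mem_kb="$1"
    if [ "$mem_kb" -lt 4194304 ]; then
        echo chr6.fa
    else
        echo all_chr.fa
    fi
}

# Succeeds only if every file written by "bwa index REF" exists
# and is non-empty (assumes a bwa index is what is needed).
reference_is_indexed() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -s "$ref.$ext" ] || return 1
    done
}

# Example (commented out; Linux only because of /proc/meminfo):
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   ref=$(pick_reference "$mem_kb")
#   reference_is_indexed "$ref" || echo "run: bwa index $ref" >&2
```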

This is a work in progress, but we welcome improvements and questions. Please feel free to file issues on GitHub.

This has been primarily tested on Debian-based Linux systems (Ubuntu 14.04.01 LTS, to be exact). However, we do try to support other Unix-like operating systems, such as Apple OSX. If you run into issues on your distribution of choice, please file an issue on GitHub. OSX, mind you, has some VERY old shell utilities, so support there is still at the prototype stage.


Elapsed times, STDOUT and STDERR messages are stored in files named timedata.[0-9]+ – that is, timedata, followed by a numeric string derived from the PID of the splitter.bash script.

All timedata files may safely be removed after run validation.
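
For cleanup, a small sketch that lists only files whose names match timedata.[0-9]+ and leaves anything else alone. The function name `list_timedata` is hypothetical; review its output before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Prints regular files in DIR (default: current directory) whose names
# are "timedata." followed by one or more digits and nothing else.
list_timedata() {
    local dir="${1:-.}" f
    for f in "$dir"/timedata.*; do
        [ -f "$f" ] || continue
        case "${f##*/}" in
            timedata.*[!0-9]*) ;;          # a non-digit after the dot: skip
            timedata.?*) echo "$f" ;;      # digits only: report it
        esac
    done
}

# Example (commented out): remove the files once the run is validated.
#   list_timedata . | while IFS= read -r f; do rm -- "$f"; done
```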


This tool takes in the output of the NGS pipeline and produces an HTML report of the results.

Using ngs-validation-report

1) Running with no parameters


2) Running with only an experiment name

h/to/ngs-validation-report -x ex014

3) Running with only an experiment name and an input directory

h/to/ngs-validation-report -x ex014 -d /path/to/directory/of/data

4) Running with the validated, expected, and observed files.

h/to/ngs-validation-report -l ex00_validated.txt -e ex00_expected.hml -o ex00_observed.txt


├── subject1.html
├── subject2.html
├── ...
├── dashboard.css
├── bootstrap.min.css
├── default.css
├── Chart.js
├── docs.min.js
├── ie-emulation-modes-warning.js
├── ie10-viewport-bug-workaround.js
├── ie8-responsive-file-warning.js
├── init.js
├── jquery.js
├── raphael.js
├── bootstrap.min.js
└── bethematch.jpeg

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.