CD2H gitForager

wtsi-hgi/GGR-cwl

Name: GGR-cwl

Owner: Wellcome Trust Sanger Institute - Human Genetics Informatics

Description: CWL tools and workflows for GGR

Forked from: Duke-GCB/GGR-cwl

Created: 2017-08-18 10:16:31.0

Updated: 2017-08-18 10:16:34.0

Pushed: 2017-08-18 15:04:11.0

Homepage: none

Size: 508 KB

Language: Python

README

GGR-cwl

CWL tools and workflows associated with the Genomics of Gene Regulation (GGR) project

The GGR pipelines were created using the Common Workflow Language (CWL) draft-3 specification. The workflows are parametrized with values that best suit the GGR samples, but they can easily be tailored to specific needs.

For a detailed user guide to the CWL workflows, please see the wiki.

ChIP-seq:

Pipelines

Steps

01 - Fastq QC step:
- Fastq QC step - SE
- Fastq QC step - PE
02 - Trimming reads step:
- Trimming step - SE
- Trimming step - PE
03 - Mapping step:
- Mapping step - SE
- Mapping step - PE
04 - Peak calling step:
05 - Quantification step

DNase-seq:

Pipelines

Steps

01 - Mapping step:
- 01 - Mapping step - SE
02 - Peak calling step:
- 02 - Peak calling step
03 - Quantification step:
- 03 - Quantification step

RNA-seq:

Pipelines

Steps

00 - Genome files generation for STAR and RSEM:
- 00 - Preprocessing step
01 - Fastq QC step:
- 01 - Fastq QC step - SE
- 01 - Fastq QC step - PE
02 - Trimming reads step:
- 02 - Trimming step - SE
- 02 - Trimming step - PE
03 - Mapping step:
- 03 - Mapping step - SE
- 03 - Mapping step - SE - w/sjdb
- 03 - Mapping step - PE
- 03 - Mapping step - PE - w/sjdb
04 - Quantification step:
- 04 - Quantification step - SE - Unstranded
- 04 - Quantification step - SE - Stranded
- 04 - Quantification step - SE - Revstranded
- 04 - Quantification step - PE - Unstranded
- 04 - Quantification step - PE - Stranded
- 04 - Quantification step - PE - Revstranded
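The six RNA-seq quantification variants above are every combination of read type (SE or PE) and strand specificity (Unstranded, Stranded, Revstranded). The minimal Python sketch below reproduces that list. It only builds the labels shown here; the function and constant names are hypothetical and say nothing about how the workflow files in the repository are actually named.

```python
from itertools import product

# Read types and strand-specificity options, as named in the
# RNA-seq quantification step list (hypothetical constants).
READ_TYPES = ["SE", "PE"]
STRANDEDNESS = ["Unstranded", "Stranded", "Revstranded"]


def quantification_variants():
    """Return one label per quantification variant, e.g. "SE - Stranded".

    product() varies the last argument fastest, so all SE variants come
    first, in the same order as the step list.
    """
    return [f"{read} - {strand}" for read, strand in product(READ_TYPES, STRANDEDNESS)]


if __name__ == "__main__":
    for label in quantification_variants():
        print(f"04 - Quantification step - {label}")
```

Adding a new dimension (for example the -with-sjdb option of the mapping step) would multiply the number of variants in the same way.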

Workflow differences legend

Depending on the experiment, the workflows may differ slightly. These differences are determined by:

All
- Type of read:
  - SE: Single-End reads
  - PE: Paired-End reads
ChIP-seq & DNase-seq
- Type of region targeted:
  - Narrow: narrow (also known as point-source) peaks, where the bound region is limited (e.g. transcription factors, TFs).
  - Broad: broad peaks, where a wide region is bound (e.g. histone modifications).
ChIP-seq only
- With or without control: whether a control sample is available. Workflows that use a control sample are denoted with -with-control.
RNA-seq only
- Strand specificity:
  - Unstranded: reads are not strand-specific, so it is not possible to know which DNA strand they come from.
  - Stranded: reads are strand-specific and can be assigned to the Watson or Crick strand they come from.
  - Reverse Stranded: reads come from cDNA, which swaps the forward and reverse strands when the reads are mapped.
- Custom SJDB: By default, the STAR 2-pass mapping strategy is used: a first STAR pass generates a large pool of novel splice junctions (referred to as SJDB), and these junctions are used to build the genome index employed in the mapping step. This 2-pass strategy can be skipped by supplying a custom genome index instead. Because such an index is typically created with a precomputed SJDB, this option is denoted with -with-sjdb.
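To make the strand-specificity options concrete, the sketch below shows what each library type lets you infer about a read's strand of origin. It is a minimal illustration assuming single-end reads; the function name and its arguments are hypothetical and are not part of the GGR-cwl workflows, which delegate this logic to the quantification tools.

```python
from typing import Optional


def transcript_strand(read_strand: str, library: str) -> Optional[str]:
    """Infer the strand of the originating transcript for a single-end read.

    read_strand: strand the read aligned to, "+" or "-".
    library: "Unstranded", "Stranded" or "Revstranded" (labels from the step list).
    Returns "+" or "-", or None when the origin cannot be determined.
    """
    if read_strand not in ("+", "-"):
        raise ValueError(f"unexpected read strand: {read_strand!r}")
    if library == "Unstranded":
        # No strand information is kept: the read could come from either strand.
        return None
    if library == "Stranded":
        # The read has the same orientation as the transcript it came from.
        return read_strand
    if library == "Revstranded":
        # The read comes from cDNA, so its orientation is flipped.
        return "-" if read_strand == "+" else "+"
    raise ValueError(f"unknown library type: {library!r}")


if __name__ == "__main__":
    for lib in ("Unstranded", "Stranded", "Revstranded"):
        print(lib, transcript_strand("+", lib))
```

Choosing the wrong variant for a stranded library would assign reads to the antisense strand, which is why the quantification step comes in separate Stranded and Revstranded versions.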

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.