compbio-UofT/CouGaR

Name: CouGaR

Owner: Computational Biology Lab at the University of Toronto

Description: Mini Chromosome

Forked from: misko/minichr

Created: 2015-05-15 18:38:35.0

Updated: 2017-01-12 22:16:48.0

Pushed: 2016-05-13 23:34:10.0

Homepage: null

Size: 211039

Language: C++


README

CouGaR

CouGaR is a tool that searches for complex genomic rearrangements in matched normal/tumour next-generation sequencing data. The final output files created by CouGaR can be visualized using CouGaR-viz.

Requirements
Running

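The CouGaR pipeline consists of 5 main stages: BAM pre-processing, HMM pruning, flow problem formulation and solving, IP problem formulation and solving, and visualization.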

A quick example is as follows:

```
bash 02_cluster_mapability.sh `pwd`/TEST_TCGA_5055
```

BAM pre-processing

BAM files can be extremely large, and they are no longer needed once discordant clusters and coverage have been computed, so they are pre-processed in this first stage. This is how we were able to run CouGaR on so many TCGA samples with very limited storage space (~6TB). For example, you can pre-process the BAM files once and then re-run the downstream analysis without needing them again (unless you change the way clusters or coverage are computed).

Two scripts have been provided to pre-process BAM files.

The first of these grabs the tumor and normal BAM files for a specified TCGA sample (requires a valid TCGA access key) and pre-processes them. The second script performs the same pre-processing operation on local BAM files. In the local case you will need to specify which reference genome (hg18 or hg19) is used and also assign a group label to the sample.
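To illustrate why the pre-processed summaries are so much smaller than the BAM files, here is a minimal sketch that reduces a list of read start positions to per-bin read counts. The bin width, the function name `binned_coverage`, and the input layout are assumptions made for illustration; the format CouGaR actually writes is not described here.

```python
from collections import Counter

BIN_SIZE = 1000  # hypothetical bin width in base pairs

def binned_coverage(read_starts, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin, given (chromosome, start) tuples."""
    counts = Counter()
    for chrom, start in read_starts:
        # Integer division maps each start position to its bin index.
        counts[(chrom, start // bin_size)] += 1
    return counts

# Toy input standing in for positions that would be read from a BAM file.
reads = [("chr1", 150), ("chr1", 980), ("chr1", 1020), ("chr2", 5)]
coverage = binned_coverage(reads)
```

Once counts like these (and the discordant clusters) are stored, the original alignments are not needed to re-run later stages.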

HMM pruning
-------
In this stage CouGaR makes a first pass over the genome to estimate regions of normal copy-count (these are then removed from further analysis). This HMM has transition probabilities informed by discordant clusters found in the tumor sample only (normal clusters have been removed).
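The idea of cluster-informed transitions can be sketched with a small Viterbi decoder. This is only an illustration under stated assumptions: the three states, their mean tumor/normal coverage ratios, the noise level, and both switch probabilities are hypothetical values, not CouGaR's actual model. The one feature taken from the description above is that a state change is made more likely at bins where a tumor-only discordant cluster is present.

```python
import math

STATES = ("loss", "normal", "gain")
MEANS = {"loss": 0.5, "normal": 1.0, "gain": 1.5}  # assumed coverage ratios
SIGMA = 0.15             # assumed emission noise
P_SWITCH = 0.001         # assumed switch probability without cluster support
P_SWITCH_CLUSTER = 0.3   # assumed switch probability at a tumor-only cluster

def log_emission(state, ratio):
    # Gaussian log-density up to a constant shared by all states.
    return -((ratio - MEANS[state]) ** 2) / (2 * SIGMA ** 2)

def log_transition(prev, cur, has_cluster):
    p_switch = P_SWITCH_CLUSTER if has_cluster else P_SWITCH
    if prev == cur:
        return math.log(1 - p_switch)
    return math.log(p_switch / (len(STATES) - 1))

def viterbi(ratios, cluster_at):
    """cluster_at[i] is True when a tumor-only cluster falls in bin i."""
    score = {s: log_emission(s, ratios[0]) for s in STATES}  # uniform prior
    back = []
    for i in range(1, len(ratios)):
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(
                STATES,
                key=lambda p: score[p] + log_transition(p, cur, cluster_at[i]),
            )
            pointers[cur] = best_prev
            new_score[cur] = (
                score[best_prev]
                + log_transition(best_prev, cur, cluster_at[i])
                + log_emission(cur, ratios[i])
            )
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

ratios = [1.0, 1.05, 0.95, 1.5, 1.45, 1.55, 1.0, 0.98]
cluster_at = [False, False, False, True, False, False, True, False]
labels = viterbi(ratios, cluster_at)
# Bins labelled "normal" would be dropped; the rest go to later stages.
to_analyze = [i for i, s in enumerate(labels) if s != "normal"]
```

In this toy input only bins 3 to 5 are kept for further analysis, which mirrors how pruning shrinks the problem handed to the later stages.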

Flow problem formulation and solving
-------
Once the HMM pass is complete a flow network is created and solved by cs2. This solution provides the base contigs for the final IP pass.
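As a rough sketch of what handing a flow network to an external solver can look like, the snippet below serializes a toy min-cost flow problem in the DIMACS text format, which is the input format commonly used by min-cost flow solvers such as cs2. The helper name `to_dimacs_min_cost_flow` and the three-node network are hypothetical; how CouGaR maps genomic segments and discordant clusters onto nodes, arcs, and costs is not shown here.

```python
def to_dimacs_min_cost_flow(num_nodes, supplies, arcs):
    """Serialize a min-cost flow problem in DIMACS format.

    supplies: {node_id: supply}, positive for sources, negative for sinks
    arcs: list of (tail, head, lower_bound, capacity, cost), nodes 1-indexed
    """
    lines = [
        "c hypothetical toy network, not CouGaR's actual graph",
        f"p min {num_nodes} {len(arcs)}",  # problem line: type, nodes, arcs
    ]
    for node, supply in sorted(supplies.items()):
        lines.append(f"n {node} {supply}")  # node descriptor
    for tail, head, low, cap, cost in arcs:
        lines.append(f"a {tail} {head} {low} {cap} {cost}")  # arc descriptor
    return "\n".join(lines) + "\n"

problem = to_dimacs_min_cost_flow(
    num_nodes=3,
    supplies={1: 2, 3: -2},
    arcs=[(1, 2, 0, 2, 1), (2, 3, 0, 2, 1), (1, 3, 0, 1, 5)],
)
```

The resulting text could be written to a file and passed to the solver; the exact invocation depends on how cs2 was built and is not assumed here.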

IP problem formulation and solving
-------
Using the contigs identified in the previous step CouGaR computes a somewhat minimal subset needed to adequately explain the observed coverage data.
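The selection step can be illustrated with a tiny exhaustive search, used here in place of a real integer-programming solver. Everything in it is an assumption for illustration: contigs are modelled as sets of segment names, each contig gets an integer copy count, and the objective trades coverage error against a fixed penalty per contig used. CouGaR's actual objective and constraints may differ.

```python
from itertools import product

def select_contigs(contigs, observed, max_copies=2, penalty=0.5):
    """Pick integer copy counts per contig that best explain `observed`.

    contigs: list of sets of segment names
    observed: {segment: observed extra copy count}
    """
    best = None
    for copies in product(range(max_copies + 1), repeat=len(contigs)):
        error = 0.0
        for segment, target in observed.items():
            # Copies contributed by every contig that spans this segment.
            explained = sum(
                c for c, contig in zip(copies, contigs) if segment in contig
            )
            error += abs(explained - target)
        used = sum(1 for c in copies if c > 0)
        objective = error + penalty * used  # fewer contigs is preferred
        if best is None or objective < best[0]:
            best = (objective, copies)
    return best[1]

contigs = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}]
observed = {"A": 2, "B": 2, "C": 2}
chosen = select_contigs(contigs, observed)
```

Here the single contig spanning all three segments, taken twice, explains the data exactly, so the two shorter contigs are left out. Brute force only works for a handful of contigs; a real instance would need a proper IP solver.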

Visualization
-------
Use one of the graph output files with [CouGaR-viz](https://github.com/compbio-UofT/CouGaR-viz).

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.