LPM-HMS/PvKey

Name: PvKey

Owner: LPM and Collaborators

Description: Somatic variant calling

Created: 2014-03-03 14:13:18.0

Updated: 2016-03-30 20:55:59.0

Pushed: 2014-09-21 11:32:06.0

Homepage:

Size: 9740

Language: Python

README

PvKey

PvKey is a pipeline that works with tumor-normal matched samples. It calls somatic variants using MuTect and structural variants using SVDetect. It can handle genome, exome, and targeted (TruSeq Custom Amplicon) samples. It is implemented on top of the Cosmos workflow management system. Components include:

Configuration

PvKey is configured in wga_settings.py, which points to the correct paths for the GATK bundle, the reference genome, and the binaries.

Note: on Orchestra the files are already arranged as expected. The WGA directory is currently available under /groups/cbi/02.Public.data/WGA/ and will be moved to /groups/lpm/WGA.
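As a rough illustration of the kind of check that can catch a misconfigured path before a run starts, the sketch below builds a settings dictionary and reports entries that do not exist on disk. The dictionary keys, the subdirectory names, and the `missing_paths` helper are all hypothetical and are not taken from the real wga_settings.py; only the WGA root path comes from the note above.

```python
import os

# Root of the shared WGA directory mentioned in the note above.
WGA_ROOT = "/groups/cbi/02.Public.data/WGA"

# Hypothetical settings: key names and subdirectories are illustrative only
# and do not reflect the actual contents of wga_settings.py.
SETTINGS = {
    "gatk_bundle": os.path.join(WGA_ROOT, "bundle"),
    "reference_genome": os.path.join(WGA_ROOT, "reference"),
    "tools": os.path.join(WGA_ROOT, "tools"),
}


def missing_paths(settings):
    """Return the sorted names of settings whose path does not exist."""
    return sorted(
        name for name, path in settings.items() if not os.path.exists(path)
    )


if __name__ == "__main__":
    for name in missing_paths(SETTINGS):
        print("missing: %s -> %s" % (name, SETTINGS[name]))
```

Running a check of this kind once before launching the workflow is cheaper than discovering a wrong path midway through a long alignment stage.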

Usage

Inside the PvKey directory, execute:

cli -h

BWA aln + GATK Data Preprocessing + Mutect + SVDetect

.. code-block:: json

    [
        {
            "chunk": "001",
            "library": "LIB-1216301779A",
            "platform": "ILLUMINA",
            "platform_unit": "C0MR3ACXX.001",
            "sample_name": "BC18-06-2013_LyT_S5_L001",
            "rgid": "BC18-06-2013",
            "pair": 0,
            "path": "/path/to/fastq",
            "sample_type": "tumor"
        }
    ]

Here pair is 0 or 1 and sample_type is either tumor or normal. The list continues with further objects of the same form.
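Because a single mistyped field can derail a long run, it may help to check the input list before handing it to the pipeline. The following is a minimal sketch, not part of PvKey itself: `check_entries` and `EXAMPLE_ENTRY` are hypothetical names, the sample-type key is assumed to be spelled `sample_type`, and treating every key in the example as required is also an assumption.

```python
import json

# Keys shown in the example above; assumed (not confirmed) to all be required.
REQUIRED_KEYS = {
    "chunk", "library", "platform", "platform_unit",
    "sample_name", "rgid", "pair", "path", "sample_type",
}

EXAMPLE_ENTRY = {
    "chunk": "001",
    "library": "LIB-1216301779A",
    "platform": "ILLUMINA",
    "platform_unit": "C0MR3ACXX.001",
    "sample_name": "BC18-06-2013_LyT_S5_L001",
    "rgid": "BC18-06-2013",
    "pair": 0,
    "path": "/path/to/fastq",
    "sample_type": "tumor",
}


def check_entries(entries):
    """Return a list of problem descriptions; an empty list means no issues."""
    problems = []
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(
                "entry %d: missing %s" % (i, ", ".join(sorted(missing)))
            )
        if "pair" in entry and entry["pair"] not in (0, 1):
            problems.append("entry %d: pair must be 0 or 1" % i)
        if "sample_type" in entry and entry["sample_type"] not in ("tumor", "normal"):
            problems.append("entry %d: sample_type must be tumor or normal" % i)
    return problems


if __name__ == "__main__":
    print(check_entries([EXAMPLE_ENTRY]))
    print(json.dumps([EXAMPLE_ENTRY], indent=4))
```

The `json.dumps` call at the end shows one way to write the checked list back out as a file the pipeline can read.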

Note: if you are working with targeted resequencing data generated with the TruSeq Custom Amplicon assay, add -target True. Duplicate marking is then skipped, because in amplicon data all the reads are duplicates.

Download data from an S3 bucket

Note: this requires the boto plugin.
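For orientation, fetching every object under a key prefix with boto (version 2) might look like the sketch below. This is not PvKey's own download code: `download_prefix` is a hypothetical helper, and it relies only on the `list(prefix=...)` method of a boto 2 bucket and the `name` attribute and `get_contents_to_filename` method of its keys.

```python
import os


def download_prefix(bucket, prefix, dest_dir):
    """Download every key under `prefix` into `dest_dir`.

    `bucket` is expected to behave like a boto 2 Bucket. Returns the list
    of local file paths that were written.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    local_paths = []
    for key in bucket.list(prefix=prefix):
        if key.name.endswith("/"):
            continue  # skip "folder" placeholder keys
        local_path = os.path.join(dest_dir, os.path.basename(key.name))
        key.get_contents_to_filename(local_path)
        local_paths.append(local_path)
    return local_paths
```

With boto 2 installed and credentials configured, a bucket object can be obtained with `boto.connect_s3().get_bucket(name)` and passed in as `bucket`. Note that the helper flattens the key hierarchy, so two keys with the same base name would overwrite each other.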

Download from BaseSpace

This Python script interacts with BaseSpace, Illumina's repository of NGS data, to download all the sequenced samples within a project. To make it work you have to import BaseSpacePy, available at https://github.com/basespace/basespace-python-sdk.git

BaseSpacePy is a Python-based SDK to be used in the development of apps and scripts for working with Illumina's BaseSpace cloud-computing solution for next-gen sequencing data analysis. The primary purpose of the SDK is to provide an easy-to-use Python environment enabling developers to authenticate a user, retrieve data, and upload data/results from their own analysis to BaseSpace.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.