Name: ctat-mutations
Owner: National Cancer Informatics Program
Description: null
Created: 2018-05-08 19:23:15.0
Updated: 2018-05-11 17:51:03.0
Pushed: 2018-05-08 19:36:59.0
Homepage: null
Size: 75
Language: Python
GitHub Committers
User | Most Recent Commit | # Commits |
---|
Other Committers
User | Most Recent Commit | # Commits |
---|
Mutation detection in RNA-Seq highlights the GATK Best Practices in RNA-Seq variant calling, several sources of variant annotation, and filtering based on CRAVAT.
This tool, as well as others, is installed at Indiana University on a publically available Galaxy instance free to use for cancer research. Please register and use this service (https://galaxy.ncgas-trinity.indiana.edu/user/login). To run the tool locally continue reading.
Download the resource files found at https://data.broadinstitute.org/Trinity/CTAT/mutation/mutation_resources.tar.gz
Download demo data at https://data.broadinstitute.org/Trinity/CTAT/mutation/demo_data.tar.gz
You will need the following files ( found in the resource bundle ):
Either put the rnaseq_mutation_pipeline in your path or go to its directory.
Use the following command: Note: values in {} will need to be updated with actual files / options.
on rnaseq_mutation_pipeline.py \
--reference {reference_genome.fa} \
--vcf {reference_genome.vcf} \
--left {left_sample.fa} \
--right {right_sample.fa} \
--threads 8 \
--cosmic_vcf {cosmic.vcf} \
--darned {darned.txt} \
--radar {radar.txt} \
--log log.txt \
--tissue_type {tissue} \
--email {your@email.com} \
--cravat_annotation_header { Trinity_CTAT/mutation/headers/cravat_annotation.txt } \
--is_hg19 \
--out_dir rnaseq_mut_out \
--plot \
--index {star_index} \
--bed {reference.bed}
Using the demo data and the included demo resource files and assuming all files are in your current directory the command would be:
on rnaseq_mutation_pipeline.py \
--reference Hg19_11.fa \
--vcf dbsnp_FLI1.vcf \
--left FLI1.left.fq \
--right FLI1.right.fq \
--threads 8 \
--cosmic_vcf CosmicCodingMuts_v72_grch37_updated_chr.vcf.gz \
--darned darned_hg19.txt \
--radar radar_hg19_v2.tx
--log log.txt \
--tissue_type Blood-Lymphocyte \
--email change@email.com \
--cravat_annotation_header cravat_annotation.txt \
--is_hg19 \
--out_dir rnaseq_mut_out \
--plot \
--index Hg19_11 \
--bed hg19.refGene.sort.bed
The following arguments are optional and needed to turn on CRAVAT associated functionality:
The following arguments are optional and needed for RNA editing filtering:
The following arguments are needed for some of the functionality involved in annotation and filtering:
The following software is required before running the RNA-Seq variant calling pipeline. Please look below for the supported versions. PLEASE use the supported versions of the tools, other versions are not supported and may not work:
At the time of writing this documentation the following versions of software are being used successfully. Other versions of the software dependencies may also be viable and are not supported.
Make sure to append to your PATH the path the STAR aligner and SciEDPiper. Also, make sure Picard-Tools and GATK jars are available.
Download the resource files for HG19 at https://data.broadinstitute.org/Trinity/CTAT/mutation/mutation_resources.tar.gz
Note that if an error occurs on a command, the pipeline should not produce files that would have resulted from that command, so no partial files should be created in the pipeline. As well, if the pipeline was stopped midway for some reason, when ran again with the same settings, the pipeline will pick-up where it left off.
There is an option to run the pipeline with a clean command line argument. This is not a default behavior. If used, when an intermediary file is made in the pipeline that is not an input file and not a final result, it is only kept until it is no longer needed for commands and then it is deleted. This keeps the footprint of the run down.
If any of these behaviors are not true, please let me know ttickle@broadinstitute.org
The pipeline recognizes the .gz extension and any commands using this will automatically have the .gz file unzipped and piped in for use making any command compatible with zipped files. Please use them space footprints.
Several jars are used in this pipeline. If you need, you may supply the path of any command. This occurs with the -u/update command .
More specific help with command line arguments can be found with the following commands:
python rnaseq_mutation_pipeline.py –help