Name: cna-processing
Owner: raphael-group
Description: Script for taking the output of the GISTIC2.0 module and converting the data for use with MAGI, HotNet2, and CoMEt.
Created: 2016-05-24 18:40:10.0
Updated: 2016-05-26 04:31:07.0
Pushed: 2016-07-11 13:32:07.0
Homepage: null
Size: 5533
Language: Python
Process CNA data into formats usable by MAGI, HotNet, and CoMEt.
Input files from the GISTIC2.0 output:

* table_amp.conf_99.txt
* table_del.conf_99.txt
* focal_data_by_genes.txt
* focal_input.seg.txt

The -ca argument checks automatically for the alternate format:

* amp_genes.conf_99.txt
* del_genes.conf_99.txt
* focal_data_by_genes.txt

No setup is required beyond having the necessary Python libraries and the gene target/location lists.
The following command works as long as the GISTIC2.0 output files use the standard naming convention.
python gistic2processing.py -g <path to gene dictionary> -d <path to data folder/tar file> -tg <path to gene target file>
HotNet2 output: a tab-separated file in which each row is a sample and its associated gene mutations. The first column is the sample name; each subsequent entry is a gene with either (A) for amplified or (D) for deleted appended to the end. The file is named {prefix}_hotnet2.tsv. The prefix is an optional argument; if no prefix is supplied, the default is 'output'.
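To make the row layout above concrete, here is a minimal sketch of a reader for one such row. The function name `parse_hotnet2_row` and the sample and gene names are hypothetical and are not part of this repository; only the tab separation and the (A)/(D) suffixes come from the description above.

```python
def parse_hotnet2_row(line):
    """Split one row into (sample, [(gene, cna_type), ...]).

    Assumes the layout described above: sample name first, then
    tab-separated genes ending in "(A)" or "(D)".
    """
    fields = line.rstrip("\n").split("\t")
    sample, entries = fields[0], fields[1:]
    calls = []
    for entry in entries:
        if not entry:
            continue  # tolerate trailing tabs
        if entry.endswith("(A)"):
            calls.append((entry[:-3], "Amp"))
        elif entry.endswith("(D)"):
            calls.append((entry[:-3], "Del"))
        else:
            raise ValueError("unexpected entry: %r" % entry)
    return sample, calls


# Hypothetical example row.
sample, calls = parse_hotnet2_row("SAMPLE1\tGENE1(A)\tGENE2(D)\n")
```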
CoMEt output: the same format as the HotNet2 file, named {prefix}_all_cna_comet.tsv. An additional file, {prefix}_name_map_comet.tsv, maps a shortened, more human-readable concatenated gene list to the full list of genes found in each peak. Each row of that file holds the shortened name in the first column and the longer name in the second, separated by a tab.
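As a rough sketch, the two-column name map can be loaded into a dictionary keyed by the shortened name. The helper `load_name_map` and the example names are hypothetical. How the genes are delimited inside the longer name is not stated here, so the sketch keeps it as an opaque string.

```python
import csv
import io


def load_name_map(handle):
    """Return {short_name: full_name} from a two-column, tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Hypothetical file contents; real names come from the GISTIC2.0 peaks.
example = io.StringIO("SHORT_A\tLONG_NAME_A\nSHORT_B\tLONG_NAME_B\n")
name_map = load_name_map(example)
```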
MAGI output: a five-column, tab-separated file with a header. Each row consists of, in order, a gene, a sample, the CNA type (Del/Amp), the left position, and the right position.
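The five-column layout above could be read along these lines. This is a sketch under assumptions: the header text in the example is made up (the actual column titles are not given here), the positions are assumed to be integers, and `CnaRecord` and `read_cna_rows` are hypothetical names.

```python
import csv
import io
from collections import namedtuple

# Field order follows the description above; the names are illustrative.
CnaRecord = namedtuple("CnaRecord", "gene sample cna_type left right")


def read_cna_rows(handle):
    """Read rows positionally, skipping the header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip header; its exact wording is not assumed
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene, sample, cna_type, left, right = row
        records.append(CnaRecord(gene, sample, cna_type, int(left), int(right)))
    return records


# Hypothetical contents.
example = io.StringIO(
    "col1\tcol2\tcol3\tcol4\tcol5\n"
    "GENE1\tSAMPLE1\tAmp\t100\t200\n"
    "GENE2\tSAMPLE1\tDel\t500\t900\n"
)
records = read_cna_rows(example)
```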
All options can be set by command line or via a configuration file. The command line will override configuration file settings, and the names and usage for command line and configuration file options are the same.
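One common way to get this precedence is to parse the command line with no defaults and overlay only the values that were actually supplied onto the configuration file settings. The sketch below illustrates that idea with `argparse`; it is not taken from gistic2processing.py, the function `resolve_options` is hypothetical, and the `--` spelling of the long options is an assumption.

```python
import argparse


def resolve_options(config, argv):
    """Merge config-file settings with command-line values.

    Command-line values win; options not given on the command line
    fall back to whatever the config supplied.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-g", "--gene_dictionary")
    parser.add_argument("-d", "--data")
    parser.add_argument("-tg", "--target_genes")
    cli = vars(parser.parse_args(argv))

    merged = dict(config)
    # Unset command-line options are None and must not clobber the config.
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged


# Hypothetical paths.
config = {"gene_dictionary": "genes.json", "data": "from_config"}
options = resolve_options(config, ["-d", "from_cli", "-tg", "targets.tsv"])
```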
The following options must be specified by the user either at the command line or in the config file, and cannot be left to defaults:
Config/long argument | Short argument | Input type | Description
:--- | :--- | :--- | :---
gene_dictionary | -g | Path to file | Path to the gene dictionary, a JSON file with key/value pairs of the form { gene : [[start location, end location, chromosome number]] }. See the example folder in this repository.
data | -d | Path to folder or tar file | Path to either a folder with a flat structure containing all of the GISTIC2.0 data, or a tar file (which does not need a flat internal file structure) containing the GISTIC2.0 data.
target_genes | -tg | Path to file | Path to a file containing the target genes. The file has two tab-separated columns, with the gene name in the first column and Amp/Del/Both (not case sensitive) in the second.
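
For illustration, the two input file shapes described in the table might look like the following. The gene names and coordinates are made up, the chromosome is shown as a number only because the description says "chromosome number", and `load_target_genes` is a hypothetical helper rather than a function from this repository.

```python
import io
import json

# Hypothetical gene dictionary: { gene : [[start, end, chromosome]] }.
GENE_DICT_JSON = '{"GENE1": [[100, 200, 1]], "GENE2": [[500, 900, 7]]}'
gene_dictionary = json.loads(GENE_DICT_JSON)


def load_target_genes(handle):
    """Read gene<TAB>Amp/Del/Both rows, normalizing the case of column two."""
    canonical = {"amp": "Amp", "del": "Del", "both": "Both"}
    targets = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        gene, kind = line.split("\t")[:2]
        key = kind.strip().lower()
        if key not in canonical:
            raise ValueError("expected Amp, Del or Both, got %r" % kind)
        targets[gene] = canonical[key]
    return targets


# Hypothetical target gene file; the second column is not case sensitive.
targets = load_target_genes(io.StringIO("GENE1\tAMP\nGENE2\tboth\n"))
```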