Name: SNPbinner
Owner: Sol Genomics Network
Description: SNPbinner is a utility for the generation of genotype binmaps based on SNP data across recombinant inbred lines.
Created: 2016-10-24 21:19:00.0
Updated: 2017-12-06 21:52:35.0
Pushed: 2018-01-02 20:11:20.0
Size: 7574
Language: Python
SNPbinner is a Python 2.7 package and command-line utility for the generation of genotype binmaps based on SNP genotype data across populations of recombinant inbred lines (RILs). Analysis using SNPbinner is performed in three parts: `crosspoints`, `bins`, and `visualize`.
Installation and Usage
SNPbinner requires Python 2.7. Python 3 is currently not supported.
The only non-standard dependency of SNPbinner is Pillow, a fork of PIL.
To install the SNPbinner utility, download or clone the repository and run:

    pip install REPO-PATH

Once installed, any of the commands below can be executed with:

    snpbinner COMMAND [ARGS...]

Alternatively, without installing the package, any of the commands below can be executed with:

    python REPO-PATH/snpbinner COMMAND [ARGS...]

`crosspoints` uses genotyped SNP data to identify likely crossover points. First, the script uses a pair of hidden Markov models (HMMs) to predict genotype regions along the chromosome, both with (3-state) and without (2-state) heterozygous regions. Then it identifies groupings of regions that are too short, based on a minimum distance between crosspoints set by the user. After that, it follows the rules below to find crosspoints. Finally, it writes the crosspoints for each RIL, and the genotyped regions between them, to a CSV file.
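The HMM decoding step can be pictured as a standard Viterbi pass over one RIL's ordered SNP calls. The sketch below is not SNPbinner's code: the `viterbi` function, the genotype codes `a`/`b`, the equal start probabilities, and the constant per-SNP switch probability are assumptions chosen for illustration, and only the 2-state case is shown (a 3-state model would add an `h` state).

```python
import math


def viterbi(calls, states, start_p, trans_p, emit_p):
    """Most likely hidden genotype-state path for one RIL's ordered SNP calls.

    Scores are kept as log-probabilities so long chromosomes do not underflow.
    """
    # best[i][s]: log-probability of the best path that ends in state s at SNP i
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][calls[0]]) for s in states}]
    back = [{}]  # back[i][s]: predecessor of state s on that best path
    for i in range(1, len(calls)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[i - 1][p] + math.log(trans_p[p][s]))
            best[i][s] = (best[i - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][calls[i]]))
            back[i][s] = prev
    # Trace back from the highest-scoring final state
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(calls) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Hypothetical 2-state example: genotypes "a"/"b", constant per-SNP switch probability
STATES = ("a", "b")
START = {"a": 0.5, "b": 0.5}
SWITCH = 0.01
TRANS = {"a": {"a": 1 - SWITCH, "b": SWITCH}, "b": {"a": SWITCH, "b": 1 - SWITCH}}
EMIT = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}}

calls = list("aaabaaabbbbabbb")  # isolated miscalls at indices 3 and 11
print("".join(viterbi(calls, STATES, START, TRANS, EMIT)))  # aaaaaaabbbbbbbb
```

The isolated `b` and `a` calls are absorbed into the surrounding regions because two state switches cost far more than one unlikely emission, which is the smoothing behavior a genotype HMM is meant to provide.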
Running the `crosspoints` command requires an input path, an output path, and a minimum-distance argument (`--min-length` or `--min-ratio`). There are also three optional arguments, which can be found in the tables below.

    snpbinner crosspoints --input PATH --output PATH (--min-length INT | --min-ratio FLOAT) [optional args]

Required arguments:

| Short | Long | Type | Description |
|:-:|:-:|:-:|:--|
| `-i` | `--input` | PATH | Path to a SNP TSV, multiple paths, or a glob (e.g. `myGenome.chr*.tsv`). |
| `-o` | `--output` | PATH | Path for the output CSV when there is a single input, or for a folder when there are multiple. |
| `-m` | `--min-length` | INT | Minimum distance between crosspoints in base pairs. Cannot be used with `--min-ratio`. |
| `-r` | `--min-ratio` | FLOAT | Minimum distance between crosspoints as a fraction of the chromosome length (0.01 would be 1% of the chromosome). Cannot be used with `--min-length`. |

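As a sketch of how the two mutually exclusive minimum-distance options relate, the hypothetical helpers below turn a `--min-ratio` value into base pairs (0.01 of a 2,000,000 bp chromosome is 20,000 bp) and flag runs of regions shorter than that distance. Treating consecutive short regions as one group is an assumption; SNPbinner's own rules for resolving such groups are not modeled.

```python
def resolve_min_length(chrom_len, min_length=None, min_ratio=None):
    """Minimum distance between crosspoints, in base pairs.

    Exactly one of min_length / min_ratio must be given, mirroring the rule
    that --min-length and --min-ratio cannot be used together.
    """
    if (min_length is None) == (min_ratio is None):
        raise ValueError("give exactly one of min_length or min_ratio")
    if min_length is not None:
        return int(min_length)
    # A ratio is a fraction of the chromosome: 0.01 -> 1% of chrom_len
    return int(round(min_ratio * chrom_len))


def short_region_groups(regions, min_len):
    """Group consecutive (start, end, genotype) regions shorter than min_len.

    Returns lists of region indices; the grouping rule is an assumption.
    """
    groups, current = [], []
    for i, (start, end, _genotype) in enumerate(regions):
        if end - start < min_len:
            current.append(i)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


min_len = resolve_min_length(2000000, min_ratio=0.01)  # 20000 bp
regions = [(0, 500000, "a"), (500000, 505000, "h"), (505000, 512000, "a"), (512000, 2000000, "b")]
print(min_len, short_region_groups(regions, min_len))  # 20000 [[1, 2]]
```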
Optional arguments:

| Short | Long | Type | Description |
|:-:|:-:|:-:|:--|
| `-c` | `--cross-count` | FLOAT | Used to calculate the transition probability: the state transition probability is this value divided by the chromosome length. (Default: 4) |
| `-l` | `--chrom-len` | INT | The length of the chromosome/scaffold on which the SNPs lie. If no length is provided (or multiple files are being processed), the last SNP is treated as the last site on the chromosome. |
| `-p` | `--homogeneity` | FLOAT | Used to calculate emission probabilities. For example, with 0.9 a region of genotype b is expected to contain 90% b-genotype calls. (Default: 0.9) |

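The documented semantics of `--cross-count`, `--homogeneity`, and `--chrom-len` can be written out as a small function. This is a minimal sketch, assuming a 2-state model with genotype codes `a`/`b` and reading the transition value as a per-base-pair rate; the `hmm_parameters` name and its return layout are hypothetical. With the defaults (4 and 0.9) and a last SNP at 2,000,000 bp, the transition probability is 4 / 2,000,000 = 2e-06.

```python
def hmm_parameters(snp_positions, cross_count=4.0, homogeneity=0.9, chrom_len=None):
    """Derive 2-state HMM parameters from the documented option semantics.

    - transition: cross_count / chromosome length (--cross-count)
    - emission: a region of genotype g yields a g call with probability
      `homogeneity` and the other call otherwise (--homogeneity)
    """
    if chrom_len is None:
        # No --chrom-len: the last (largest) SNP position stands in for the end
        chrom_len = max(snp_positions)
    transition = float(cross_count) / chrom_len
    emission = {
        "a": {"a": homogeneity, "b": 1 - homogeneity},
        "b": {"a": 1 - homogeneity, "b": homogeneity},
    }
    return chrom_len, transition, emission


length, t, emit = hmm_parameters([1200, 48000, 2000000])
print(length, t, emit["b"]["b"])  # 2000000 2e-06 0.9
```

Raising `--cross-count` makes state switches cheaper (more crosspoints), while raising `--homogeneity` makes each off-genotype call more surprising, so both options trade sensitivity against noise.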
Input should be formatted as a tab-separated value (TSV) file with the following columns:

| Column | Contents |
|---|---|
| 0 | The SNP marker ID. |
| 1 | The position of the marker in base pairs from the start of the chromosome. |
| 2+ | RIL ID (in the header row) and the called genotype of that RIL at each position. |

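A reader for this input layout could look like the following. It is a sketch assuming a single header row whose columns 2+ hold the RIL IDs and single-letter genotype codes such as `a`, `b`, and `h`; the header names for columns 0 and 1 and the `read_snp_tsv` helper are hypothetical.

```python
import csv
import io


def read_snp_tsv(handle):
    """Read a SNP TSV: marker ID, position, then one genotype column per RIL."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    ril_ids = header[2:]  # columns 2+ are RILs
    markers, positions = [], []
    calls = {ril: [] for ril in ril_ids}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        markers.append(row[0])
        positions.append(int(row[1]))
        for ril, call in zip(ril_ids, row[2:]):
            calls[ril].append(call)
    return markers, positions, calls


# Hypothetical file contents; column-0/1 header names and genotype codes are assumed
sample = "marker\tposition\tRIL1\tRIL2\nsnp1\t1000\ta\tb\nsnp2\t5000\ta\th\n"
markers, positions, calls = read_snp_tsv(io.StringIO(sample))
print(positions, calls["RIL2"])  # [1000, 5000] ['b', 'h']
```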
Output is formatted as a comma-separated value (CSV) file with the following columns:

| Column | Contents |
|---|---|
| 0 | The RIL ID. |
| Odd (1, 3, 5, ...) | Location of a crosspoint. (Empty after the chromosome ends.) |
| Even (2, 4, 6, ...) | Genotype between the surrounding crosspoints. (Empty after the chromosome ends.) |

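Because odd columns hold locations and even columns hold the genotype between them, one output row can be unpacked into (start, end, genotype) regions. This hypothetical `crosspoint_regions` helper is a sketch that assumes the first and last location cells mark the ends of the genotyped span and that the padding after the chromosome end consists of empty cells.

```python
import csv
import io


def crosspoint_regions(row):
    """Turn one crosspoints CSV row into (start, end, genotype) regions."""
    ril = row[0]
    cells = list(row[1:])
    while cells and cells[-1] == "":
        cells.pop()  # drop empty cells after the chromosome ends
    regions = []
    # Genotypes sit at odd indices of `cells` (even CSV columns), each between
    # the locations immediately before and after it
    for i in range(1, len(cells) - 1, 2):
        regions.append((int(cells[i - 1]), int(cells[i + 1]), cells[i]))
    return ril, regions


# Hypothetical output rows; RIL2 has a single region followed by empty padding
sample = "RIL1,0,a,150000,b,300000\nRIL2,0,b,300000,,\n"
for row in csv.reader(io.StringIO(sample)):
    print(crosspoint_regions(row))
```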
`bins` takes the crosspoints predicted for each RIL and merges nearby crosspoint locations into a combined map of all crossover points across the RILs at a specified resolution. It then projects each RIL's genotype regions back onto the map and outputs the average genotype of each RIL in each bin of the map. The procedure is as follows. Note that, to make the changes obvious, the illustrations below show a map with very low resolution (a large bin size), so there is significant loss of information; a smaller bin size would produce a more accurate map.
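The core idea of `bins` (merging nearby crosspoints from all RILs into shared bin edges, then averaging each RIL's genotype within every bin) can be sketched as below. This is a simplified illustration, not SNPbinner's algorithm: the greedy edge rule, the length-weighted average, the helper names, and the numeric encoding a = 0, h = 0.5, b = 1 are all assumptions.

```python
# Hypothetical numeric encoding used to average genotypes within a bin
GENOTYPE_VALUE = {"a": 0.0, "h": 0.5, "b": 1.0}


def bin_edges(crosspoints, chrom_len, min_bin_size):
    """Greedily merge crosspoints from all RILs into bins >= min_bin_size wide.

    A position becomes a bin edge only if it is at least min_bin_size past the
    previous edge and at least min_bin_size before the chromosome end.
    """
    edges = [0]
    for pos in sorted(set(crosspoints)):
        if pos - edges[-1] >= min_bin_size and chrom_len - pos >= min_bin_size:
            edges.append(pos)
    edges.append(chrom_len)
    return list(zip(edges[:-1], edges[1:]))


def bin_genotype(regions, bin_start, bin_end):
    """Length-weighted mean genotype of one RIL's regions over a single bin."""
    covered = weighted = 0.0
    for start, end, genotype in regions:
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap > 0 and genotype in GENOTYPE_VALUE:
            covered += overlap
            weighted += overlap * GENOTYPE_VALUE[genotype]
    return weighted / covered if covered else None


# Two hypothetical RILs on a 1 Mb chromosome
ril_regions = {
    "RIL1": [(0, 100000, "a"), (100000, 1000000, "b")],
    "RIL2": [(0, 105000, "a"), (105000, 600000, "b"), (600000, 1000000, "a")],
}
bins = bin_edges([100000, 105000, 600000], 1000000, 50000)
for ril, regions in sorted(ril_regions.items()):
    print(ril, [bin_genotype(regions, s, e) for s, e in bins])
```

Here the crosspoints at 100,000 and 105,000 bp fall within one 50,000 bp bin width, so they collapse into a single edge, and RIL2's bin from 100,000 to 600,000 bp averages to 0.99 (mostly b). This is the information loss the text describes for large bin sizes.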
Running the `bins` command requires an input path, an output path, and a minimum bin size. Optionally, a binmap ID may also be provided.

    snpbinner bins --input PATH --output PATH --min-bin-size INT [--binmap-id ID]

Required arguments:

| Short | Long | Type | Description |
|:-:|:-:|:-:|:--|
| `-i` | `--input` | PATH | Path to a crosspoints CSV, multiple paths, or a glob (e.g. `myGenome.chr*.crosp.csv`). |
| `-o` | `--output` | PATH | Path for the output CSV when there is a single input, or for a folder when there are multiple. |
| `-l` | `--min-bin-size` | INT | Sets the minimum size (in bp) of each bin. |

Optional arguments:

| Short | Long | Type | Description |
|:-:|:-:|:-:|:--|
| `-n` | `--binmap-id` | ID | If a binmap ID is provided, a header row is added and each column is labeled with the given string. |

`bins` uses the output from `crosspoints` as its input. For details, see the `crosspoints` Output Format.
Output is formatted as a comma-separated value (CSV) file with the following rows:

| Row | Contents |
|---|---|
| 0 | (Optional) The binmap ID. |
| 1 | The start of each bin (in base pairs). |
| 2 | The end of each bin (in base pairs). |
| 3 | The center of each bin (in base pairs). |
| 4+ | RIL ID in the first cell, then the genotype of each bin for that RIL. |

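Reading a binmap back in is a matter of following this row layout. In this sketch the hypothetical `read_binmap` helper is told whether `--binmap-id` was used, and it assumes the first cell of the start, end, and center rows is a label rather than a coordinate; the label text and genotype values in the sample are illustrative only.

```python
import csv
import io


def read_binmap(handle, has_binmap_id=False):
    """Read a bins CSV into (start, end, center) tuples and per-RIL genotypes."""
    rows = [row for row in csv.reader(handle) if row]
    if has_binmap_id:
        rows = rows[1:]  # drop the optional binmap-ID header row
    # Assumption: cell 0 of the start/end/center rows is a label, not a value
    starts, ends, centers = [[float(x) for x in row[1:]] for row in rows[:3]]
    bins = list(zip(starts, ends, centers))
    # Remaining rows: RIL ID in the first cell, then one genotype per bin
    genotypes = {row[0]: row[1:] for row in rows[3:]}
    return bins, genotypes


sample = ("map1,map1,map1\n"
          "start,0,100000\n"
          "end,100000,600000\n"
          "center,50000,350000\n"
          "RIL1,a,b\n")
bins, genotypes = read_binmap(io.StringIO(sample), has_binmap_id=True)
print(bins)
```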
`visualize` plots the inputs and outputs of `bins` and `crosspoints`. It can be used to check the results of those commands visually and to help choose the best value for each parameter. It accepts three file types (SNP input TSV, crosspoints CSV, and bins CSV), parses the files, and groups the data by RIL, creating one image per RIL. In each row of the resulting images, regions are colored red, green, or blue for genotype a, heterozygous, or genotype b, respectively. The binmap is drawn in gray, with adjacent bins alternating dark and light. The script accepts any combination and number of files of each type.

    snpbinner visualize --out PATH [--bins PATH]... [--crosspoints PATH]... [--snps PATH]...

Required arguments:

| Short | Long | Type | Description |
|:-:|:-:|:-:|:--|
| `-o` | `--out` | PATH | Folder to which the resulting images should be saved. |

Optional arguments:

| Short | Long | Type | Description |
|:-:|:-:|:-:|:--|
| `-b` | `--bins` | PATH | A `bins` output file to add to the visualization. May be repeated. |
| `-c` | `--crosspoints` | PATH | A `crosspoints` output file to add to the visualization. May be repeated. |
| `-s` | `--snps` | PATH | A SNP file (`crosspoints` input) to add to the visualization. May be repeated. |
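Putting the three commands together, a small driver script can run the whole pipeline using only the flags documented for each command. The file names, thresholds, and the `pipeline_commands` / `run_pipeline` helpers are hypothetical; the example prints the commands rather than running them.

```python
import subprocess


def pipeline_commands(snp_tsv, crosspoints_csv, bins_csv, image_dir,
                      min_length, min_bin_size):
    """Argument lists for one crosspoints -> bins -> visualize run."""
    return [
        ["snpbinner", "crosspoints", "--input", snp_tsv,
         "--output", crosspoints_csv, "--min-length", str(min_length)],
        ["snpbinner", "bins", "--input", crosspoints_csv,
         "--output", bins_csv, "--min-bin-size", str(min_bin_size)],
        ["snpbinner", "visualize", "--out", image_dir, "--snps", snp_tsv,
         "--crosspoints", crosspoints_csv, "--bins", bins_csv],
    ]


def run_pipeline(commands):
    """Run each step in order; check_call raises on a non-zero exit status."""
    for cmd in commands:
        subprocess.check_call(cmd)


# Hypothetical file names and thresholds; print instead of running
for cmd in pipeline_commands("chr1.tsv", "chr1.crosp.csv", "chr1.bins.csv",
                             "images", 200000, 100000):
    print(" ".join(cmd))
```

Passing argument lists (rather than a shell string) to `subprocess.check_call` avoids quoting problems with paths, and the first failing step stops the run before later steps read a missing file.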