Name: hicmaptools
Owner: Notredame Lab
Description: null
Created: 2015-04-16 08:54:51.0
Updated: 2017-01-18 14:13:54.0
Pushed: 2017-12-23 12:29:55.0
Homepage: null
Size: 3375
Language: C++
hicmaptools is a collection of tools for downstream Hi-C contact map analysis.
Compiling hicmaptools requires the following tools to be installed on your system: make, gcc-c++ and R.
Clone the git repository on your computer with the following command:
git clone git@github.com:cbcrg/hicmaptools.git hicmaptools
Make sure you have installed the required dependencies listed above.
When done, move into the project root folder, named hicmaptools, and enter the following commands:
$ cd src
$ make
The binary will be automatically copied to the path hicmaptools/bin.
$ make install
The binary will be automatically copied to the path specified by the environment variable $USER_BIN (check that it exists before running the make command).
hicmaptools -in_map in.binmap -in_bin in.bins SELECT_ONE_QUERY_MODE query.bed -output out_file.tsv
options:
  -in_map   text .n_contact or binary .bimap generated by the genBiMap command
  -in_bin   the bin file for the contact map, .bins
query modes:
  -bat      a loci bat: chr start end
  -output   average neighboring contact of the bat
  -couple   pair of sites: chr1 start1 end1 chr2 start2 end2
  -output   contacts between all pairs
  -local    an interval: chr start end
  -output   all contacts inside the interval
  -loop     loci gene: chr start end
  -output   contact between two ends, i.e. 5' and 3' genes
  -TAD      loci interval: chr start end
  -output   sum/average contact of the TAD
  -sites    interesting sites: chr start end
  -output   contact between those sites
  -submap   genome region to extract: chr start end
  -output   sub contact map, e.g. 3R:10~15MB
other parameters:
  -ner_bin  check neighbouring bins for bat mode, default=10
  -random   assign random size, default=500
For instance:
hicmaptools -in_map nm_none_1000_reduced.bimap -in_bin nm_none_1000.bins -query_interval data/10000_40000_top5.epi_domains -output 10000_40000_top5-contact.tsv
The bin file (.bins) defines the chromosome, start position and end position of each bin. The format is as follows:
cbin chr from.coord to.coord
1 2L 6000 7000
2 2L 7000 8000
3 2L 8000 9000
4 2L 9000 10000
5 2L 12000 13000
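As an illustration of how the bin file can be consumed, here is a minimal Python sketch that reads the four columns shown above and looks up the bins overlapping a query region. The names `Bin`, `parse_bins` and `overlapping_bins` are hypothetical helpers, not part of hicmaptools, and the half-open interval convention used for the overlap test is an assumption, since the README does not state how bin boundaries are treated.

```python
from typing import Iterable, List, NamedTuple


class Bin(NamedTuple):
    cbin: int    # bin index (column "cbin")
    chrom: str   # chromosome name (column "chr")
    start: int   # column "from.coord"
    end: int     # column "to.coord"


def parse_bins(lines: Iterable[str]) -> List[Bin]:
    """Parse whitespace-separated bin records, skipping the header and blanks."""
    bins = []
    for line in lines:
        fields = line.split()
        # The header line starts with "cbin", which is not an integer.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        bins.append(Bin(int(fields[0]), fields[1], int(fields[2]), int(fields[3])))
    return bins


def overlapping_bins(bins: List[Bin], chrom: str, start: int, end: int) -> List[int]:
    """Return indices of bins overlapping [start, end) on chrom.

    Assumption: intervals are treated as half-open; hicmaptools may differ.
    """
    return [b.cbin for b in bins
            if b.chrom == chrom and b.start < end and start < b.end]


example = """cbin chr from.coord to.coord
1 2L 6000 7000
2 2L 7000 8000
3 2L 8000 9000
4 2L 9000 10000
5 2L 12000 13000
""".splitlines()

bins = parse_bins(example)
print(overlapping_bins(bins, "2L", 7500, 9500))  # prints [2, 3, 4]
```

A linear scan is enough for a sketch; for genome-wide bin files, a per-chromosome sorted list with binary search would likely be a better fit.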
The contact map is indexed by bins. The format is as follows:
expected_count : the expected contact between the two chromosome regions (bins) according to the model
observed_count : the observed contact between the two chromosome regions (bins) in the Hi-C data
cbin1 cbin2 expected_count observed_count
1 0.077080 50
2 0.389912 314
3 0.493750 163
4 0.560505 169
5 0.368884 79
Query files are in BED format; the first three required columns (chr, start, end) are enough.
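To make the "first three columns" point concrete, the following Python sketch reads a BED-style query file and keeps only chr, start and end, ignoring any extra columns. The function name `read_bed_intervals` and the sample records are hypothetical; skipping `#`, `track` and `browser` lines follows common BED practice and is not something the README specifies.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]


def read_bed_intervals(lines: Iterable[str]) -> List[Interval]:
    """Extract (chr, start, end) from BED lines, ignoring optional columns."""
    intervals = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and BED header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got: {line!r}")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


# Hypothetical query file content: one record with extra columns, one without.
bed_text = "# example query\n3R\t100000\t200000\tname\t0\t+\n2L\t6000\t9000\n"
print(read_bed_intervals(bed_text.splitlines()))
# prints [('3R', 100000, 200000), ('2L', 6000, 9000)]
```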
Executing a hicmaptools command generates two output files:
Illustration for different query options
Suppose you have the files below:
To run a query such as -bat, use the command:
hicmaptools -in_map nm_none_30000.n_contact -in_bin 30000.cbins -bat BATtest.txt -output temp.txt
temp: the output name you assign
You will get two output files:
When you open temp.txt, you may see:
x chrom start end ... rank_obs rank_exp rank_nor
3R 100000 200000 ... 0.880 0.990 0.760
You may wonder whether the rank information is convincing; you can use the tool we provide to check it.
If the random data are normally distributed, we can assume the rank information is convincing.
Our tool therefore tests for a normal distribution, with the following command:
Rscript tools/normality_test.R temp_random.txt outputname
You will get the test output message and a PDF file containing three plots.
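For readers who want a feel for what a normality check involves, here is a standard-library Python sketch of the Jarque-Bera test. This is a swapped-in technique chosen for illustration: the README does not say which test tools/normality_test.R applies, and the layout of temp_random.txt is not described, so the hypothetical function `jarque_bera` simply takes a list of numbers.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(values: Sequence[float]) -> Tuple[float, float]:
    """Return (JB statistic, asymptotic p-value) for a sample.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB is asymptotically chi-square with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# A strongly skewed, made-up sample: fifty zeros and one large value.
jb, p_value = jarque_bera([0.0] * 50 + [100.0])
print(p_value < 0.05)  # prints True: normality is rejected
```

The p-value relies on an asymptotic approximation, so it is only a rough guide for small samples; the R script shipped with hicmaptools remains the reference check.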
Illustration for PDF file