CD2H gitForager

raphael-group/NAIBR

Name: NAIBR

Owner: raphael-group

Description: Novel Adjacency Identification with Barcoded Reads

Created: 2017-05-09 01:01:51.0

Updated: 2018-01-02 02:59:52.0

Pushed: 2017-12-07 16:58:21.0

Homepage: null

Size: 8093

Language: Python


README

Overview

NAIBR (Novel Adjacency Identification with Barcoded Reads) identifies novel adjacencies created by structural variation events such as deletions, duplications, inversions, and complex rearrangements using linked-read whole-genome sequencing data produced by 10X Genomics. Please refer to the publication for details about the method.

NAIBR takes as input a BAM file produced by 10X Genomics' Long Ranger pipeline and outputs a BEDPE file containing predicted novel adjacencies and a likelihood score for each adjacency.

Installing NAIBR

git clone https://github.com/raphael-group/NAIBR.git

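NAIBR is written in Python 2.7 and requires the following dependencies: pysam, numpy, scipy, subprocess, and matplotlib. Of these, subprocess is part of the Python standard library and does not need to be installed separately.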

Running NAIBR

NAIBR can be run using the following command:

python NAIBR.py <configfile>

A template config file can be found in example/example.config. The following parameters can be set in the config file:

bam_file: Input BAM file < required >
min_mapq: Minimum mapping quality for a read to be included in analysis (default: 40)
outdir: Output directory (default: . )
d: The maximum distance between reads in a linked-read
blacklist: Tab-separated list of regions to be excluded from analysis (default: None)
candidates: List in BEDPE format of novel adjacencies to be scored by NAIBR. This will override automatic detection of candidate novel adjacencies.
threads: Number of threads (default: 1)
min_sv: Minimum size of a structural variant to be detected (default: lmax, the 95th percentile of the paired-end read insert size distribution)
k: Minimum number of barcode overlaps supporting a candidate novel adjacency (default: 3)
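
As an illustration, the parameters above could be written to a config file with a short script. This is only a sketch: it assumes one key=value pair per line, which is not stated here and should be checked against example/example.config. The helper name write_naibr_config and the file paths are hypothetical, and the script targets Python 3, independent of NAIBR's own Python 2.7 requirement.

```python
# Sketch: write a NAIBR-style config file.
# ASSUMPTION: one "key=value" pair per line; check example/example.config
# for the actual syntax. write_naibr_config is a hypothetical helper.
import os
import tempfile


def write_naibr_config(path, bam_file, **params):
    """Write bam_file plus any optional parameters to `path`."""
    if not bam_file:
        raise ValueError("bam_file is required")
    settings = {"bam_file": bam_file}
    settings.update(params)
    with open(path, "w") as handle:
        for key, value in settings.items():
            handle.write("{}={}\n".format(key, value))
    return path


config_path = os.path.join(tempfile.mkdtemp(), "my.config")
write_naibr_config(
    config_path,
    bam_file="sample.bam",  # hypothetical input path
    min_mapq=40,
    outdir=".",
    threads=4,
    k=3,
)
with open(config_path) as handle:
    config_text = handle.read()
```

Parameters that are left out are not written, so NAIBR would fall back to the defaults listed above.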

Output

NAIBR outputs a BEDPE file containing all scored novel adjacencies. Predicted novel adjacencies with scores greater than the threshold c are labelled 'PASS' and others are labelled 'FAIL'.
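
A minimal sketch of selecting the 'PASS' calls from such a file follows. It assumes a tab-separated file in which the last column holds the 'PASS'/'FAIL' label and in which header lines, if any, start with '#'. Neither detail is stated above, so both should be verified against an actual output file. The function name passing_adjacencies is hypothetical, and the demo records are made up.

```python
# Sketch: keep only adjacencies labelled PASS in a BEDPE file.
# ASSUMPTIONS: tab-separated columns, PASS/FAIL label in the last column,
# optional header/comment lines starting with "#".
import io


def passing_adjacencies(handle):
    """Yield the fields of every record whose last column is 'PASS'."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[-1] == "PASS":
            yield fields


# Made-up records for illustration only; not real NAIBR output.
demo = io.StringIO(
    "chr1\t100\t101\tchr1\t5000\t5001\t12.5\tPASS\n"
    "chr2\t200\t201\tchr2\t9000\t9001\t-3.0\tFAIL\n"
)
kept = list(passing_adjacencies(demo))
```

For a real run, the same function could be given an open file handle, for example one opened on the BEDPE file in the configured output directory.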

Example

Example files are provided in the 'example' directory. Running

python NAIBR.py example/example.config

will produce the file 'example/NAIBR_SVs.bedpe'.
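
The example run can also be scripted, which is convenient when processing several config files. The sketch below assumes it is executed from the NAIBR checkout and that "python" resolves to a Python 2.7 interpreter with the dependencies installed. The helper names naibr_command and run_naibr are hypothetical.

```python
# Sketch: run NAIBR on a config file and check for the expected output.
# ASSUMPTIONS: executed from the NAIBR checkout, and "python" resolves to
# a Python 2.7 interpreter with the dependencies installed.
import os
import subprocess


def naibr_command(config_path, interpreter="python"):
    """Build the NAIBR command line as an argument list."""
    return [interpreter, "NAIBR.py", config_path]


def run_naibr(config_path, expected_output, interpreter="python"):
    """Run NAIBR and report whether the expected output file exists."""
    # check_call raises CalledProcessError if NAIBR exits with an error.
    subprocess.check_call(naibr_command(config_path, interpreter))
    return os.path.isfile(expected_output)


command = naibr_command("example/example.config")
# To actually run the example:
# run_naibr("example/example.config", "example/NAIBR_SVs.bedpe")
```

Passing the command as a list avoids shell quoting problems when a config path contains spaces.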

Citing NAIBR

Elyanow, Rebecca, Hsin-Ta Wu, and Benjamin J. Raphael. “Identifying structural variants using linked-read sequencing data.” Bioinformatics (2017).

@article{elyanow2017identifying,
  title={Identifying structural variants using linked-read sequencing data},
  author={Elyanow, Rebecca and Wu, Hsin-Ta and Raphael, Benjamin J},
  journal={Bioinformatics},
  year={2017}
}

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.