CD2H gitForager

hammerlab/bai-indexer

Name: bai-indexer

Owner: Hammer Lab

Description: Build an index for your BAM Index (BAI)

Created: 2014-10-24 17:17:36.0

Updated: 2016-10-13 23:00:50.0

Pushed: 2015-04-14 17:35:57.0

Homepage: null

Size: 212

Language: Python

README

bai-indexer

Build an index for your BAM Index (BAI).

Background

BAM is a common file format for storing aligned reads from a gene sequencing machine. These files can get enormous (100+ GB), so it's helpful to have an index to support fast lookup.

Samtools defines a file format for a BAM index and provides a simple command for generating one:

samtools index file.bam file.bam.bai

Unfortunately, these BAM Index (BAI) files can also grow very large, often to 10 MB or more. When using a genome browser like IGV or BioDalliance, loading a large BAI file over a slow network is the unavoidable first step in displaying alignment tracks.

bai-indexer solves this problem by building an index of your BAM Index. This is a small JSON file which maps each reference ID (i.e. chromosome number) to a byte range within the BAI file. By loading this JSON file first, a viewer can then fetch only the small subset of the BAM index that it actually needs.
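To make the idea concrete, here is a minimal sketch of how per-reference byte ranges could be computed from a BAI file. It is not the project's actual implementation. It assumes the BAI layout given in the SAM/BAM specification (a 4-byte magic, a reference count, then for each reference a list of bins followed by a linear index), and the function name `bai_reference_ranges` is hypothetical. It does not attempt to compute `minBlockIndex`.

```python
import io
import struct


def bai_reference_ranges(stream):
    """Return one half-open [start, stop) byte range per reference in a BAI stream."""
    if stream.read(4) != b"BAI\x01":
        raise ValueError("not a BAI file")
    (n_ref,) = struct.unpack("<i", stream.read(4))

    ranges = []
    for _ in range(n_ref):
        start = stream.tell()
        (n_bin,) = struct.unpack("<i", stream.read(4))
        for _ in range(n_bin):
            # Each bin: uint32 bin id, int32 chunk count, then the chunks.
            _bin_id, n_chunk = struct.unpack("<Ii", stream.read(8))
            # Each chunk is two uint64 virtual offsets (16 bytes); skip them.
            stream.seek(16 * n_chunk, io.SEEK_CUR)
        # Linear index: int32 interval count, then one uint64 per interval.
        (n_intv,) = struct.unpack("<i", stream.read(4))
        stream.seek(8 * n_intv, io.SEEK_CUR)
        ranges.append([start, stream.tell()])
    return ranges
```

With a real file this would be called as `bai_reference_ranges(open("path/to/file.bam.bai", "rb"))`. Because the magic and reference count occupy the first 8 bytes, the first range always starts at offset 8.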

Usage

pip install bai-indexer

bai-indexer path/to/file.bam.bai > path/to/file.bam.bai.json

Format

The JSON index looks like this:

{
  "chunks": [
    [8, 716520],
    [716520, 1463832],
    [1463832, 2070072],
    ...
  ],
  "minBlockIndex": 1234
}

The first chunk ([8, 716520]) specifies the byte range in the BAI file which describes the first ref (most likely chr1 for a human genome). This is a half-open [start, stop) interval.

The minBlockIndex field specifies the position of the first block in the BAM file. Everything before this position is headers.
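As an illustration of how a viewer might consume this file, the sketch below turns index entries into HTTP `Range` headers. The helper names `ref_range_header` and `bam_header_range_header` are hypothetical, and the sample data contains only the three chunks shown above. HTTP byte ranges are inclusive at both ends, so the exclusive `stop` of each half-open interval is reduced by one. Fetching bytes `0` up to `minBlockIndex` for the BAM headers is an assumption based on the description above, not documented behaviour.

```python
import json


def ref_range_header(index, ref_id):
    """Range header for the slice of the BAI file describing one reference."""
    start, stop = index["chunks"][ref_id]
    # [start, stop) is half-open; HTTP ranges are inclusive, hence stop - 1.
    return {"Range": "bytes={}-{}".format(start, stop - 1)}


def bam_header_range_header(index):
    """Range header for the BAM bytes before the first block (assumed to be headers)."""
    return {"Range": "bytes=0-{}".format(index["minBlockIndex"] - 1)}


index = json.loads(
    '{"chunks": [[8, 716520], [716520, 1463832], [1463832, 2070072]],'
    ' "minBlockIndex": 1234}'
)
print(ref_range_header(index, 0))      # {'Range': 'bytes=8-716519'}
print(bam_header_range_header(index))  # {'Range': 'bytes=0-1233'}
```

The first header would be sent with a request for the `.bai` file and the second with a request for the `.bam` file. Both rely on the server supporting range requests.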

Development

After setting up a virtualenv, you can get going by running:

pip install -r requirements.txt
nosetests

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.