Name: bai-indexer
Owner: Hammer Lab
Description: Build an index for your BAM Index (BAI)
Created: 2014-10-24 17:17:36.0
Updated: 2016-10-13 23:00:50.0
Pushed: 2015-04-14 17:35:57.0
Homepage: null
Size: 212
Language: Python
Build an index for your BAM Index (BAI).
BAM is a common file format for storing aligned reads from a gene sequencing machine. These files can get enormous (100+ GB), so it's helpful to have an index to support fast lookup.
Samtools defines a file format for a BAM index and provides a simple command for generating one:
samtools index file.bam file.bam.bai
Unfortunately, these BAM Index (BAI) files can also grow very large, often to 10 MB or more. When using a genome browser like IGV or BioDalliance, loading a large BAI file over a slow network is the unavoidable first step in displaying alignment tracks.
bai-indexer solves this problem by building an index of your BAM Index. This is a small JSON file which maps each reference ID (i.e., chromosome number) to a byte range within the BAI file. By loading this small JSON file first, a viewer can fetch only the subset of the BAM index that it actually needs.
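As a rough illustration of the idea, the per-reference byte ranges could be computed by walking the BAI file's binary layout as described in the SAM/BAM specification (magic, reference count, then bins and a linear index for each reference). This is a minimal sketch under that assumption, not necessarily how bai-indexer itself is implemented; the function name `bai_reference_ranges` is hypothetical, and the sketch does not compute the `minBlockIndex` field.

```python
import struct


def bai_reference_ranges(bai_bytes):
    """Return a half-open [start, stop) byte range for each reference in a BAI.

    Hypothetical helper: assumes the little-endian BAI layout from the
    SAM/BAM specification.
    """
    if bai_bytes[:4] != b"BAI\x01":
        raise ValueError("not a BAI file")
    (n_ref,) = struct.unpack_from("<i", bai_bytes, 4)
    pos = 8  # magic (4 bytes) + n_ref (4 bytes)
    ranges = []
    for _ in range(n_ref):
        start = pos
        (n_bin,) = struct.unpack_from("<i", bai_bytes, pos)
        pos += 4
        for _ in range(n_bin):
            # Each bin: uint32 bin id, int32 n_chunk, then n_chunk pairs of uint64.
            _bin_id, n_chunk = struct.unpack_from("<Ii", bai_bytes, pos)
            pos += 8 + 16 * n_chunk
        # Linear index: int32 n_intv followed by n_intv uint64 offsets.
        (n_intv,) = struct.unpack_from("<i", bai_bytes, pos)
        pos += 4 + 8 * n_intv
        ranges.append([start, pos])
    return ranges
```

Because the magic string and the reference count together occupy 8 bytes, the range for the first reference always starts at byte 8 under this layout.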
pip install bai-indexer
bai-indexer path/to/file.bam.bai > path/to/file.bam.bai.json
The resulting JSON index looks like this:
{
  "chunks": [
    [8, 716520],
    [716520, 1463832],
    [1463832, 2070072],
    ...
  ],
  "minBlockIndex": 1234
}
The first chunk ([8, 716520]) specifies the byte range in the BAI file which describes the first ref (most likely chr1 for a human genome). This is a half-open [start, stop) interval.
The minBlockIndex field specifies the position of the first block in the BAM file. Everything before this position is headers.
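To make the two fields concrete, here is a minimal sketch of how a viewer might turn them into HTTP Range requests. It assumes that reference IDs are zero-based positions in the chunks list and that the files are served over HTTP with range support; the helper names `http_range` and `ranges_for_ref` are hypothetical. Note that HTTP byte ranges are inclusive at both ends, so the half-open stop value is reduced by one.

```python
import json


def http_range(start, stop):
    """Convert a half-open [start, stop) interval to an inclusive HTTP Range value."""
    return "bytes=%d-%d" % (start, stop - 1)


def ranges_for_ref(index, ref_id):
    """Return Range values for the BAM headers and for one ref's slice of the BAI.

    Assumption: ref_id is a zero-based position in index["chunks"].
    """
    start, stop = index["chunks"][ref_id]
    return {
        # Everything before minBlockIndex in the BAM file is headers.
        "bam_header": http_range(0, index["minBlockIndex"]),
        # Only this slice of the BAI file is needed for the given reference.
        "bai_slice": http_range(start, stop),
    }


index = json.loads(
    '{"chunks": [[8, 716520], [716520, 1463832]], "minBlockIndex": 1234}'
)
print(ranges_for_ref(index, 0))
# {'bam_header': 'bytes=0-1233', 'bai_slice': 'bytes=8-716519'}
```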
After setting up a virtualenv, you can get going by running:
pip install -r requirements.txt
nosetests