wtsi-hgi/lofi

Name: lofi

Owner: Wellcome Trust Sanger Institute - Human Genetics Informatics

Description: Sequence alignment data downsampling tool

Created: 2018-03-02 10:55:01.0

Updated: 2018-03-05 16:03:02.0

Pushed: 2018-03-05 16:02:48.0

Size: 23

Language: Shell

README

Lo-Fi

Downsample sequence alignment data (SAM, BAM or CRAM files) to meet user-specified constraints.

Requirements

Usage

lofi [--coverage COVERAGE] [--species SPECIES]
     [--read-groups RG_RANGE] [--rg-coverage RG_COVERAGE] [--keep-rg RG]
     [--strategy STRATEGY] [--tolerance TOLERANCE] [--seed SEED]
     [--prefix PREFIX] [--no-recombine]
     INPUT

At least one constraint must be specified.
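The following Python sketch mirrors the usage line above with argparse and enforces the "at least one constraint" rule. It is only an illustration. lofi itself is a shell tool, so this is not its implementation. The names build_parser and parse_lofi_args are hypothetical. Treating --coverage, --rg-coverage, --read-groups and --keep-rg as the constraint options is an assumption based on the section headings that follow.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the lofi usage line (not lofi's own code)."""
    parser = argparse.ArgumentParser(prog="lofi")
    # Coverage constraints (plus --species, which enables the x suffix)
    parser.add_argument("--coverage")
    parser.add_argument("--rg-coverage")
    parser.add_argument("--species")
    # Read group constraints; --keep-rg may be given several times
    parser.add_argument("--read-groups")
    parser.add_argument("--keep-rg", action="append")
    # Downsampling options, with the defaults documented in the README
    parser.add_argument("--strategy", default="DROP_RG")
    parser.add_argument("--tolerance", type=float, default=10)
    parser.add_argument("--seed", type=int, default=0)
    # Output options
    parser.add_argument("--prefix", default="")
    parser.add_argument("--no-recombine", dest="recombine", action="store_false")
    parser.add_argument("input")
    return parser


def parse_lofi_args(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: these four options are the "constraints" the README means.
    constraints = (args.coverage, args.rg_coverage, args.read_groups, args.keep_rg)
    if not any(constraints):
        parser.error("at least one constraint must be specified")  # exits with status 2
    return args
```

For example, parse_lofi_args(["--coverage", "10G", "sample.bam"]) succeeds and takes the documented defaults for --strategy, --tolerance and --seed. Passing only "sample.bam" exits with a usage error.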

Coverage Constraints
--coverage COVERAGE        The overall coverage of the output
--rg-coverage RG_COVERAGE  The minimum coverage in output read groups

COVERAGE and RG_COVERAGE are given as a number of base pairs, optionally suffixed with k, M or G. If the --species option is used, an x suffix may also be used to express coverage relative to the species' genome size (this presumes that your input is whole-genome sequencing data).
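The Python sketch below shows one way to turn a COVERAGE or RG_COVERAGE value into a number of base pairs. Several parts are assumptions: the README does not say whether k, M and G mean powers of 1000 or 1024, or whether fractional values are accepted. The genome_size parameter is a hypothetical stand-in for whatever size lofi associates with --species.

```python
import re
from typing import Optional

# Assumption: k, M and G are decimal multipliers (10^3, 10^6, 10^9).
MULTIPLIERS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}


def parse_coverage(value: str, genome_size: Optional[int] = None) -> int:
    """Convert a COVERAGE or RG_COVERAGE string into a number of base pairs.

    genome_size (hypothetical) is only needed for the x suffix; it stands in
    for the genome size lofi derives from --species.
    """
    # Assumption: an optional fractional part is allowed, e.g. "2.5k" or "0.5x".
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kMGx]?)", value)
    if match is None:
        raise ValueError(f"invalid coverage: {value!r}")
    number, suffix = float(match.group(1)), match.group(2)
    if suffix == "x":
        if genome_size is None:
            raise ValueError("the x suffix needs a species genome size")
        return round(number * genome_size)
    return round(number * MULTIPLIERS[suffix])
```

For instance, with an illustrative round genome size of 3,000,000,000 bp (not a value taken from lofi), parse_coverage("30x", genome_size=3_000_000_000) gives 90,000,000,000 bp.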

Supported SPECIES values:

Read Group Constraints
--read-groups RG_RANGE     Read groups in the output
--keep-rg RG               Do not drop the read group with the given ID;
                           this can be specified multiple times

The RG_RANGE takes the format MIN-, -MAX, MIN-MAX or EXACT.
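The following Python sketch shows one way to interpret the four RG_RANGE forms. It assumes that the range counts read groups in the output and that both bounds are inclusive. It also assumes that an omitted MIN means no lower bound. None of these points is stated in the README, and parse_rg_range and in_range are hypothetical names.

```python
from typing import Optional, Tuple


def parse_rg_range(spec: str) -> Tuple[int, Optional[int]]:
    """Parse RG_RANGE into (minimum, maximum); a maximum of None means unbounded.

    Assumptions: the range counts read groups, bounds are inclusive, and an
    omitted MIN means no lower bound (0).
    """
    if "-" not in spec:  # EXACT
        exact = int(spec)
        return exact, exact
    low, high = spec.split("-", 1)  # MIN-, -MAX or MIN-MAX
    if not low and not high:
        raise ValueError("RG_RANGE needs at least one bound")
    minimum = int(low) if low else 0
    maximum = int(high) if high else None
    if maximum is not None and minimum > maximum:
        raise ValueError(f"empty read group range: {spec!r}")
    return minimum, maximum


def in_range(count: int, bounds: Tuple[int, Optional[int]]) -> bool:
    """True if a number of output read groups satisfies the parsed range."""
    minimum, maximum = bounds
    return count >= minimum and (maximum is None or count <= maximum)
```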

Downsampling Options
--strategy STRATEGY        Downsampling strategy [default: DROP_RG]
--tolerance TOLERANCE      Allowed percentage difference in output
                           coverage from target [default: 10]
--seed SEED                Random seed [default: 0]
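The default strategy, DROP_RG, suggests discarding whole read groups. The Python sketch below is a hypothetical illustration of that idea combined with --tolerance, --keep-rg and --seed. It is not lofi's actual algorithm. The greedy, seeded visiting order and the names drop_read_groups and within_tolerance are assumptions. It also assumes that the tolerance percentage is measured relative to the target, and it ignores --read-groups and --rg-coverage.

```python
import random
from typing import Dict, Iterable, List


def within_tolerance(actual: int, target: int, tolerance: float = 10) -> bool:
    """True if actual is within `tolerance` percent of target.

    Assumption: the percentage is measured relative to the target coverage.
    """
    return abs(actual - target) * 100 <= tolerance * target


def drop_read_groups(rg_bases: Dict[str, int], target: int, keep: Iterable[str] = (),
                     tolerance: float = 10, seed: int = 0) -> List[str]:
    """Hypothetical whole-read-group dropping; returns the IDs that are kept.

    rg_bases maps read group ID -> base pairs. Droppable groups are visited in
    a seeded random order, and a group is dropped only if that keeps the total
    at or above the lower tolerance bound. The result can still miss the
    target, so callers should check within_tolerance afterwards.
    """
    protected = set(keep)  # read groups named with --keep-rg
    total = sum(rg_bases.values())
    lower_bound = target * (1 - tolerance / 100)
    candidates = sorted(rg for rg in rg_bases if rg not in protected)
    random.Random(seed).shuffle(candidates)  # same seed -> same visiting order
    dropped = set()
    for rg in candidates:
        if within_tolerance(total, target, tolerance):
            break
        if total - rg_bases[rg] >= lower_bound:
            dropped.add(rg)
            total -= rg_bases[rg]
    return sorted(rg for rg in rg_bases if rg not in dropped)
```

Because the random generator is seeded, repeated runs with the same inputs and seed keep the same read groups. This matches the intent of a --seed option with a fixed default.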

Supported STRATEGY values:

Output Options
--prefix PREFIX            Output path prefix
--no-recombine             Don't recombine individual read groups

The output file(s) will be named [PREFIX]INPUT[.READ_GROUP].downsampled. The .READ_GROUP part is the read group ID and is present only if --no-recombine is specified. PREFIX, if specified, is either an absolute path or a path relative to the INPUT's location.
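The Python sketch below builds an output path from that naming rule. It assumes that INPUT in the pattern means the input's file name and that a relative PREFIX is resolved against the directory containing the input. The helper output_path is hypothetical.

```python
import os
from typing import Optional


def output_path(input_path: str, prefix: str = "", read_group: Optional[str] = None) -> str:
    """Build [PREFIX]INPUT[.READ_GROUP].downsampled for one output file.

    Assumptions: INPUT stands for the input's file name, and a relative PREFIX
    is resolved against the directory that contains the input.
    """
    directory, name = os.path.split(input_path)
    filename = prefix + name
    if read_group is not None:  # only used with --no-recombine
        filename += "." + read_group
    filename += ".downsampled"
    # os.path.join discards `directory` when filename is absolute, which
    # covers the absolute-PREFIX case.
    return os.path.join(directory, filename)
```

Under these assumptions, /data/sample.bam with PREFIX sub/x_ becomes /data/sub/x_sample.bam.downsampled. With PREFIX /tmp/ and read group rg1, the result is /tmp/sample.bam.rg1.downsampled.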


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.