bigdatagenomics/PacMin

Name: PacMin

Owner: Big Data Genomics

Description: Assembler for PacBio reads. Apache 2 licensed.

Created: 2014-08-30 00:51:37.0

Updated: 2016-09-07 21:03:28.0

Pushed: 2015-03-14 04:51:10.0

Homepage: http://www.bdgenomics.org

Size: 253

Language: Scala

README

PacMin

Assembler for PacBio reads.

Methods

We'll overlap the PacBio reads using the MinHash sketch method proposed in:

Berlin, Konstantin, et al. "Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing." bioRxiv (2014): 008003.
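As a rough illustration of the MinHash idea (not PacMin's code and not the cited paper's implementation), the sketch below hashes every k-mer of a read with several seeded hash functions and keeps the minimum per seed. The fraction of matching sketch positions between two reads then estimates the Jaccard similarity of their k-mer sets, which can be used to nominate candidate overlaps. The object name `MinHashSketch`, the parameters `k` and `numHashes`, and the use of seeded MurmurHash3 as the hash family are all assumptions made for this example.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper for illustration only; not part of PacMin.
object MinHashSketch {
  /** All overlapping k-mers of a read; empty if the read is shorter than k. */
  def kmers(read: String, k: Int): Iterator[String] =
    if (read.length < k) Iterator.empty else read.sliding(k)

  /** MinHash sketch: for each seed, the minimum hash over all k-mers of the read. */
  def sketch(read: String, k: Int, numHashes: Int): Array[Int] = {
    val ks = kmers(read, k).toArray
    require(ks.nonEmpty, "read must contain at least one k-mer")
    Array.tabulate(numHashes) { seed =>
      // Assumption: seeded MurmurHash3 stands in for a family of hash functions.
      ks.iterator.map(kmer => MurmurHash3.stringHash(kmer, seed)).min
    }
  }

  /** Fraction of agreeing sketch positions; estimates Jaccard similarity of the k-mer sets. */
  def similarity(a: Array[Int], b: Array[Int]): Double = {
    require(a.length == b.length, "sketches must have the same length")
    a.indices.count(i => a(i) == b(i)).toDouble / a.length
  }
}
```

Pairs of reads whose sketch similarity exceeds some threshold would be passed on to a more careful overlap computation; the threshold and sketch size have to be tuned to the error rate of the reads.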

Once the reads are overlapped, we will assemble the reads into a string graph. String graphs are described in:

Myers, Eugene W. "The fragment assembly string graph." Bioinformatics 21.suppl 2 (2005): ii79-ii85.
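One step in building a string graph from an overlap graph is removing transitive edges: if read A overlaps B, B overlaps C, and A also overlaps C, the edge from A to C carries no extra information. The sketch below shows a naive two-hop version of that reduction on a plain adjacency map. It is a simplification for illustration, not the linear-time algorithm from the cited paper and not PacMin's implementation; it ignores overlap lengths and read orientation and assumes the graph is acyclic. The names `StringGraphSketch` and `reduceTransitive` are hypothetical.

```scala
// Hypothetical helper for illustration only; not part of PacMin.
object StringGraphSketch {
  /** Directed overlap graph: read ID -> IDs of reads it overlaps into. */
  type Graph = Map[String, Set[String]]

  /**
   * Drop every edge u -> w for which some v exists with u -> v and v -> w.
   * Assumption: the graph is acyclic, so a two-hop check never removes
   * an edge that is needed to keep the graph connected.
   */
  def reduceTransitive(g: Graph): Graph =
    g.map { case (u, succs) =>
      // Everything reachable from u in exactly two hops.
      val twoHop = succs.flatMap(v => g.getOrElse(v, Set.empty[String]))
      u -> (succs -- twoHop)
    }
}
```

For three reads where r1 overlaps r2 and r3, and r2 overlaps r3, the reduction keeps r1 -> r2 and r2 -> r3 and drops r1 -> r3.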

We do not assume that reads are “correct”; instead, we will maintain “probabilistic” overlaps between the fragments in the string graph. Once we have obtained these probabilistic overlaps, we can estimate the ploidy of each overlap by normalizing the overlap coverage by length. We can then apply traditional genotyping methods (e.g., the likelihood estimation stages used in SAMtools) to find the consensus sequence at each overlap.
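The README does not spell out how coverage is normalized, so the following is only one plausible reading, written as a sketch under stated assumptions. It divides the total number of read bases supporting an overlap by the overlap length to get a per-base coverage, then divides by an expected single-copy coverage and rounds to the nearest integer. The `haploidCoverage` parameter, the `PloidySketch` object, and the rounding rule are assumptions introduced here, not something taken from PacMin.

```scala
// Hypothetical helper for illustration only; not part of PacMin.
object PloidySketch {
  /** Per-base coverage: bases from supporting reads divided by overlap length. */
  def normalizedCoverage(supportingBases: Long, overlapLength: Int): Double = {
    require(overlapLength > 0, "overlap length must be positive")
    supportingBases.toDouble / overlapLength
  }

  /**
   * Copy-number estimate for an overlap.
   * Assumption: haploidCoverage is the expected per-base coverage of a
   * single copy, supplied by the caller (for example from a genome-wide median).
   */
  def estimatePloidy(supportingBases: Long, overlapLength: Int, haploidCoverage: Double): Int = {
    require(haploidCoverage > 0, "haploid coverage must be positive")
    math.round(normalizedCoverage(supportingBases, overlapLength) / haploidCoverage).toInt
  }
}
```

With 60,000 supporting bases over a 1,000-base overlap and an assumed single-copy coverage of 30, the per-base coverage is 60 and the estimate is 2.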

Getting Started

Building PacMin

PacMin uses Maven to build. To build PacMin, cd into the repository and run “mvn package”.

Running PacMin

PacMin is packaged via appassembler and includes all necessary dependencies.

You might want to add the following to your .bashrc to make running PacMin easier:

alias pacmin=". $PACMIN_HOME/pacmin-cli/target/appassembler/bin/pacmin"

$PACMIN_HOME should be the path to where you have checked PacMin out on your local filesystem. To change any Java options (e.g., memory settings such as “-Xmx4g”, or Java properties), set the $JAVA_OPTS environment variable. Additional details about customizing the appassembler runtime can be found in the Appassembler Maven plugin documentation.

Once this alias is in place, you can run PacMin by simply typing pacmin at the command line.

Getting In Touch

License

PacMin is released under an Apache 2.0 license.

Distribution

Snapshots of PacMin are available from the Sonatype OSS repository:

<groupId>org.bdgenomics.pacmin</groupId>
<artifactId>pacmin-core</artifactId>
<version>0.0.1-SNAPSHOT</version>

Once we've got a release, we will publish to Maven Central.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.