hackseq/Indicator_contig_predictor

Name: Indicator_contig_predictor

Owner: hackseq

Description: A two-way classifier to characterize metagenomes based on short- and long-read technologies

Created: 2016-08-31 22:58:49.0

Updated: 2017-08-04 12:20:49.0

Pushed: 2016-10-17 23:32:12.0

Homepage:

Size: 56

Language: Python

README

De novo metagenomic marker pipeline

Pipeline
  1. Human Infant Microbiome Dataset (“Babybiome”)
     - Kudos to Molly K. Gibson for excellent data sharing (Gibson MK, Wang B, Ahmadi S, et al. Developmental dynamics of the preterm infant gut microbiota and antibiotic resistome. Nature Microbiology. 2016;1:16024.)
  2. /src/query.py
     - Based on magicBLAST, a new BLAST-based RNA-seq read mapper
  3. /src/coverager.py & /scripts/test_coverager.sh
     - Generation of BAMs by magicBLAST mapping of short reads to long reads (streamed directly from SRA)
     - Building a histogram of read coverage
     - Thresholding for long reads with uniformly deep and broad short-read coverage (indicator contigs)
     - Using a chi-squared test to check for uniformity
     - Generating the probability of each long read occurring in a short-read set
  4. Classifier generation
     - Separation by physiological features
       - Male vs. female
       - Delivery mode
     - Probability of gene co-occurrence?
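The coverage steps need each short-read alignment turned into an interval on its long read. Below is a minimal sketch, assuming the magicBLAST alignments are available as SAM text (for example from `samtools view` on the BAMs). The function name `sam_intervals` is hypothetical and is not taken from /src/coverager.py. The reference span is the sum of the CIGAR operations that consume reference bases (M, D, N, =, X), following the SAM format specification.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def sam_intervals(sam_lines):
    """Yield (reference_name, start, end) for each mapped alignment.

    Hypothetical helper. Coordinates are 0-based and end-exclusive.
    Header lines, unmapped records (flag 0x4) and records without a
    CIGAR string ('*') are skipped.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue
        # length of the aligned region on the long read (the "reference")
        ref_len = sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_CONSUMING)
        start = pos - 1  # SAM POS is 1-based
        yield rname, start, start + ref_len
```

Insertions (I) and clips (S, H) do not advance along the long read, so they do not lengthen the interval. Secondary and supplementary alignments are kept as-is here. Whether to filter them is a pipeline decision this sketch does not make.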
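The histogram and thresholding steps can be sketched with a difference array over short-read intervals on one long read. This is an illustration only. The function names and the threshold values (`min_depth`, `min_breadth`, `min_mean_depth`) are hypothetical placeholders, not the cut-offs used in /src/coverager.py. "Deep" is read here as mean depth and "broad" as the fraction of bases covered at or above a minimum depth.

```python
from collections import Counter

def per_base_depth(length, intervals):
    """Per-base short-read depth along one long read of `length` bases.

    intervals: iterable of (start, end) alignments, 0-based, end-exclusive.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        # clip alignments to the long read's bounds
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth

def coverage_histogram(depth):
    """Map each depth value to the number of bases at that depth."""
    return Counter(depth)

def is_indicator(depth, min_depth=5, min_breadth=0.9, min_mean_depth=10.0):
    """Hypothetical deep-and-broad test; all thresholds are placeholders."""
    if not depth:
        return False
    breadth = sum(d >= min_depth for d in depth) / len(depth)
    mean_depth = sum(depth) / len(depth)
    return breadth >= min_breadth and mean_depth >= min_mean_depth
```

For example, `per_base_depth(10, [(0, 6), (4, 10)])` gives `[1, 1, 1, 1, 2, 2, 1, 1, 1, 1]`, whose histogram is `{1: 8, 2: 2}`. The difference array keeps the cost linear in read length plus alignment count. Building depth one alignment at a time would scale with their product.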
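The chi-squared uniformity check could look like the following stdlib-only sketch. It assumes, as a choice made here rather than something the README states, that the test is run on short-read start positions binned into equal windows along the long read. Adjacent per-base depth values are not independent counts, so start positions fit a goodness-of-fit test better. `chi2_sf` computes the chi-squared tail probability for integer degrees of freedom. It uses the closed forms for 1 and 2 degrees of freedom and the standard recurrence Q(x; ν+2) = Q(x; ν) + (x/2)^(ν/2) e^(-x/2) / Γ(ν/2 + 1). Both function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    # closed forms: df=1 -> erfc(sqrt(x/2)), df=2 -> exp(-x/2)
    nu = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if nu == 2 else math.erfc(math.sqrt(x / 2))
    # climb two degrees of freedom at a time using the recurrence
    while nu < df:
        log_term = (nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1)
        q += math.exp(log_term)
        nu += 2
    return min(q, 1.0)

def start_uniformity_pvalue(starts, length, n_bins=10):
    """Chi-squared goodness-of-fit of read start positions against a
    uniform distribution along a long read of `length` bases."""
    if not 1 < n_bins <= length:
        raise ValueError("need 1 < n_bins <= length")
    observed = [0] * n_bins
    for s in starts:
        if 0 <= s < length:
            observed[s * n_bins // length] += 1
    total = sum(observed)
    if total == 0:
        return 0.0  # no reads at all: treat as not uniformly covered
    # bases per bin (bins differ by at most one base when length % n_bins != 0)
    sizes = [0] * n_bins
    for p in range(length):
        sizes[p * n_bins // length] += 1
    expected = [total * size / length for size in sizes]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf(stat, n_bins - 1)
```

A high p-value means the data give no evidence against uniform coverage. It does not prove uniformity. The significance cut-off and bin count are left to the pipeline. If SciPy is available, `scipy.stats.chisquare` performs the same goodness-of-fit test.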
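The README does not say which model the classifier step uses. Purely as an illustration of a two-way classifier over physiological labels, here is a Bernoulli naive Bayes sketch. It is a swapped-in technique, not the repository's method. Each metagenome is represented as the set of indicator contigs detected in it. The function names, the Laplace pseudocount `alpha`, and the toy delivery-mode labels are all hypothetical.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model (hypothetical helper).

    samples: list of sets, each holding the indicator-contig IDs found in one metagenome.
    labels:  one class label per sample (e.g. delivery mode or sex).
    alpha:   Laplace smoothing pseudocount.
    """
    features = set().union(*samples)
    class_counts = defaultdict(int)
    presence = defaultdict(lambda: defaultdict(int))
    for present, label in zip(samples, labels):
        class_counts[label] += 1
        for f in present:
            presence[label][f] += 1
    n = len(labels)
    model = {}
    for label, count in class_counts.items():
        log_prior = math.log(count / n)
        # smoothed P(contig present | class)
        probs = {f: (presence[label][f] + alpha) / (count + 2 * alpha) for f in features}
        model[label] = (log_prior, probs)
    return model

def predict(model, present):
    """Return the label with the highest log-posterior for a set of contigs."""
    def score(label):
        log_prior, probs = model[label]
        # absent contigs contribute log(1 - p); contigs unseen in training are ignored
        return log_prior + sum(
            math.log(p if f in present else 1.0 - p) for f, p in probs.items()
        )
    return max(model, key=score)
```

Usage on toy data: train on `[{"c1", "c2"}, {"c1"}, {"c3"}, {"c3", "c4"}]` with labels `["vaginal", "vaginal", "C-section", "C-section"]`. Then `predict(model, {"c1"})` returns `"vaginal"`. Naive Bayes is used here only because it is short and handles presence/absence features directly. Any binary classifier could fill this step.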
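The open question of gene co-occurrence could start from plain empirical frequencies across samples. This sketch assumes each sample is summarized as the set of gene IDs detected in it. The function name and the example gene IDs are hypothetical. It reports, for every gene pair, the fraction of samples containing both, P(A and B), and the conditional frequency P(B | A).

```python
from itertools import combinations

def cooccurrence_probs(samples):
    """Empirical P(A and B) and P(B | A) for every pair of genes.

    samples: list of sets of gene IDs detected in each metagenome.
    Returns {(a, b): (p_joint, p_b_given_a)} with a < b.
    """
    n = len(samples)
    genes = sorted(set().union(*samples))
    result = {}
    for a, b in combinations(genes, 2):
        n_a = sum(a in s for s in samples)  # > 0, since a came from some sample
        n_ab = sum(a in s and b in s for s in samples)
        result[(a, b)] = (n_ab / n, n_ab / n_a)
    return result
```

For four samples `[{"g1", "g2"}, {"g1"}, {"g2"}, {"g1", "g2"}]`, the pair `("g1", "g2")` gets `(0.5, 2/3)`. These are raw frequencies with no test of whether they exceed what independent occurrence would predict.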

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.