AAFC-BICoE/snakemake-amplicon-metagenomics

Name: snakemake-amplicon-metagenomics

Owner: Biological Informatics CoE @ Agriculture and Agri-Food Canada

Description: Command-line bioinformatics workflows, created with the Snakemake workflow management tool.

Created: 2016-06-17 17:41:43.0

Updated: 2017-12-05 18:07:11.0

Pushed: 2017-07-05 19:20:18.0

Homepage: null

Size: 36

Language: Python

README

Amplicon Metagenomic Workflow

Synopsis

This workflow describes the steps that take raw FASTQ files, produced by amplicon sequencing of one or more samples, to an OTU table that summarizes the taxonomic assignments for those samples. It runs on the Linux command line using the Snakemake workflow management system.

Workflow
  1. QC of input files
  2. Trim input files
  3. QC of trimmed files
  4. Join forward and reverse sequences
  5. QC of joined sequences
  6. Cluster sequences
  7. Pick representative sequences
  8. Detect and remove chimeric representative sequences/clusters
  9. Taxonomic classification
  10. Create OTU table
Setup
  1. Create a directory with the input data. This should be paired Illumina sequences.
    • This can be a directory of symbolic links to data elsewhere on a file system.
  2. In the config.yaml file:
    • Verify the working directory; this is the location of the pipeline output.
    • Verify the input directory; this can be an absolute path or a path relative to the working directory.
    • Verify the reference FASTA and taxonomy files; these can be absolute paths or paths relative to the working directory.
    • Verify the file extension of the input sequences.
    • Verify that the input_file_forward_postfix parameter corresponds to the naming of your raw files. Change it if necessary.
  3. Optional: change tool parameters/paths in the config.yaml file.
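
Because the workflow expects paired input, it can help to confirm that every forward file has a reverse mate before running it. The following is a minimal sketch and not part of the workflow: the function name, the example directory, and both postfix values are hypothetical, and the reverse postfix in particular is an assumption, since only input_file_forward_postfix is named here.

```python
from pathlib import Path


def find_unpaired(input_dir, forward_postfix, reverse_postfix):
    """Return names of forward files in input_dir that lack a reverse file.

    A missing directory yields an empty list.
    """
    input_dir = Path(input_dir)
    if not input_dir.is_dir():
        return []
    unpaired = []
    for forward in sorted(input_dir.glob("*" + forward_postfix)):
        # The sample name is the file name with the forward postfix stripped.
        sample = forward.name[: -len(forward_postfix)]
        reverse = input_dir / (sample + reverse_postfix)
        # exists() follows symbolic links, so broken links are reported too.
        if not reverse.exists():
            unpaired.append(forward.name)
    return unpaired


if __name__ == "__main__":
    # Hypothetical values; substitute those that match your config.yaml.
    for name in find_unpaired("data/input", "_R1.fastq.gz", "_R2.fastq.gz"):
        print("No reverse file found for", name)
```

Since the check follows symbolic links, it also works when the input directory only contains links to data stored elsewhere.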

For details on the workflow tools, their versions, the arguments used, and the order of execution, see the Snakefile.

Execute

To check that the workflow will run correctly without executing the steps (a dry run):

$ snakemake -np --configfile config.yaml

To execute the workflow:

$ snakemake --configfile config.yaml

Note: If you are not in the same directory as the Snakefile, you will need the extra parameter --snakefile with the path to the Snakefile.
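
The commands above differ only in the -np flags and the optional --snakefile parameter. As an illustration, here is a small sketch that assembles them from Python; the helper name snakemake_command and the example Snakefile path are hypothetical, while the flags are the ones shown above.

```python
def snakemake_command(configfile="config.yaml", snakefile=None, dry_run=True):
    """Build the argument list for the snakemake calls shown above."""
    cmd = ["snakemake"]
    if dry_run:
        # -n: dry run; -p: print the shell commands that would be executed.
        cmd.append("-np")
    cmd += ["--configfile", configfile]
    if snakefile is not None:
        # Only needed when running from outside the Snakefile's directory.
        cmd += ["--snakefile", snakefile]
    return cmd


if __name__ == "__main__":
    # Hypothetical path; replace it with the location of your Snakefile.
    print(" ".join(snakemake_command(snakefile="/path/to/Snakefile")))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the workflow, assuming snakemake is on your PATH.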

Installation

This workflow runs on Linux. To install it, either locally or on a cluster, you will need to have the following requirements installed.

Requirements

Download the latest release of this project:

https://github.com/AAFC-MBB/snakemake-amplicon-metagenomics/releases

OR

Check out this project (requires git):

$ git clone https://github.com/AAFC-MBB/snakemake-workflows.git
Tests

The automated test is located in snakemake-workflows/amplicon_workflow/test/. To run the test, first download the test data to the snakemake-workflows/amplicon_workflow/test/data/ directory (see the README in that directory for instructions). Before you execute the test, note that it runs the Snakefile with the test data and therefore uses the same output directory as a regular snakemake command (the default is snakemake-workflows/amplicon_workflow/data). If you already have input or intermediate workflow data in your data directory and would like to keep it, back it up first.
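
One way to make that backup is to copy the data directory to a timestamped sibling directory before running the test. This is only a sketch: the function name and the backup naming scheme are assumptions, and the path in the example is the default output directory mentioned above.

```python
import shutil
import time
from pathlib import Path


def backup_directory(data_dir):
    """Copy data_dir to a timestamped sibling directory; return the new path.

    Returns None when data_dir does not exist, since there is nothing to keep.
    """
    data_dir = Path(data_dir).resolve()
    if not data_dir.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = data_dir.with_name(data_dir.name + ".backup-" + stamp)
    # symlinks=True copies symbolic links as links instead of their targets.
    shutil.copytree(data_dir, backup, symlinks=True)
    return backup


if __name__ == "__main__":
    result = backup_directory("snakemake-workflows/amplicon_workflow/data")
    print("Backup written to:", result)
```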

Execute the tests:

$ ./test.sh -clean -run
Info

For more information about Snakemake, visit its website: https://bitbucket.org/snakemake/snakemake/wiki/Home

Authors

Oksana Korol

Christine Lowe

Licensing

See License file.

