FredHutch/sra-pipeline

Name: sra-pipeline

Owner: Fred Hutchinson Cancer Research Center

Description: download sra files from SRA, pipe through fastq_dump and bowtie2 to S3, in a container

Created: 2018-04-05 19:04:16.0

Updated: 2018-05-03 05:22:48.0

Pushed: 2018-05-03 05:22:47.0

Homepage: null

Size: 4130

Language: Python

README

SRA Pipeline

This repository contains code for running an analysis pipeline in AWS Batch.

What the pipeline does

Given a set of SRA accession numbers, AWS Batch will start an array job where each child will process a single accession number: it downloads the corresponding file from SRA and pipes it through fastq_dump and bowtie2, writing the output to S3.
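As a rough illustration of the array-job fan-out described above, the sketch below shows one way a child job could pick its accession number. AWS Batch sets the AWS_BATCH_JOB_ARRAY_INDEX environment variable in each child of an array job; the function name accession_for_child and the idea of passing the accession list in directly are assumptions made for this example, not taken from the repository.

```python
import os


def accession_for_child(accessions, env=None):
    """Return the accession number this array-job child should process.

    `accessions` is an ordered list of SRA accession numbers (hypothetical
    input; the real pipeline may obtain its list differently).
    """
    env = os.environ if env is None else env
    # AWS Batch exposes the 0-based child index in this variable.
    index = int(env.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    if not 0 <= index < len(accessions):
        raise IndexError(f"array index {index} is outside the accession list")
    return accessions[index]
```

Each child resolves a different index, so an array job of size N covers N accession numbers without any coordination between children.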

Prerequisites/Requirements
git clone https://github.com/FredHutch/sra-pipeline.git
cd sra-pipeline

sra_pipeline utility

A script called sra_pipeline is available to simplify common tasks: submitting jobs and listing the accession numbers that are completed or in progress.

Running the utility with --help gives usage information:

sra_pipeline --help
usage: sra_pipeline.py [-h] [-c] [-i] [-s N] [-r N] [-f FILE]

optional arguments:
  -h, --help            show this help message and exit
  -c, --completed       show completed accession numbers
  -i, --in-progress     show accession numbers that are in progress
  -s N, --submit-small N
                        submit N jobs of ascending size
  -r N, --submit-random N
                        submit N randomly chosen jobs
  -f FILE, --submit-file FILE
                        submit accession numbers contained in FILE

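The usage text above is consistent with an argparse parser along the following lines. This is a sketch reconstructed from the help output only, not the repository's actual source; in particular, treating N as an integer is an assumption.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help text shown above."""
    parser = argparse.ArgumentParser(prog="sra_pipeline.py")
    parser.add_argument("-c", "--completed", action="store_true",
                        help="show completed accession numbers")
    parser.add_argument("-i", "--in-progress", action="store_true",
                        help="show accession numbers that are in progress")
    # type=int is assumed; the help text only names the argument N.
    parser.add_argument("-s", "--submit-small", type=int, metavar="N",
                        help="submit N jobs of ascending size")
    parser.add_argument("-r", "--submit-random", type=int, metavar="N",
                        help="submit N randomly chosen jobs")
    parser.add_argument("-f", "--submit-file", metavar="FILE",
                        help="submit accession numbers contained in FILE")
    return parser
```

For example, build_parser().parse_args(["-s", "5"]) yields a namespace whose submit_small attribute is 5 and whose other options keep their defaults.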
Additional monitoring of jobs

You can get more detail about running jobs by using
the Batch Dashboard and/or the AWS command-line client for Batch.
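For scripted monitoring, the same information is available through the AWS SDK. The sketch below assumes a boto3 Batch client is passed in and reads the per-status child counts that describe_jobs reports for an array job; the helper name array_status_summary is hypothetical and is not part of this repository.

```python
def array_status_summary(batch_client, job_id):
    """Return a {status: count} dict for the children of an array job.

    `batch_client` is expected to behave like boto3.client("batch").
    """
    response = batch_client.describe_jobs(jobs=[job_id])
    jobs = response.get("jobs", [])
    if not jobs:
        raise LookupError(f"no such job: {job_id}")
    # Array parent jobs carry a statusSummary under arrayProperties;
    # for a non-array job this falls back to an empty dict.
    return dict(jobs[0].get("arrayProperties", {}).get("statusSummary", {}))
```

With boto3 installed and credentials configured, calling array_status_summary(boto3.client("batch"), job_id) with the ID of a submitted array job returns a dictionary mapping statuses such as RUNNING or SUCCEEDED to the number of children in each.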

See Using AWS Batch at Fred Hutch for more information.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.