hurwitzlab/ProphET

Name: ProphET

Owner: Hurwitz Lab

Description: null

Forked from: jaumlrc/ProphET

Created: 2017-10-21 16:25:36.0

Updated: 2017-10-21 16:25:38.0

Pushed: 2017-08-22 14:04:25.0

Homepage: null

Size: 25809

Language: Perl

README

ProphET, Prophage Estimation Tool: a standalone prophage sequence prediction tool with self-updating reference database.

João L. Reis-Cunha1,2, Daniella C. Bartholomeu2, Ashlee M. Earl1, Bruce W. Birren1, Gustavo C. Cerqueira1

Manuscript draft in BioRxiv


1 Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States

2 Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Brazil


Contact

jaumlrc@gmail.com

gustavo@broadinstitute.org


Required libraries and programs:

Broad users don't need to install any of the programs and libraries listed below. If you are a Broadie, please follow the instructions in README_BROAD_USERS.md before installing and running ProphET.


ProphET installation:

To install ProphET, or to update the ProphET bacteriophage database, execute the following command from ProphET's home directory:

INSTALL.pl

This will search for required libraries, set the paths of required programs and download from GenBank (NCBI) all genomes associated with 16 families of bacteriophages (listed in config.dir/Prophages_names_sem_Claviviridae_Guttaviridae-TxID).

Some warnings will be issued during the setup of the ProphET DB. See some examples below:

Warning: bad /anticodon value '(pos:complement(13054..13056),aa:Met,seq:cat)'
Warning: NC_022920: Bad value '(pos:complement(13054..13056),aa:Met,seq:cat)' for tag '/anticodon'

These warnings refer to an unexpected format for the coordinates of tRNA features and do not affect the execution.

If the script fails and reports missing Perl modules/libraries, please follow the instructions in the file README_INSTALLING_PERL_MODULES.md on how to install them.


Testing installation:

From ProphET's home directory execute the following command:

ProphET_standalone.pl --fasta_in test.fasta --gff_in test.gff --outdir test

The execution should take about 5 minutes.

Three putative prophages should be reported, with their coordinates listed in the file test/phages_coords:

FORMAT:
<scaffold>  <#prophage> <genomic.start.coord> <genomic.end.coord>

CONTENT:
NC_005362.1     1       327710  378140
NC_005362.1     2       1292553 1330556
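
The coordinates file is simple enough to read with a few lines of Python. The following is a minimal sketch, not part of ProphET: it assumes the file is whitespace-delimited with exactly the four columns shown above, and that coordinates are 1-based and inclusive (the README does not state this). The names `Prophage` and `parse_phage_coords` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Prophage(NamedTuple):
    scaffold: str
    index: int
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


def parse_phage_coords(lines: Iterable[str]) -> List[Prophage]:
    """Parse lines of the form: scaffold, prophage number, start, end."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        scaffold, index, start, end = fields
        try:
            records.append(Prophage(scaffold, int(index), int(start), int(end)))
        except ValueError:
            continue  # skip lines whose numeric columns are not integers
    return records
```

To use it, open the coordinates file (for the test run, `test/phages_coords`) and pass the file handle to `parse_phage_coords`; each returned record exposes `scaffold`, `index`, `start`, `end` and the derived `length`.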

The nucleotide sequence of each prophage can be found in:

/NC_005362.1.phage_1.fas
/NC_005362.1.phage_2.fas
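
One way to sanity-check these files is to compare each sequence length with the span given in the coordinates file. The sketch below is an illustration under assumptions, not part of ProphET: it assumes the `.fas` files are standard FASTA (a `>` header line followed by sequence lines), and the function name `fasta_lengths` is hypothetical.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record id: sequence length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token of the header.
            header = line[1:].split()
            current = header[0] if header else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence line: add its residues to the current record.
            lengths[current] += len(line)
    return lengths
```

Passing an open file handle for one of the prophage FASTA files returns a dictionary of sequence lengths, which can then be compared with `end - start + 1` from the coordinates file if the coordinates are inclusive.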

The program also renders a simple diagram depicting all coding genes in the bacterial genome, coding genes with significant matches to phage genes and the location of predicted prophages:

/NC_005362.1.svg

Before running ProphET in your favorite bacterial genome


Usage:

ProphET_standalone.pl --fasta_in <file> --gff_in <file> --outdir <string> [--grid] [--gff_trna <file>] [--help]

Options:
--fasta_in - Bacterial genome FASTA file

--gff_in - Bacterial GFF file

--gff_trna - If the tRNAs are reported in a separate GFF file, provide it here (Optional)

--outdir - Output directory

--grid - Use UGER for BLAST jobs; currently works only on the Broad Institute UGER grid system (Optional)

--help - Print this message and some additional information about the FASTA and GFF input formats (Optional)
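
When ProphET has to be run on many genomes, the command line above can be assembled programmatically. The following is a minimal sketch that uses only the options documented in the usage text. The helper names `build_prophet_command` and `run_prophet` are hypothetical, the default script path assumes the script can be found as `ProphET_standalone.pl` (for example when run from ProphET's home directory with that directory on the PATH), and whether the script returns a non-zero exit status on failure is an assumption.

```python
import subprocess
from typing import List, Optional


def build_prophet_command(
    fasta_in: str,
    gff_in: str,
    outdir: str,
    gff_trna: Optional[str] = None,
    grid: bool = False,
    script: str = "ProphET_standalone.pl",  # assumed location of the script
) -> List[str]:
    """Assemble the argument list from the options documented in the usage text."""
    cmd = [script, "--fasta_in", fasta_in, "--gff_in", gff_in, "--outdir", outdir]
    if gff_trna:
        cmd += ["--gff_trna", gff_trna]  # optional separate tRNA GFF
    if grid:
        cmd.append("--grid")  # only meaningful on the Broad Institute UGER grid
    return cmd


def run_prophet(**options) -> None:
    """Run ProphET; raises CalledProcessError if the exit status is non-zero."""
    subprocess.run(build_prophet_command(**options), check=True)
```

For example, `run_prophet(fasta_in="test.fasta", gff_in="test.gff", outdir="test")` would launch the same test run described earlier. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.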

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.