C3BI-pasteur-fr/CoreGeneBuilder

Name: CoreGeneBuilder

Owner: C3BI-pasteur-fr

Description: CoreGeneBuilder can be used to extract a core genome or a persistent genome from a given set of bacterial genomes

Created: 2016-06-30 12:34:08.0

Updated: 2016-10-22 08:52:29.0

Pushed: 2016-11-17 10:23:34.0

Homepage:

Size: 7823

Language: Shell

README

CoreGeneBuilder

CoreGeneBuilder can be used to extract a core genome (or a persistent genome) from a given set of bacterial genomes.

CITATION

Please cite CoreGeneBuilder using the following DOI: DOI

CONTACT

For help please contact:

Julien Guglielmini
julien.guglielmini@pasteur.fr
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
25-28 rue du docteur Roux
75015 Paris, France

Sylvain Brisse
sylvain.brisse@pasteur.fr
Institut Pasteur
Microbial Evolutionary Genomics
CNRS, UMR 3525
25-28 rue du docteur Roux
75015 Paris, France

INSTALLATION

An appliance 'CoreGeneBuilder' is available on the IFB cloud: https://www.france-bioinformatique.fr/fr/cloud.
A docker container 'CoreGeneBuilder' is also hosted on the docker registry BioShadock: https://docker-ui.genouest.org.
See INSTALL.md file for instructions.

USAGE

Use this command to display the usage:
$ coregenebuilder

Or, if you run CoreGeneBuilder using docker, prefix this command with:
$ docker run -v <your_local_directory_where_are_your_data>:/root/mydisk coregenebuilder

Example:

DOCKER_DATA=/home/dupont/cganalysis
docker run -v $DOCKER_DATA:/root/mydisk coregenebuilder
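If you run the docker image often, the prefix shown above can be wrapped in a small shell function so that the remaining commands can be typed unchanged. This is only a sketch: the function name `cgb_docker` and the `CGB_DRY_RUN` variable are hypothetical conveniences and are not part of CoreGeneBuilder; the image name and the mount point `/root/mydisk` are the ones used in the example above.

```shell
# Hypothetical helper (not part of CoreGeneBuilder): wraps the docker prefix.
# DOCKER_DATA must point to the local directory that holds your data.
cgb_docker() {
    if [ -z "${DOCKER_DATA:-}" ]; then
        echo "DOCKER_DATA is not set" >&2
        return 1
    fi
    if [ -n "${CGB_DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it.
        echo "docker run -v ${DOCKER_DATA}:/root/mydisk coregenebuilder $*"
    else
        docker run -v "${DOCKER_DATA}:/root/mydisk" coregenebuilder "$@"
    fi
}
```

With `DOCKER_DATA` set, calling `cgb_docker` without arguments should then behave like the plain `docker run` line above. Setting `CGB_DRY_RUN=1` only prints the command, which can help to check the mount path before launching a run.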

QUICK START

Run CoreGeneBuilder on EXAMPLE data

One dataset is provided (inputs only). It can be found in the directory data/klpn5refannot.
More precisely:

# Installation on the IFB cloud
cp -pr /usr/local/share/coregenebuilder/data/klpn5refannot /root/mydisk/
ls /root/mydisk/klpn5refannot

# Installation from the docker repository
# we assume that your data will be in the following local directory:
DOCKER_DATA=/home/dupont/cganalysis
# then download the data:
cd $DOCKER_DATA
wget https://github.com/C3BI-pasteur-fr/CoreGeneBuilder/archive/v1.0.tar.gz
tar -zxf v1.0.tar.gz && mv ./CoreGeneBuilder-1.0/data/klpn5refannot . && rm -r ./CoreGeneBuilder-1.0 ./v1.0.tar.gz
# the quick start data are now here:
ls $DOCKER_DATA/klpn5refannot

# Installation from the coregenebuilder git repository
CGPIPELINE=<where_coregenebuilder_distribution_is_installed_on_your_local_machine>
# for example:
CGPIPELINE=/home/dupont/CoreGeneBuilder
ls $CGPIPELINE/data/klpn5refannot

We call $DIR the analysis directory. Here DIR=klpn5refannot.
Run this command to test your pipeline installation:

# with the provided genbank annotation
coregenebuilder -d klpn5refannot -n klpn -g MGH78578_NC.fasta -a MGH78578_NC.gb -e "NC_" -p 95 -t 1 -s 3

Run CoreGeneBuilder on YOUR data

To run new analyses on your own dataset, you must create the following directory/file architecture.
We call $DIR the new analysis directory. Here DIR=cg_analysis_ex.

1. Move to the data directory and create the analysis directory:

# if you run coregenebuilder on the IFB cloud:
cd /root/mydisk
mkdir cg_analysis_ex
# if you run the docker image:
cd $DOCKER_DATA
mkdir cg_analysis_ex
# if coregenebuilder is installed on a local machine from the git repository:
cd $CGPIPELINE/data
mkdir cg_analysis_ex

2. Create the input directory for the genome fasta files. They must be stored in a directory named assemblies:

mkdir cg_analysis_ex/assemblies

And import genomes:

cp <PATH_OF_GENOME_FASTA>/* cg_analysis_ex/assemblies/.

# or create a link to the target input directory that contains fasta files
cd cg_analysis_ex/assemblies
ln -s <PATH_OF_GENOME_FASTA> assemblies

If you provide a reference genome (option '-g'), it must also be located in the directory assemblies.
Here we call it ref.fasta:

 assemblies/ref.fasta

3. If you want to provide a genbank annotation, create a directory named ref_gbk_annotation and copy the annotation file into it. Here we call the file ref.gb:

mkdir cg_analysis_ex/ref_gbk_annotation
cp ref.gb cg_analysis_ex/ref_gbk_annotation/.

4. Then launch the pipeline, for example with these parameters:

coregenebuilder -d cg_analysis_ex -n klpn -g ref.fasta -a ref.gb -e NC_ -p 95 -t 6

Note:
-n klpn => value klpn to designate genomes of KLebsiella PNeumoniae
-e NC_ => prefix of contig ids of files ref.fasta and ref.gb
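Before launching the pipeline, it can be useful to verify that the directory layout from steps 1 to 4 is in place. The following is a minimal sketch of such a check, not something shipped with CoreGeneBuilder: the function name `cgb_check_inputs` is hypothetical, variables are global because plain `sh` has no `local`, and the prefix given to `-e` is treated as a plain string without regular-expression characters. It only tests what this README states: genome files in `assemblies` with one of the accepted extensions (.fasta .fas .fa .fna), the reference genome in `assemblies`, the genbank file in `ref_gbk_annotation`, and contig ids of the reference fasta starting with the prefix.

```shell
# Hypothetical pre-flight check (not part of CoreGeneBuilder).
# Usage: cgb_check_inputs <DIR> [<ref_fasta> [<ref_gb> [<contig_prefix>]]]
cgb_check_inputs() {
    dir=$1; ref=${2:-}; gbk=${3:-}; prefix=${4:-}
    status=0

    if [ ! -d "$dir/assemblies" ]; then
        echo "missing directory: $dir/assemblies" >&2
        return 1
    fi

    # Flag files whose extension is not one of .fasta .fas .fa .fna
    for f in "$dir"/assemblies/*; do
        [ -e "$f" ] || continue
        case "$f" in
            *.fasta|*.fas|*.fa|*.fna) ;;
            *) echo "unexpected extension: $f" >&2; status=1 ;;
        esac
    done

    # The reference genome (option -g) is expected in assemblies
    if [ -n "$ref" ] && [ ! -f "$dir/assemblies/$ref" ]; then
        echo "missing reference genome: $dir/assemblies/$ref" >&2
        status=1
    fi

    # The genbank annotation (option -a) is expected in ref_gbk_annotation
    if [ -n "$gbk" ] && [ ! -f "$dir/ref_gbk_annotation/$gbk" ]; then
        echo "missing annotation: $dir/ref_gbk_annotation/$gbk" >&2
        status=1
    fi

    # Contig ids of the reference fasta should start with the -e prefix
    if [ -n "$prefix" ] && [ -f "$dir/assemblies/$ref" ]; then
        if grep '^>' "$dir/assemblies/$ref" | grep -qv "^>$prefix"; then
            echo "some contig ids in $ref do not start with $prefix" >&2
            status=1
        fi
    fi
    return $status
}
```

For the example above, the call would be `cgb_check_inputs cg_analysis_ex ref.fasta ref.gb NC_`, run from the data directory. A non-zero return value means at least one of the checks failed; the messages on standard error say which.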

Architecture of $DIR when all steps of the pipeline have been completed

assemblies        : contains genomes in fasta format (only the extensions .fasta .fas .fa .fna are accepted)
ref_gbk_annotation: contains a reference genbank annotation and
logs              : contains log files of the 3 modules of the pipeline (DIVERSITY, ANNOTATION, COREGENOME)
progress_file.txt : finished steps and their status (status 'OK' if no error)
diversity         : contains input and output files of module DIVERSITY
ecamber_output    : contains some output files of module ANNOTATION (ecamber output)
genes             : contains nucleic sequences of CDS
proteins          : contains amino-acid sequences of CDS
core_genome       : contains input and output files of module COREGENOME, contains core genes as nucleic and amino-acid sequences (fasta format)

For more information, please refer to the manual of CoreGeneBuilder:

doc/CoreGeneBuilder_manual.md
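Since progress_file.txt records the finished steps and their status, a run can be checked from the shell once the pipeline has stopped. The sketch below rests on an assumption that this README does not confirm: that the file holds one step per line and that the status 'OK' appears on the same line. The function name `cgb_check_progress` is hypothetical; refer to the manual for the actual file format.

```shell
# Hypothetical post-run check (not part of CoreGeneBuilder).
# ASSUMPTION: progress_file.txt holds one finished step per line and the
# status 'OK' appears on that same line; the real format may differ.
# Usage: cgb_check_progress <DIR>
cgb_check_progress() {
    progress=$1/progress_file.txt
    if [ ! -f "$progress" ]; then
        echo "no progress file found: $progress" >&2
        return 2
    fi
    # Report non-empty lines that do not mention OK; succeed only if none exist.
    if grep -v 'OK' "$progress" | grep -q '[^[:space:]]'; then
        grep -v 'OK' "$progress" | grep '[^[:space:]]' >&2
        return 1
    fi
    return 0
}
```

For example, `cgb_check_progress cg_analysis_ex` returns 0 when every non-empty line mentions OK, 1 when some line does not (those lines are printed to standard error), and 2 when the file is missing.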

ACKNOWLEDGMENTS

We thank Bertrand Néron and Amandine Perrin of Institut Pasteur for their contribution to the deployment of CoreGeneBuilder on the IFB cloud and as a docker image on the registry BioShadock.
This work was financially supported by the French Institute of Bioinformatics (Grant ANR-11-INBS-0013) and by the Pasteur International Bioresources Network (PIBnet) programme.

AUTHORS

Elise Larsonneur, Marie Touchon, Damien Mornico, Alexis Criscuolo, Sylvain Brisse, Eduardo P. C. Rocha

LICENSING

CoreGeneBuilder is distributed under the terms of the GPL v3 license. For further details, see the COPYING file.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.