Name: CoreGeneBuilder
Owner: C3BI-pasteur-fr
Description: CoreGeneBuilder can be used to extract a core genome or a persistent genome from a given set of bacterial genomes
Created: 2016-06-30 12:34:08.0
Updated: 2016-10-22 08:52:29.0
Pushed: 2016-11-17 10:23:34.0
Size: 7823
Language: Shell
CoreGeneBuilder can be used to extract a core genome (or a persistent genome) from a given set of bacterial genomes.
Please cite CoreGeneBuilder using the following DOI:
For help please contact:
Julien Guglielmini
julien.guglielmini@pasteur.fr
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
25-28 rue du docteur Roux
75015 Paris, France
Sylvain Brisse
sylvain.brisse@pasteur.fr
Institut Pasteur
Microbial Evolutionary Genomics
CNRS, UMR 3525
25-28 rue du docteur Roux
75015 Paris, France
An appliance 'CoreGeneBuilder' is available on the IFB cloud: https://www.france-bioinformatique.fr/fr/cloud.
A docker container 'CoreGeneBuilder' is also hosted on the docker registry BioShadock: https://docker-ui.genouest.org.
See the INSTALL.md file for instructions.
Use this command to display the usage:
$ coregenebuilder
Or, if you run CoreGeneBuilder using docker, prefix this command with:
$ docker run -v <your_local_directory_where_are_your_data>:/root/mydisk coregenebuilder
Example:
DOCKER_DATA=/home/dupont/cganalysis
docker run -v $DOCKER_DATA:/root/mydisk coregenebuilder
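The docker prefix shown above can be built with a small helper that first checks the data directory. This is only a sketch: `cgb_docker_prefix` is a hypothetical name, not part of CoreGeneBuilder, and it prints the prefix rather than running docker. It insists on an absolute path because docker interprets a bare name passed to `-v` as a named volume rather than as a host directory.

```shell
# Hypothetical helper (not part of CoreGeneBuilder): print the docker prefix
# for a given data directory, after checking that the directory is usable.
# Usage: cgb_docker_prefix [<data_dir>]   (defaults to $DOCKER_DATA)
cgb_docker_prefix() {
    data_dir=${1:-${DOCKER_DATA:-}}

    # An absolute path is the unambiguous way to designate a host directory.
    case "$data_dir" in
        /*) ;;
        *)  echo "data directory must be an absolute path: '$data_dir'" >&2
            return 1 ;;
    esac

    if [ ! -d "$data_dir" ]; then
        echo "no such directory: $data_dir" >&2
        return 1
    fi

    # Same prefix as in the example above, with the checked directory.
    printf 'docker run -v %s:/root/mydisk coregenebuilder\n' "$data_dir"
}
```

For instance, `cgb_docker_prefix /home/dupont/cganalysis` prints the prefix from the example, provided that directory exists.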
One dataset is provided (inputs only).
It can be found in the directory data/klpn5refannot.
More precisely:
# installation on the IFB cloud
cp -pr /usr/local/share/coregenebuilder/data/klpn5refannot /root/mydisk/
# the quick start data are now here: /root/mydisk/klpn5refannot
# installation from the docker repository
# we suppose that your data will be in the following local directory:
DOCKER_DATA=/home/dupont/cganalysis
# then download the data:
cd $DOCKER_DATA
wget https://github.com/C3BI-pasteur-fr/CoreGeneBuilder/archive/v1.0.tar.gz
tar -zxf v1.0.tar.gz && mv ./CoreGeneBuilder-1.0/data/klpn5refannot . && rm -r ./CoreGeneBuilder-1.0 ./v1.0.tar.gz
# the quick start data are now here: $DOCKER_DATA/klpn5refannot
# installation from the coregenebuilder git repository
CGPIPELINE=<where_coregenebuilder_distribution_is_installed_on_your_local_machine>
# for example:
CGPIPELINE=/home/dupont/CoreGeneBuilder
# the quick start data are here: $CGPIPELINE/data/klpn5refannot
We call $DIR the analysis directory. Here DIR=klpn5refannot.
Run this command to test your pipeline installation:
# provided genbank annotation
coregenebuilder -d klpn5refannot -n klpn -g MGH78578_NC.fasta -a MGH78578_NC.gb -e "NC_" -p 95 -t 1 -s 3
To run new analyses from our dataset, you must create this directory/file architecture.
We call $DIR the new analysis directory. Here DIR=cg_analysis_ex.
1. Move to the data directory:
# if you run coregenebuilder on the IFB cloud:
cd /root/mydisk
mkdir cg_analysis_ex
# if you run the docker image:
cd $DOCKER_DATA
mkdir cg_analysis_ex
# if coregenebuilder is installed on a local machine from the git repository:
cd $CGPIPELINE/data
mkdir cg_analysis_ex
2. Create the input directory for the genome fasta files. They must be stored in a directory named assemblies:
mkdir cg_analysis_ex/assemblies
And import the genomes:
cp <PATH_OF_GENOME_FASTA>/* cg_analysis_ex/assemblies/.
# or create a link to the target input directory that contains fasta files
cg_analysis_ex/assemblies
ln -s <PATH_OF_GENOME_FASTA> assemblies
If you provide a reference genome (option '-g'), it must be in the directory assemblies. Here we call it ref.fasta:
assemblies/ref.fasta
3. If you want to provide a genbank annotation, create a directory named ref_gbk_annotation and copy the annotation file into it. Here we call the file ref.gb:
mkdir cg_analysis_ex/ref_gbk_annotation
cp ref.gb cg_analysis_ex/ref_gbk_annotation/.
4. Then launch the pipeline, for example with these parameters:
coregenebuilder -d cg_analysis_ex -n klpn -g ref.fasta -a ref.gb -e NC_ -p 95 -t 6
Note:
-n klpn => the value klpn designates genomes of KLebsiella PNeumoniae
-e NC_ => prefix of the contig ids in the files ref.fasta and ref.gb
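Steps 2 and 3 above can be gathered into one small shell function. This is only a sketch of the layout described here, assuming the genomes are copied rather than linked. `setup_analysis` is a hypothetical helper name, not a command shipped with CoreGeneBuilder.

```shell
# Hypothetical helper (not part of CoreGeneBuilder): create the input layout
# of a new analysis directory.
# Usage: setup_analysis <analysis_dir> <genome_fasta_dir> [<genbank_file>]
setup_analysis() {
    dir=$1
    fasta_src=$2
    gbk=${3:-}

    if [ -z "$dir" ] || [ ! -d "$fasta_src" ]; then
        echo "usage: setup_analysis <analysis_dir> <genome_fasta_dir> [<genbank_file>]" >&2
        return 1
    fi

    # Step 2: genome fasta files go into a directory named 'assemblies'.
    mkdir -p "$dir/assemblies" || return 1
    cp "$fasta_src"/* "$dir/assemblies/" || return 1

    # Step 3 (optional): the reference genbank annotation goes into
    # a directory named 'ref_gbk_annotation'.
    if [ -n "$gbk" ]; then
        mkdir -p "$dir/ref_gbk_annotation" || return 1
        cp "$gbk" "$dir/ref_gbk_annotation/" || return 1
    fi
}
```

From the data directory chosen in step 1, a call such as `setup_analysis cg_analysis_ex <PATH_OF_GENOME_FASTA> ref.gb` would prepare the inputs before launching the pipeline as in step 4.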
Contents of $DIR when all steps of the pipeline have been completed:
assemblies : contains genomes in fasta format (only the extensions .fasta .fas .fa .fna are accepted)
ref_gbk_annotation : contains a reference genbank annotation
logs : contains log files of the 3 modules of the pipeline (DIVERSITY, ANNOTATION, COREGENOME)
progress_file.txt : finished steps and their status (status 'OK' if no error)
diversity : contains input and output files of module DIVERSITY
ecamber_output : contains some output files of module ANNOTATION (ecamber output)
genes : contains nucleic sequences of CDS
proteins : contains amino-acid sequences of CDS
core_genome : contains input and output files of module COREGENOME, contains core genes as nucleic and amino-acid sequences (fasta format)
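Since only the extensions .fasta, .fas, .fa and .fna are accepted for genome files, it can help to check the assemblies directory before launching a run. The function below is a minimal sketch: `check_assemblies` is a hypothetical name, and the case-sensitive matching is an assumption, as the accepted case is not stated here.

```shell
# Hypothetical helper (not part of CoreGeneBuilder): report files in
# <analysis_dir>/assemblies whose extension is not one of the accepted ones.
# Returns 0 if every file is accepted, 1 otherwise.
check_assemblies() {
    dir=$1
    bad=0
    for f in "$dir"/assemblies/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$f" ] || continue
        case "$f" in
            *.fasta|*.fas|*.fa|*.fna) ;;   # accepted (case-sensitive: assumption)
            *)  echo "unsupported extension: $f" >&2
                bad=1 ;;
        esac
    done
    return "$bad"
}
```

A call such as `check_assemblies cg_analysis_ex` lists any offending file on standard error and returns a non-zero status if at least one is found.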
The manual is available in doc/CoreGeneBuilder_manual.md.
We thank Bertrand Néron and Amandine Perrin of Institut Pasteur for their contribution to
the deployment of CoreGeneBuilder on the IFB cloud and as a docker image on the registry BioShadock.
This work was financially supported
by the French Institute of Bioinformatics (Grant ANR-11-INBS-0013)
and by the Pasteur International Bioresources Network (PIBnet) programme.
Elise Larsonneur, Marie Touchon, Damien Mornico, Alexis Criscuolo, Sylvain Brisse, Eduardo P. C. Rocha
CoreGeneBuilder is distributed under the terms of the . For further details see COPYING file.