cbcrg/bengen

Name: bengen

Owner: Notredame Lab

Description: Docker based Multiple sequence aligners benchmark prototype

Created: 2015-03-25 15:48:49.0

Updated: 2017-10-25 10:12:28.0

Pushed: 2017-11-03 19:18:53.0

Homepage: null

Size: 168981

Language: Web Ontology Language

README

BENGEN

Introduction

BenGen is a fully reproducible, automatic and scalable benchmarking prototype, which provides consistently annotated and community-sharable results.

BenGen is functional for the benchmarking of multiple sequence aligners, yet can be easily adapted for the benchmarking of other bioinformatics methods.

How does it work?

Nextflow is the skeleton of BenGen and defines the benchmarking workflow.

Aligner tools are stored as Docker images and are available through Docker Hub. A unique ID is assigned to each image. This guarantees the container's immutability and the full replicability of the benchmark over time.
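To illustrate why a content-derived ID makes an image effectively immutable, here is a minimal sketch in Python. It is not BenGen code and does not call Docker; the `content_id` helper and the byte strings are hypothetical stand-ins for real image contents. It relies only on the fact that a SHA-256 digest changes when the hashed content changes.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a digest-style ID derived only from the bytes given (hypothetical helper)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the contents of two image versions.
original = content_id(b"aligner image contents v1")
same = content_id(b"aligner image contents v1")
changed = content_id(b"aligner image contents v2")

print(original == same)     # True: identical content gives the identical ID
print(original == changed)  # False: any change in content gives a different ID
```

Referring to an image by such an ID rather than by a mutable name means a later benchmark run uses exactly the same tool version.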

Docker provides a container runtime for local and cloud environments. Singularity performs the same role in the context of HPC clusters.

An RDF database, based on the EDAM ontology vocabulary, contains metadata about each component of the benchmark. This makes it possible to automate the benchmark and to provide a consistent, machine-readable description of the incorporated data, algorithms and their results.

GitHub stores and tracks code changes in a consistent manner. It also provides a friendly and well-known user interface that enables third parties to contribute their own tools with ease.

GETTING STARTED
Dependencies

To run BenGen on your machine, Docker and Nextflow need to be installed.
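As a quick check before setup, the following sketch reports which of the two dependencies are missing from your PATH. It is a convenience example, not part of BenGen; the `missing_tools` helper is hypothetical, and it assumes the executables are named `docker` and `nextflow`.

```python
import shutil


def missing_tools(tools=("docker", "nextflow")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("Docker and Nextflow were found on PATH.")
```

Finding the executables only shows that they are installed; it does not verify that the Docker daemon is running.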

Setup

You first need to clone the Bengen repository:

git clone https://github.com/cbcrg/bengen

Then move into the bengen directory and use make to create all the needed images:

cd bengen && make

Now you are ready to use BenGen!

RUNNING BENGEN LOCALLY (automatic mode)

After completing the steps under Getting started, you can run BenGen locally in its automatic mode with the following command:

nextflow run query.nf

Tip: Add the -resume option to reuse cached results from earlier runs instead of recomputing them. This is useful when you run BenGen multiple times.

nextflow run query.nf -resume

In this mode, the metadata dataset is queried and the datasets, methods and scoring functions are selected and run automatically. The selection is driven by the query.rq SPARQL file, which selects only the eligible combinations that can be run. Finally, the results are stored in the scores.ttl file in RDF format.
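The idea of selecting only eligible combinations can be pictured with a toy example in plain Python. This is not the SPARQL query in query.rq and does not use BenGen's real metadata: all names and the `format`, `input` and `output` fields below are invented for illustration, assuming that a combination is eligible when each step's output format matches the next step's input format.

```python
from itertools import product

# Hypothetical metadata records; the field names are assumptions for this sketch.
datasets = [{"name": "dataset-a", "format": "fasta"}]
methods = [
    {"name": "aligner-x", "input": "fasta", "output": "fasta_aln"},
    {"name": "aligner-y", "input": "phylip", "output": "fasta_aln"},
]
scores = [
    {"name": "score-p", "input": "fasta_aln"},
    {"name": "score-q", "input": "msf"},
]


def eligible(datasets, methods, scores):
    """Return (dataset, method, score) name triples whose formats chain together."""
    return [
        (d["name"], m["name"], s["name"])
        for d, m, s in product(datasets, methods, scores)
        if d["format"] == m["input"] and m["output"] == s["input"]
    ]


print(eligible(datasets, methods, scores))
# [('dataset-a', 'aligner-x', 'score-p')]
```

In BenGen the equivalent filtering is expressed declaratively as a query over the RDF metadata, so adding a new tool with correct annotations is enough for it to be picked up.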

RUNNING BENGEN LOCALLY (manually)

To run BenGen manually, choosing the datasets, scoring functions and methods yourself, use the bengen.nf script:

nextflow run bengen.nf

Tip: Add the -resume option to reuse cached results from earlier runs instead of recomputing them. This is useful when you run BenGen multiple times.

nextflow run bengen.nf -resume

If you wish to test BenGen on a restricted amount of data, to speed things up and quickly get an overview of how it works, you can use the following command:

nextflow run bengen.nf --scores DEMO/scores_demo.txt --methods DEMO/methods_demo.txt --dataset_folder "benchmarking_datasets_demo"
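If you launch runs from a script, the command line can be assembled programmatically. The sketch below is a hypothetical helper, not part of BenGen: it uses only the `--scores`, `--methods` and `--dataset_folder` parameters and the `-resume` option shown in this README, and it only prints the command rather than executing it.

```python
import shlex


def bengen_command(scores=None, methods=None, dataset_folder=None, resume=False):
    """Build the argument list for a manual BenGen run (hypothetical helper)."""
    cmd = ["nextflow", "run", "bengen.nf"]
    if resume:
        cmd.append("-resume")
    options = (
        ("--scores", scores),
        ("--methods", methods),
        ("--dataset_folder", dataset_folder),
    )
    for flag, value in options:
        if value is not None:
            cmd += [flag, value]
    return cmd


demo = bengen_command(
    scores="DEMO/scores_demo.txt",
    methods="DEMO/methods_demo.txt",
    dataset_folder="benchmarking_datasets_demo",
)
print(shlex.join(demo))
```

The returned list can be passed to `subprocess.run(demo, check=True)` from inside the bengen directory once Docker and Nextflow are installed.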

The overall benchmark is driven by a configuration file in which the different components are defined.

Example of configuration file content:

docker.enabled = true

params.dataset = "balibase-v3.01"
params.renderer = "csv"
params.out = "output.${params.renderer}"

Important: Inside the bengen directory you can find the methods.txt and scores.txt files. They define which aligners and which scoring functions to use. You can modify them by adding or removing lines with the names of the aligners or scores you want to run (e.g. bengen/NameOfAlignerOrScore).

Example of methods.txt:

bengen/mafft
bengen/tcoffee
bengen/clustalo

Example of scores.txt:

bengen/qscore
bengen/baliscore

Note: To see which aligners and scores are already integrated in the project, look in the boxes and boxes_score directories respectively. Both are located in the bengen directory.
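Editing these list files can also be scripted. The following sketch assumes only what is described above: one entry per line, in the form bengen/NameOfAlignerOrScore. The helper functions are hypothetical and work on text in memory, so nothing in your checkout is modified.

```python
def parse_list(text):
    """Return the non-empty lines of a methods.txt or scores.txt style file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def tool_name(entry):
    """Return the part after the last '/', e.g. 'mafft' for 'bengen/mafft'."""
    return entry.rsplit("/", 1)[-1]


def add_entry(entries, entry):
    """Return a new list with `entry` appended unless it is already present."""
    return list(entries) if entry in entries else list(entries) + [entry]


def remove_entry(entries, entry):
    """Return a new list without `entry`."""
    return [e for e in entries if e != entry]


methods = parse_list("bengen/mafft\nbengen/tcoffee\n\nbengen/clustalo\n")
methods = remove_entry(methods, "bengen/tcoffee")
print([tool_name(m) for m in methods])  # ['mafft', 'clustalo']
print("\n".join(methods))               # text ready to write back to methods.txt
```

Before adding an entry, check that a matching aligner or score exists in the boxes or boxes_score directory.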

MODIFY BENGEN
Add a Multiple Sequence Aligner

You can easily integrate a new MSA tool into BenGen by using a script that does the work for you automatically.

In the bengen directory that you cloned you can find the add.sh script.

ARGUMENTS:


Example:

bash add.sh --n MSA-NAME -m /complete/path/to/your/metadatafile -t /complete/path/to/your/templatefile 

You can find more information on how to properly create the metadata and template files in the documentation.

CONTRIBUTE TO THE PROJECT

If you wish to contribute to the project, you can integrate your new MSA tool into the public project.

You need to follow these steps:

  1. Clone the repository and modify it by adding your new MSA
  2. Open a pull request to merge your changes into the project
  3. Upload the Docker images to Docker Hub

Afterwards, the maintainer of the project receives a notification and accepts the pull request if it is relevant to the project. The maintainer then triggers the computation, and the new results are shown on a public HTML page.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.