BackofenLab/docker-galaxy-graphclust

Name: docker-galaxy-graphclust

Owner: Bioinformatics Lab - Department of Computer Science - University Freiburg

Description: A pipeline for structural clustering of RNA secondary structures

Created: 2016-12-16 12:36:58.0

Updated: 2017-03-15 14:48:06.0

Pushed: 2017-12-11 13:11:41.0

Homepage:

Size: 2376

Language: HTML

GitHub Committers

| User | Most Recent Commit | # Commits |
| :--- | :--- | :---: |
| Björn Grüning | 2017-11-10 13:57:45.0 | 62 |
| Milad Miladi | 2018-01-04 11:32:40.0 | 150 |
| Eteri Sokhoyan | 2017-03-21 09:52:12.0 | 14 |

README

Galaxy-GraphClust

Galaxy-GraphClust is a web-based workflow for structural clustering of RNA secondary structures, developed as an instance of the GraphClust Perl pipeline inside the Galaxy framework. It consists of a set of integrated Galaxy tools and several flavors of clustering workflows built upon these tools.

:whale: Galaxy-GraphClust Docker Image

This Docker image is a flavor of the Galaxy Docker image, customized by integrating the Galaxy-GraphClust tools and workflows.

Installation and Setup:

Requirements:

The only requirement for running this web server locally is Docker. Docker supports the three major desktop operating systems: Linux, Windows and macOS. Please refer to the Docker installation guidelines for details.

Running the Galaxy server
From the command line (Linux/Windows/MacOS):
docker run -i -t -p 8080:80 backofenlab/docker-galaxy-graphclust

For more details about this command line or specific usage, please consult the Galaxy Docker guide.
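
As a hedged illustration of what the command above does, the following Python sketch assembles the same `docker run` arguments and makes the host port configurable. The flags are standard Docker options: `-i` keeps STDIN open, `-t` allocates a pseudo-terminal, and `-p 8080:80` publishes the container's port 80 on host port 8080. The helper name `docker_run_command` and its `host_port` parameter are assumptions introduced here for illustration; they are not part of Galaxy-GraphClust.

```python
import shlex

# Image name taken from the command line shown above.
IMAGE = "backofenlab/docker-galaxy-graphclust"
# Port the web server listens on inside the container (right-hand side of -p).
CONTAINER_PORT = 80


def docker_run_command(host_port=8080):
    """Return the argument list for starting the Galaxy-GraphClust container.

    host_port is a hypothetical convenience parameter: change it if port
    8080 is already in use on your machine.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid TCP port: {host_port}")
    return [
        "docker", "run",
        "-i",  # keep STDIN open
        "-t",  # allocate a pseudo-terminal
        "-p", f"{host_port}:{CONTAINER_PORT}",  # host port : container port
        IMAGE,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it; pass the list to
    # subprocess.run(...) to launch the container from Python.
    print(shlex.join(docker_run_command()))
```

If another service already occupies port 8080, calling `docker_run_command(9090)` would publish Galaxy on port 9090 instead, and the browser address changes accordingly.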

Using the graphical interface (Windows/MacOS):

Please check this step-by-step guide.

Demo instance:

A running demo instance of Galaxy-GraphClust is available at http://bit.ly/GalaxyGraphClust. Please note that this instance runs exactly the same Docker container that we offer here. It has limited computational capacity and is intended for demonstration and testing purposes only. Long-term availability is currently not planned, so we recommend running your own instance by following the instructions above.

Setup support:

If you encounter problems, please use the recommended settings, check the FAQs, or contact us via the Issues section of the repository.

Recommended settings:

Galaxy-GraphClust has been tested on one of these operating systems:

Hardware:

Usage - How to run Galaxy-GraphClust:

Browser access to the server:

Once the Galaxy server is running, its web interface is reachable in a browser at the host IP/URL on the designated port (default 8080), for example http://localhost:8080 when Docker runs on the local machine.
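
Galaxy can take a moment to start inside the container, so it may help to check whether the web interface answers before opening the browser. The sketch below is a minimal check using only the Python standard library. The helper names `galaxy_url` and `is_reachable` are hypothetical, and `localhost` assumes that the container runs on the same machine as the script.

```python
import http.client
import urllib.error
import urllib.request


def galaxy_url(host="localhost", port=8080):
    """Build the base URL of the Galaxy web interface."""
    return f"http://{host}:{port}/"


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP GET on url succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or malformed reply.
        return False


if __name__ == "__main__":
    url = galaxy_url()
    state = "reachable" if is_reachable(url, timeout=3.0) else "not reachable (yet)"
    print(f"{url} is {state}")
```

A `False` result right after `docker run` does not necessarily indicate a problem; the server may simply still be starting up.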

Help
Video tutorial

This video tutorial gives a visual introduction to setting up and running Galaxy-GraphClust.

Interactive tours

Interactive tours are available for Galaxy and Galaxy-GraphClust. To run a tour, go to Help->Interactive Tours on the top panel and click on one of the tours prefixed with GraphClust workflow. The other tours give a more general introduction to the Galaxy interface.

Import or upload a workflow

To import or upload an existing workflow, go to the Workflow menu on the top panel, then click the Upload or import workflow button at the top right of the screen. You can either upload a workflow from your local system or provide the URL of the workflow. You must be logged in to access the Workflow menu. You can download workflows from the following links:

Frequently Asked Questions

GraphClust pipeline overview

The GraphClust pipeline for clustering similar RNA sequences is complex; for details, please check the GraphClust publication. Overall, it consists of three major phases: a) sequence-based pre-clustering, b) encoding of predicted RNA structures as graph features, and c) iterative fast candidate clustering followed by refinement.

GraphClust pipeline overview (Heyne et al. 2012)

The table below maps each step of GraphClust to the corresponding Galaxy-GraphClust tool names:

| Stage | Galaxy Tool Name | Description |
| :---: | :--- | :--- |
| 1 | Preprocessing | Input preprocessing (fragmentation) |
| 2 | fasta_gspan | Generation of structures via RNAshapes and conversion into graphs |
| 3 | NSPDK_sparseVect | Generation of graph features via NSPDK |
| 4 | NSPDK_candidateClusters | Min-hash based clustering of all feature vectors, output top dense candidate clusters |
| 5 | premLocarana, locarana_best_subtree, CMfinder | LocARNA-based clustering of each candidate cluster: all-vs-all pairwise alignments, create multiple alignments along guide tree, select best subtree |
| 6 | Build_covariance_models | Create candidate model |
| 7 | Search_covariance_models | Scan full input sequences with Infernal's cmsearch to find missing cluster members |
| 8, 9 | Report results | Collect final clusters and create example alignments of top cluster members |
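
Stage 4 in the table relies on min-hash based clustering of feature vectors. The sketch below illustrates only the general min-hash idea: each item is a set of features, a signature keeps the smallest hash value per hash function, and the fraction of matching signature positions estimates the Jaccard similarity of two sets. It is not the NSPDK implementation; the feature strings, the choice of BLAKE2b as hash function, and the number of hash functions are assumptions made for illustration.

```python
import hashlib


def feature_hash(feature, seed):
    """Hash a feature string to a 64-bit integer; seed selects the hash function."""
    data = f"{seed}:{feature}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=64):
    """Return the min-hash signature (one minimum per hash function) of a feature set."""
    features = set(features)
    if not features:
        raise ValueError("cannot compute a signature for an empty feature set")
    return tuple(
        min(feature_hash(f, seed) for f in features) for seed in range(num_hashes)
    )


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Toy feature sets (hypothetical names); real features come from stage 3.
    seq_a = {"stem_5", "loop_4", "bulge_1", "stem_7"}
    seq_b = {"stem_5", "loop_4", "bulge_1", "stem_9"}
    estimate = estimated_jaccard(minhash_signature(seq_a), minhash_signature(seq_b))
    print(f"estimated Jaccard similarity: {estimate:.2f}")
```

Because signatures have a fixed length regardless of how many features a sequence has, items with similar signatures can be grouped into candidate clusters without comparing all full feature vectors pairwise.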

Input:

The input to the workflow is a set of putative RNA sequences in FASTA format. The sample_data directory contains examples of the input format. The sample data comes from a benchmark set based on Rfam 12.0, and its sequences are optionally labeled with their reference Rfam family.
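
Before uploading a file, it can be useful to confirm that it parses as FASTA and to see how many sequences it contains. The following is a minimal sketch of a FASTA reader in plain Python. The function name `parse_fasta` and the example headers are hypothetical, and the sketch makes no assumption about how the Rfam labels in the sample data are formatted; headers are returned as plain strings.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Sequence lines belonging to one record are concatenated. Raises
    ValueError if sequence data appears before the first '>' header line.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical two-record example; replace with the contents of your file.
    example = ">seq1\nGCAUCCGA\nUUAGC\n>seq2\nACGUACGU\n"
    for name, sequence in parse_fasta(example):
        print(name, len(sequence))
```

To check a real file, one would read it with `open(path).read()` and pass the result to `parse_fasta`.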

Configuring the workflows:

Please follow the interactive tour named GraphClust workflow step by step, available under Help->Interactive Tours. Check the FAQs for an explanation of the most important parameters.

Output:

The output contains the predicted clusters, where similar putative input RNA sequences form a cluster. Additionally, the overall status of the clusters and the matching of cluster elements are reported for each cluster.

Please check the interactive tours and GraphClust README for more information about the reported info and files.

Contributors

Support & Bug Reports

You can file a GitHub issue or ask us on the Galaxy development list.

Publications

[M. Miladi, E. Sokhoyan, T. Houwaart, F. Costa, R. Backofen and B. Gruening, Galaxy-GraphClust: scalable and accessible clustering of ncRNAs based on secondary structures, (submitted)]

[S. Heyne, F. Costa, D. Rose, R. Backofen, GraphClust: alignment-free structural clustering of local RNA secondary structures, Bioinformatics, 2012]


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.