Name: docker-galaxy-graphclust
Owner: Bioinformatics Lab - Department of Computer Science - University Freiburg
Description: A pipeline for structural clustering of RNA secondary structures
Created: 2016-12-16 12:36:58.0
Updated: 2017-03-15 14:48:06.0
Pushed: 2017-12-11 13:11:41.0
Size: 2376
Language: HTML
GitHub Committers
User | Most Recent Commit | # Commits |
---|---|---|
Björn Grüning | 2017-11-10 13:57:45.0 | 62 |
Milad Miladi | 2018-01-04 11:32:40.0 | 150 |
Eteri Sokhoyan | 2017-03-21 09:52:12.0 | 14 |
Galaxy-GraphClust is a web-based workflow for structural clustering of RNA secondary structures, developed as an instance of the GraphClust Perl pipeline inside the Galaxy framework. It consists of a set of integrated Galaxy tools and different flavors of clustering workflows built upon these tools.
This Docker image is a flavor of the Galaxy Docker image, customized by integrating the Galaxy-GraphClust tools and workflows.
The only requirement to run this webserver locally is Docker. Docker supports the three major desktop operating systems: Linux, Windows and Mac OSX. Please refer to the Docker installation guidelines for details.
For Windows and Mac systems it is additionally possible to use Kitematic and launch Galaxy-GraphClust using the OS graphical user interface.
Alternatively, Galaxy-GraphClust can be integrated into a running Galaxy server. All the Galaxy-GraphClust tools and workflows needed to run the GraphClust pipeline are listed in workflows and tools-list. For example, the Freiburg Galaxy instance offers the GraphClust pipeline alongside 700 other tools.
docker run -i -t -p 8080:80 backofenlab/docker-galaxy-graphclust
For more details about this command line or specific usage, please consult the Galaxy Docker guide.
Please check this step-by-step guide.
A running demo instance of Galaxy-GraphClust is available at http://bit.ly/GalaxyGraphClust. Please note that this instance is exactly the same Docker container that we offer here. It has limited computational capacity and is intended for demonstration and testing purposes. Long-term availability is currently not planned. We recommend following the instructions above.
If you encounter problems, please use the recommended settings, check the FAQs or contact us via the Issues section of the repository.
Galaxy-GraphClust has been tested on one of these operating systems:
Hardware:
If you encounter problems, please check the FAQ page or contact us using the Issues tab.
After running the Galaxy server, a web server is established under the host IP/URL and designated port (default 8080).
In your browser, go to IP/URL:PORT
Following the same settings as in the previous step:
On the same local computer: http://localhost:8080/
On any computer with a network connection to the host: http://HOSTIP:8080
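To check from a script that the server answers before opening the browser, a minimal sketch along these lines could be used. It assumes the default port 8080 mentioned above; the helper names `galaxy_url` and `wait_for_galaxy` are hypothetical and are not part of Galaxy or Docker.

```python
import time
import urllib.error
import urllib.request


def galaxy_url(host="localhost", port=8080):
    """Build the base URL of the Galaxy server (host and port are assumptions)."""
    return f"http://{host}:{port}/"


def wait_for_galaxy(url, attempts=30, delay=10.0):
    """Poll the URL until it answers with HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # no answer yet; the server may still be starting
        time.sleep(delay)
    return False
```

Calling `wait_for_galaxy(galaxy_url())` on the host, or `wait_for_galaxy(galaxy_url("HOSTIP"))` from another machine, returns `True` once the web server responds. The number of attempts and the delay are arbitrary choices and can be adjusted.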
This video tutorial gives a visual introduction to setting up and running Galaxy-GraphClust.
Interactive Tours are available for Galaxy and Galaxy-GraphClust. To run the tours, go to Help->Interactive Tours on the top panel and click on one of the tours prefixed GraphClust workflow. You can check the other tours for a more general introduction to the Galaxy interface.
To import or upload an existing workflow, go to the Workflow menu on the top panel. On the top right side of the screen, click the Upload or import workflow button. You can either upload a workflow from your local system or provide the URL of the workflow. You must be logged in to access the Workflow menu. You can download workflows from the following links
The GraphClust pipeline for clustering similar RNA sequences is complex; for details, please check the GraphClust publication. Overall, it consists of three major phases: a) sequence-based pre-clustering, b) encoding of predicted RNA structures as graph features, and c) iterative fast candidate clustering followed by refinement.
GraphClust pipeline overview (Heyne et al. 2012)
Below is the correspondence list of Galaxy-GraphClust tool names with each step of GraphClust:
| Stage | Galaxy Tool Name | Description |
| :---: | :--- | :--- |
| 1 | Preprocessing | Input preprocessing (fragmentation) |
| 2 | fasta_gspan | Generation of structures via RNAshapes and conversion into graphs |
| 3 | NSPDK_sparseVect | Generation of graph features via NSPDK |
| 4 | NSPDK_candidateClusters | Min-hash based clustering of all feature vectors, output top dense candidate clusters |
| 5 | premLocarana,locarana_best_subtree, CMfinder | LocARNA-based clustering of each candidate cluster: all-vs-all pairwise alignments, create multiple alignments along guide tree, select best subtree |
| 6 | Build_covariance_models | Create candidate model |
| 7 | Search_covariance_models | Scan full input sequences with Infernal's cmsearch to find missing cluster members |
| 8,9 | Report results | Collect final clusters and create example alignments of top cluster members |
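The idea behind min-hash based comparison of sparse feature sets (stage 4) can be illustrated with a small, self-contained sketch. This is not the NSPDK implementation used by the Galaxy tools; the hash family, the number of hash functions and the toy feature ids below are all assumptions chosen for illustration.

```python
import random


def make_hash_functions(count, seed=0, prime=2_147_483_647):
    """Create `count` random linear hash functions h(x) = (a*x + b) mod prime."""
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime), prime)
            for _ in range(count)]


def minhash_signature(features, hash_functions):
    """Signature = minimum hash value of the feature set under each function."""
    return tuple(min((a * x + b) % p for x in features)
                 for a, b, p in hash_functions)


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions; estimates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


# Toy sparse feature sets (integer feature ids); purely illustrative data.
seq_a = {1, 5, 9, 12, 40, 77}
seq_b = {1, 5, 9, 12, 40, 78}
seq_c = {200, 301, 402, 503}

funcs = make_hash_functions(200)
sig_a, sig_b, sig_c = (minhash_signature(s, funcs) for s in (seq_a, seq_b, seq_c))
print(estimated_similarity(sig_a, sig_b))  # high: the sets share 5 of 7 features
print(estimated_similarity(sig_a, sig_c))  # low: the sets share no feature
```

Sequences whose signatures agree in many positions are likely to have similar feature sets, so candidate clusters can be found by comparing short signatures instead of full feature vectors.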
The input to the workflow is a set of putative RNA sequences in FASTA format. Inside the sample_data directory you can find examples of the input format. In this case the data comes from a benchmark set based on Rfam 12.0 and is optionally labeled with reference Rfam family members.
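Before uploading, it can help to check that a file really is well-formed FASTA. The following is a minimal sketch of such a check. The function names, the example records and the accepted alphabet (`ACGUTN`) are assumptions for illustration; they are not taken from the sample_data files or from the GraphClust tools.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_records(records, alphabet="ACGUTN"):
    """Return headers of records that are empty or use characters outside the alphabet."""
    allowed = set(alphabet)
    return [h for h, seq in records if not seq or set(seq.upper()) - allowed]


# Hypothetical example input; the second record contains an invalid character.
example = """>seq1 hypothetical_label
GGGAAACUUCGGUUUCCC
>seq2
ACGUXACGU
""".splitlines()

records = read_fasta(example)
print(len(records))              # 2
print(invalid_records(records))  # ['seq2']
```

For a real file, `read_fasta(open(path))` works the same way, since iterating over an open text file yields its lines.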
Please proceed with the interactive tour named GraphClust workflow step by step, available under Help->Interactive Tours.
Check the FAQs for an explanation of the most important parameters.
The output contains the predicted clusters, where similar putative input RNA sequences form a cluster. Additionally, the overall status of the clusters and the matching of cluster elements are reported for each cluster.
Please check the interactive tours and the GraphClust README for more information about the reported information and files.
You can file a GitHub issue or ask us on the Galaxy development list.
[M. Miladi, E. Sokhoyan, T. Houwaart, F. Costa, R. Backofen and B. Gruening, Galaxy-GraphClust: scalable and accessible clustering of ncRNAs based on secondary structures, (submitted)]
[S. Heyne, F. Costa, D. Rose, R. Backofen, GraphClust: alignment-free structural clustering of local RNA secondary structures, Bioinformatics, 2012]