Molmed/hercules

Name: hercules

Owner: Uppsala University, Department of Medical Sciences, Molecular Medicine

Description: null

Created: 2014-09-02 10:17:54.0

Updated: 2015-05-04 21:21:25.0

Pushed: 2015-05-28 12:05:21.0

Homepage: null

Size: 1987

Language: Scala

README

Hercules

NOTE: This project is under heavy development and anything can change without prior notice (and a lot of the functionality is not in place yet).

Quick start: This was way too much text! How do I install Hercules (on a Red Hat-based system)?

# Make sure you have Java 7 and sbt installed
git clone https://github.com/Molmed/hercules.git &&\
cd hercules &&\
sbt rpm:packageBin &&\
sudo yum install target/rpm/RPMS/noarch/hercules-*.noarch.rpm
# Read the section on configuration to get a proper setup of your roles!

Or try it out on the Vagrant machines defined in this repo (you'll need to have Vagrant and VirtualBox installed):

vagrant up

At the moment you'll have to install Hercules yourself on the nodes - but we are working on setting it all up with Ansible. Check out our public Ansible role here.

Background:

The amazing Hercules project, taking the stone out of Sisyphus' hands! Hercules is a project at the SNP&SEQ Technology Platform which aims at automating the data flow at our sequencing facility. Our workflow looks as follows:

We have previously run these steps semi-manually with the help of Sisyphus. The first goal of Hercules is to provide a way to take away the hands-on parts of the Sisyphus workflow. Thus Sisyphus is used as a backend for many of the tasks carried out by Hercules. Hercules is, however, not restricted to using Sisyphus, but should be able to support other backends as necessary.

Hercules is built as an actor-based distributed system using the Akka framework. We use the Akka cluster functionality to be able to start Hercules nodes on different machines and assign them one or more roles. Furthermore, Hercules exposes a REST API which can be used to control it either through a command-line client or a web-based front-end (both of which are upcoming).
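
As an illustration of the role mechanism, here is a minimal, hypothetical sketch (not Hercules' actual startup code) of how an Akka node can be tagged with cluster roles through its configuration and can inspect them at runtime:

import akka.actor.ActorSystem
import akka.cluster.Cluster
import com.typesafe.config.ConfigFactory

object ClusterRolesSketch extends App {
  // Akka 2.3-era settings: use the cluster actor-ref provider and tag this
  // node with two roles. Hercules' own role names would map onto this mechanism.
  val config = ConfigFactory.parseString(
    """
      |akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
      |akka.remote.netty.tcp.port = 0
      |akka.cluster.roles = ["master", "restapi"]
    """.stripMargin).withFallback(ConfigFactory.load())

  val system = ActorSystem("hercules-sketch", config)

  // Every node can ask the cluster extension which roles it carries.
  println(s"This node runs the roles: ${Cluster(system).selfRoles}")
}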

Key concepts and features

To understand how Hercules works it's important to grasp a few key concepts.

The cluster: The Hercules system consists of one or more nodes (running Hercules instances). Together these constitute a cluster.

Nodes: Each running Hercules instance is called a node.

Roles: Each node can run one or more roles. Examples of roles are master and demultiplexer. Each role contains one part of the workflow functionality. The roles communicate with each other by sending messages, and this is independent of whether the roles are running on the same node or not.

Master: The actor which acts as the centralizing point in the workflow. Messages from other roles pass through the master, which decides what to do with them.
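
As a rough illustration of this message passing, here is a hypothetical sketch (the message and actor names are made up for this example and are not taken from the Hercules code base):

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical message type: a role reporting a finished task to the master.
case class TaskFinished(runfolder: String)

class MasterSketch extends Actor {
  def receive = {
    case TaskFinished(runfolder) =>
      // The master is the central point: it decides what should happen next.
      println(s"Task finished for $runfolder - deciding on the next step")
  }
}

object MessageSketch extends App {
  val system = ActorSystem("hercules-sketch")
  val master = system.actorOf(Props[MasterSketch], name = "master")

  // A role (e.g. the demultiplexer) reports back by sending a message;
  // this works the same whether or not it runs on the same node as the master.
  master ! TaskFinished("runfolder-001")
}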

Defining nodes and roles makes Hercules very flexible in how it can be run. It's possible to run Hercules with all required roles on a single node, or to split the roles out over several nodes. The latter is useful as it allows you to run long-running and resource-intensive operations (such as the demultiplexing) on separate nodes.

It also makes Hercules highly resilient to errors. An error in one part of the system should not propagate and disturb other parts of the system. Nodes can be taken down and brought up again, and they will then re-attach themselves to the cluster. Furthermore, the master actor has a persistence mechanism which makes sure that data will not be lost.
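
The details of the master's persistence are internal to Hercules, but as a sketch of the general idea (using Akka Persistence; the command, event and actor names here are hypothetical), state is rebuilt from persisted events when the actor restarts:

import akka.persistence.PersistentActor

// Hypothetical command and event, for illustration only.
case class AddRunfolder(name: String)
case class RunfolderAdded(name: String)

class PersistentMasterSketch extends PersistentActor {
  override def persistenceId = "master-sketch"

  var pending: List[String] = Nil

  // On restart, persisted events are replayed so no accepted work is lost.
  override def receiveRecover: Receive = {
    case RunfolderAdded(name) => pending = name :: pending
  }

  override def receiveCommand: Receive = {
    case AddRunfolder(name) =>
      // Persist the event first, then update the in-memory state.
      persist(RunfolderAdded(name)) { event =>
        pending = event.name :: pending
      }
  }
}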

Configuration

As previously noted, we maintain an Ansible role which can be used to configure Hercules. If you are an Ansible user we strongly encourage you to use that to configure Hercules. If not, you might want to get into the details below.

Hercules is configured using Typesafe Config - in HOCON (a human-friendly superset of JSON). By default the Hercules config is found under /etc/hercules/application.conf. After you have made any changes to the config you need to restart the service: service hercules restart. Most of the config should have reasonable defaults, but one option warrants a comment:

Setting up the roles

hercules {
   roles = ["master", "restapi"]
}

By default the “master” and “restapi” roles will be set. The other roles available at the moment are demultiplexer and watcher. Run hercules --help for a complete and updated list for the version that you are running.
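
For example, a node that should only take care of demultiplexing could be given a config along these lines (a sketch - adapt it to your own setup):

# /etc/hercules/application.conf on a dedicated demultiplexing node
hercules {
   roles = ["demultiplexer"]
}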

The rest of the configuration options should be fairly self-explanatory, set them to match your own setup.

Prepare for development

You need to have NFS installed on your machine for the syncing to work properly. On Ubuntu, install it with:

sudo apt-get install nfs-common nfs-kernel-server

This project is accompanied by a Vagrantfile. Once you have Vagrant and VirtualBox set up, you should be able to spin up a virtual test system with this command (run from the root of the project):

vagrant up

To see the status of the virtual machines try:

vagrant status

And to ssh into them:

vagrant ssh <machine name>

You can also download a minimal test data set (provided that you have access to the correct ssh config) by running the download_test_data.sh script.

Install instructions

Check out the source and then package Hercules as an RPM (note that you need to have sbt installed):

sbt rpm:packageBin

And then install the RPM with

sudo yum install target/rpm/RPMS/noarch/hercules-<version>.noarch.rpm

This should create all the necessary upstart scripts so that service hercules <command> can be used to control the server.
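
For example (standard upstart-style service commands; the exact behaviour depends on your system):

sudo service hercules start    # start the Hercules node
sudo service hercules status   # check whether it is running
sudo service hercules restart  # pick up config changes
sudo service hercules stop     # stop the node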


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.