Name: ACES-Training
Owner: SURFsara
Description: null
Forked from: chStaiger/ACES-Training
Created: 2016-03-30 08:05:42.0
Updated: 2017-11-28 17:16:06.0
Pushed: 2016-04-10 13:41:00.0
Homepage: null
Size: 1738
Language: Python
This tutorial teaches master's and PhD students how to coordinate so-called embarrassingly parallel computational tasks across different infrastructures.
Problem: We have a huge computational problem that can be split into many smaller, independent problems (embarrassingly parallel), which makes each piece computationally tractable. The smaller problems can be run on several infrastructures. We would like to coordinate the runs that solve the smaller problems and later aggregate their results. This leaves us with an enormous administrative task.
This tutorial shows how to code, coordinate and distribute runs belonging to the same problem. It shows students how to create and process tokens, each of which encodes a single run. The pipeline uses CouchDB as a token pool server, together with Python and the picasclient.
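As a rough illustration of the token idea, the sketch below builds CouchDB-style token documents in plain Python. The field names (`lock`, `done`, `command`) and the `run_classifier.py` command are illustrative assumptions; the tutorial's own code and the picasclient library define the actual schema.

```python
import json

def create_token(token_id, command):
    """Build a CouchDB-style token document describing one independent run.

    The field names here are illustrative; the tutorial's own code
    (and the picasclient library) define the real schema.
    """
    return {
        "_id": token_id,
        "type": "token",
        "command": command,  # the job this run should execute
        "lock": 0,           # 0 = free; a worker would mark it when claiming the token
        "done": 0,           # 0 = not finished; marked when the run completes
    }

# One token per independent sub-problem of the embarrassingly parallel task:
tokens = [create_token("run_%03d" % i, "python run_classifier.py %d" % i)
          for i in range(5)]
print(json.dumps(tokens[0], indent=2))
```

Each worker would then repeatedly fetch a free token from the CouchDB pool, execute its command, and mark it done, so the pool server does the bookkeeping that would otherwise be manual.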
To follow the tutorial you need a Python distribution and access to a CouchDB instance. In this tutorial we make use of the Lisa cluster. On Lisa, execute:
pip install --user couchdb
pip install --user scikit-learn
If you want to use your own Python distribution, please install the following packages:
Module | Version
-------|--------
numpy | 1.6.1
scipy | 0.10.0
sklearn | 0.11
h5py | 2.0.0
xlrd | not known
couchdb | 0.9
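To verify that the packages above are available in your Python distribution, a small check like the following can help (the list of module names mirrors the table; the script itself is not part of the tutorial code):

```python
import importlib

def check_modules(names):
    """Report the installed version of each module, or None if it is missing."""
    report = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
            report[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            report[name] = None  # not installed
    return report

if __name__ == "__main__":
    required = ["numpy", "scipy", "sklearn", "h5py", "xlrd", "couchdb"]
    for name, version in check_modules(required).items():
        print("%-8s %s" % (name, version if version else "MISSING"))
```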
You will need the code provided in this repository. You can download it like this:
git clone https://github.com/sara-nl/ACES-Training.git
Change to ACES-Training/code and start Python there. All code must be run from this directory so that the imports work.
The training makes use of a double-loop cross-validation pipeline, which is described in detail in Staiger et al. We will create tokens for the Single gene classifier, the Random gene classifier and the Lee classifiers. Furthermore, for didactic reasons, we will also create tokens that the pipeline will fail to process.
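The token set described above can be sketched as follows. The classifier labels and the token layout are illustrative assumptions, not the tutorial's actual schema; the point is that one token per classifier run is created, plus a deliberately malformed token for practising error handling.

```python
def make_tokens():
    """Create one token per classifier run, plus a deliberately broken one.

    Classifier labels and field names are illustrative; the tutorial's
    own code defines the real ones.
    """
    classifiers = ["SingleGene", "RandomGene", "Lee"]
    tokens = [{"_id": "token_%s" % c, "classifier": c, "lock": 0, "done": 0}
              for c in classifiers]
    # A malformed token: the 'classifier' field is missing, so the pipeline
    # cannot process it -- useful for seeing how failures are handled.
    tokens.append({"_id": "token_broken", "lock": 0, "done": 0})
    return tokens

for t in make_tokens():
    print(t["_id"], t.get("classifier", "<no classifier>"))
```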