sara-nl/ACES-Training

Name: ACES-Training

Owner: SURFsara

Description: null

Forked from: chStaiger/ACES-Training

Created: 2016-03-30 08:05:42.0

Updated: 2017-11-28 17:16:06.0

Pushed: 2016-04-10 13:41:00.0

Homepage: null

Size: 1738

Language: Python

README

ACES training pipeline

Synopsis

This tutorial teaches master's and PhD students how to coordinate so-called embarrassingly parallel computational tasks across different infrastructures.

Problem: We have a huge computational problem that can be split into many smaller, independent problems (embarrassingly parallel), each of which is computationally much cheaper than the whole. The individual sub-problems can be run on several infrastructures. We want to coordinate the runs that solve the sub-problems and later aggregate their results. This leaves us with an enormous administrative task.
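The split, run and aggregate pattern described above can be sketched in a few lines. This is only an illustration on a single machine: `solve_subproblem` and `split` are hypothetical helpers, and a local thread pool stands in for the several infrastructures that the tutorial coordinates.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_subproblem(chunk):
    """Hypothetical stand-in for one independent run: sum of squares of a chunk."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks pieces that can be processed independently."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


data = list(range(10))
chunks = split(data, 3)

# Each chunk is solved without any knowledge of the other chunks.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(solve_subproblem, chunks))

# Aggregation step: combine the partial results into the final answer.
total = sum(partial_results)
print(total)
```

Because the sub-problems share no state, the order in which they finish does not matter. Only the bookkeeping of which run has been done, and where its result lives, needs coordination.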

This tutorial shows how to encode, coordinate and distribute runs belonging to the same problem. Students learn how to create and process tokens, each of which encodes a single run. The pipeline uses CouchDB as a token pool server, together with Python and the picasclient.
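To make the idea of a token concrete, here is a minimal in-memory sketch of a token pool. It does not talk to CouchDB and does not use the picasclient; a plain dictionary stands in for the database. The field names (`type`, `lock`, `done`, `input`, `output`) and the classifier labels are assumptions chosen for illustration and may differ from what the tutorial code actually uses.

```python
import time


def make_token(token_id, classifier, fold):
    """Build one token: a small document that describes a single run."""
    return {
        "_id": token_id,
        "type": "token",
        "lock": 0,   # 0 means no worker has claimed this token yet
        "done": 0,   # 0 means the run has not finished yet
        "input": {"classifier": classifier, "fold": fold},
        "output": None,
    }


def claim_next(pool):
    """Return the first unclaimed token and mark it as locked, or None."""
    for token in pool.values():
        if token["lock"] == 0:
            token["lock"] = int(time.time())
            return token
    return None


def process(token):
    """Placeholder for the real computation; records a result and a finish time."""
    token["output"] = "ran {classifier} on fold {fold}".format(**token["input"])
    token["done"] = int(time.time())


# Create the pool: one token per (classifier, fold) combination.
pool = {}
for classifier in ("SingleGene", "RandomGene"):  # hypothetical labels
    for fold in range(2):
        token = make_token("token_%s_%d" % (classifier, fold), classifier, fold)
        pool[token["_id"]] = token

# A worker loop: keep claiming tokens until none are left.
while True:
    token = claim_next(pool)
    if token is None:
        break
    process(token)
```

In a real token pool the claim step has to be safe against several workers asking at once, which is one reason for delegating it to a database server rather than a local dictionary.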

Technology requisites

To follow the tutorial you need a Python distribution and access to a CouchDB instance. In our tutorial we use the Lisa cluster. On Lisa, execute:

easy_install --user couchdb
easy_install --user scikit-learn

If you want to use your own Python distribution, please install the following packages:

Module | Version
-------|--------
numpy | 1.6.1
scipy | 0.10.0
sklearn | 0.11
h5py | 2.0.0
xlrd | not known
couchDB | 0.9
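A quick way to see which of these packages are available in your own distribution is to try importing each one and print its version. The sketch below assumes the import names `numpy`, `scipy`, `sklearn`, `h5py`, `xlrd` and `couchdb`; it only reports what it finds and does not compare against the versions in the table.

```python
import importlib

# Import names assumed to correspond to the packages listed in the table.
REQUIRED = ["numpy", "scipy", "sklearn", "h5py", "xlrd", "couchdb"]


def installed_version(module_name):
    """Return the module's version string, 'unknown', or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any failure while importing is treated as "not usable here".
        return None
    return getattr(module, "__version__", "unknown")


def report(modules):
    """Map each module name to its installed version (or None)."""
    return {name: installed_version(name) for name in modules}


for name, version in sorted(report(REQUIRED).items()):
    print("%-8s %s" % (name, version if version is not None else "NOT INSTALLED"))
```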

Downloading this repository

You will need the code provided in this repository. You can download it like this:

git clone https://github.com/sara-nl/ACES-Training.git

Change to ACES-Training/code and start Python there. All code has to be run from this directory so that the imports work.

Context

The training uses a double-loop cross-validation pipeline, which is described in detail in Staiger et al. We will create tokens for the Single gene classifier, the Random gene classifier and the Lee classifiers. For didactic reasons, we will also create tokens that the pipeline will fail to process.
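For readers unfamiliar with double-loop (nested) cross-validation, the sketch below shows how the sample indices are partitioned: an outer loop holds out a test set, and an inner loop splits the remaining samples again into training and validation sets. It is a generic illustration in plain Python. The fold counts, the simple interleaved fold assignment and the function names are assumptions, not the settings of the pipeline described by Staiger et al.

```python
def k_fold_indices(indices, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # interleaved, non-overlapping folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def double_loop_splits(n_samples, outer_k, inner_k):
    """Yield (outer_fold, inner_fold, inner_train, inner_val, outer_test) tuples."""
    samples = list(range(n_samples))
    for o, (outer_train, outer_test) in enumerate(k_fold_indices(samples, outer_k)):
        # The inner loop only ever sees the outer training samples.
        for i, (inner_train, inner_val) in enumerate(k_fold_indices(outer_train, inner_k)):
            yield o, i, inner_train, inner_val, outer_test


splits = list(double_loop_splits(n_samples=12, outer_k=3, inner_k=2))
for o, i, inner_train, inner_val, outer_test in splits:
    print(o, i, len(inner_train), len(inner_val), len(outer_test))
```

Each yielded tuple describes a unit of work that does not depend on the others, which is what makes this kind of pipeline a natural fit for distributing runs via tokens.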

