CD2H gitForager

FredHutch/freeloader

Name: freeloader

Owner: Fred Hutchinson Cancer Research Center

Description: a simple web interface to upload data into object storage, process with hpc (slurm) and offer the results for download

Created: 2015-10-20 20:51:17.0

Updated: 2015-10-20 20:51:17.0

Pushed: 2015-10-20 23:51:57.0

Homepage: null

Size: 152

Language: null

GitHub Committers

User	Most Recent Commit	# Commits

Other Committers

User	Email	Most Recent Commit	# Commits

README

Freeloader

Freeloader is a simple web application (flask) that allows researchers to take advantage of HPC best practices to offer a compute pipeline to collaborators. An external collaborator uploads a data file through a web form. The data file is then processed on an HPC cluster using recommended HPC and storage access methods.

Design goals are:

ensure that only compute resources are used that are otherwise idle
ensure that the lowest cost storage is used to accompish the job
limit security risks by offering minimal options for user input
minimal impact on IT support

Implementation

The application is developed in Python using the flask microframework. The incoming file should be handed off to research python code that resides in a dedicated py file.

Features

Each pipeline will be offered under a dedicated URL
The only user interface will be a text field for an email address and a file uploader for one or multiple files (e.g. https://blueimp.github.io/jQuery-File-Upload/angularjs.html)
When the user hits the submit button the data could be scanned for viruses or PII content before it is handed of to the computational research code that then submits hpc jobs.
when the computation has ended the result is copied to swift and made available as a hidden file/link on the portal. The link has a configurable expiration date
an email is sent to the collaborator that contains a link to the result file

Configuration details

no NFS to writable shared file systems can be mounted on the webserver
only the HPC restart queue is going to be used (jobs are preempted within seconds should there be a resource conflict.
no posix file systems to store data. Input data will be stored in a Swift object store that can be accessed via S3 or native Swift API.
No authentication is provided.

Phase II

checkpoint restart. Jobs would be started with DMTCP checkpointing. Checkpoints are stored at regular intervals in persistent scratch file systems
an app that provides generic large file upload services to researchers to avoid issues with 3rd party upload tools which are not fully integrated.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.