basespace/LaunchSpace

Name: LaunchSpace

Owner: Basespace

Created: 2015-03-24 20:47:44.0

Updated: 2017-09-10 09:22:15.0

Pushed: 2015-06-25 21:02:40.0

Size: 330

Language: Python

README

INTRODUCTION

LaunchSpace is a set of Python scripts that allow BaseSpace users to automatically launch analysis on their samples as soon as they are ready. These analyses are then tracked to completion, with optional automated quality control and download of a subset of the generated files. The scripts use analysis templates to configure how samples should be analysed. To provide hands-off automation, the scripts are designed to be run periodically by the Unix cron service, so that each analysis starts as soon as its data is ready.

AUTHORS

Peter Saffrey (psaffrey@illumina.com)

Rodger Constandse (rconstandse@illumina.com)

REQUIREMENTS

https://developer.basespace.illumina.com/docs/content/documentation/sdk-samples/python-sdk-overview

INSTALL

Ensure your Python installation has all the required dependencies:

pip install -r dependencies.pip

The scripts are designed to run in place in the location they have been downloaded and unpacked.

GETTING STARTED

These instructions assume an appropriate version of Python in $PYTHON and that the code has been checked out into $LAUNCHSPACE. They also assume an installation of the BaseSpace Python SDK and valid user credentials. For more details, see the BaseSpace Python SDK documentation.

Workflow overview

LaunchSpace uses a local configuration database (sqlite3) to store information about projects and apps, linking these to their corresponding entities in BaseSpace. Command line tools allow the creation of samples (either individually or in batches) within projects, where each sample can have one or more linked apps. The Launcher queries BaseSpace for each sample and project to see whether there is any data in BaseSpace under that sample name. If there is, and it meets the yield requirements, the associated app is launched on that sample data. Once the app is launched, a BaseSpace AppSessionId is stored locally, linking the sample app run to its associated BaseSpace entity. The Tracker uses this information to track the app to completion. Once the app has completed, the QCChecker uses the same AppSessionId to download an appropriate metrics file and compare it to a set of thresholds for the relevant app, marking the analysis as qc-passed or qc-failed. The Downloader then downloads a specified group of files to local storage for delivery or further analysis.
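The progression described above can be pictured as a simple status sequence. The sketch below is illustrative only: the status names come from the glossary at the end of this README, but the ordering and the mapping of statuses to tools are assumptions inferred from the overview, not the actual LaunchSpace implementation.

```python
# Status names are taken from the glossary in this README. The ordering and the
# tool assignments below are assumptions made for illustration.
HAPPY_PATH = [
    "waiting", "submitted", "pending", "running",
    "app-finished", "qc-passed", "downloading", "downloaded",
]

FAILURE_STATUSES = {"launch-failed", "run-failed", "qc-failed", "download-failed"}

# Which tool is assumed to pick up a SampleApp that is in a given status.
TOOL_FOR_STATUS = {
    "waiting": "Launcher",
    "submitted": "Tracker",
    "pending": "Tracker",
    "running": "Tracker",
    "app-finished": "QCChecker",
    "qc-passed": "Downloader",
}


def next_happy_status(status):
    """Return the next status on the happy path, or None if there is none."""
    if status not in HAPPY_PATH:
        return None
    index = HAPPY_PATH.index(status)
    if index + 1 < len(HAPPY_PATH):
        return HAPPY_PATH[index + 1]
    return None
```

A failure status has no successor in this sketch; moving such a SampleApp on is a manual step.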

Make log directory

The cron-based tools of LaunchSpace need a log directory to write into. By default, this is $LAUNCHSPACE/log, but this empty directory is not present in the git repository, so you will need to create it. You can also change the log directory by editing $LAUNCHSPACE/etc/config.py.
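If you prefer to create the directory from Python rather than the shell, a minimal sketch follows. It assumes Python 3 (for the exist_ok argument) and the default log location; the function name ensure_log_dir is hypothetical and not part of LaunchSpace.

```python
import os


def ensure_log_dir(launchspace_root):
    """Create <launchspace_root>/log if it is missing and return its path."""
    log_dir = os.path.join(launchspace_root, "log")
    # exist_ok=True makes this safe to call repeatedly.
    os.makedirs(log_dir, exist_ok=True)
    return log_dir
```

For example, ensure_log_dir(os.environ["LAUNCHSPACE"]) would create the default directory, assuming the LAUNCHSPACE environment variable is set.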

Initialise local configuration database

$PYTHON $LAUNCHSPACE/bin/InitialiseDatabase.py

Initialise projects

Initialising a project makes LaunchSpace aware that a BaseSpace project is of interest for analysis. Project initialisation requires two arguments:

Example:

$PYTHON $LAUNCHSPACE/bin/CreateProject.py -n "Project Test" -p /projects/Test

Initialise apps

App initialisation requires the following arguments:

Example:

$PYTHON $LAUNCHSPACE/bin/CreateApp.py -n IsaacV2 -t data/apptemplates/IsaacV2Template.json -r data/thresholds/isaacv2.json -m summary.csv -y SingleGenome -d vcf,report.pdf -b 278278

Accessioning samples

LaunchSpace can only launch analysis on samples that have been specifically accessioned within its local configuration database. This allows users to select samples they want to analyse and LaunchSpace can then automatically launch apps as soon as data becomes available. Accessioning samples in this way does not alter BaseSpace, only the local configuration database. It makes LaunchSpace aware that data for a particular sample name is expected and that as soon as that data is available an app should be run on it.

Samples are accessioned using the $LAUNCHSPACE/bin/CreateSamples.py command, which has a number of different input mechanisms:

SampleApps

As well as accessioning a sample in the local configuration database (which can be queried with ListSamples.py, see below), accessioning a sample alongside its app also creates a SampleApp entry, which LaunchSpace uses to track the progress of the analysis assigned to this sample. Each SampleApp has a status, which is updated as LaunchSpace launches the app and tracks its progress. A sample can have more than one app assigned to it; each SampleApp is tracked separately. SampleApp entries can be tracked with ListSampleApps.py (see below).

Examples:

Accession a sample called NA12878_Expt18_mpx2_TSNano_704 within the Project Test project which should have the IsaacV2 app run on it:

$PYTHON $LAUNCHSPACE/bin/CreateSamples.py -p "Project Test" -a IsaacV2 -n NA12878_Expt18_mpx2_TSNano_704

Accession samples from a tsv file included in the LaunchSpace repository:

$PYTHON $LAUNCHSPACE/bin/CreateSamples.py -p "Project Test" -f $LAUNCHSPACE/data/samplelists/testlist.tsv

Accession samples from a Clarity LIMS manifest included in the LaunchSpace repository:

$PYTHON $LAUNCHSPACE/bin/CreateSamples.py -p "Project Test" -l $LAUNCHSPACE/data/samplelists/ClarityLIMSSampleList.txt

Run the Launcher

The Launcher is the tool that launches BaseSpace apps across all SampleApp entries that meet the proper conditions. It executes the following set of steps:

The Launcher is designed to be run without arguments as a cron entry (see below). In this mode, it runs with no output to stdout or stderr, outputting any messages into a log file. However, it can be useful to run the Launcher manually to check what would happen or if manual intervention is needed. In these cases, the following options are useful:

Examples:

Show in detail what the Launcher would do, without actually doing it:

$PYTHON $LAUNCHSPACE/bin/Launcher.py -s -L DEBUG -l

Launch app on SampleApp entry with id 14, even if there is not enough yield. Show output to stdout:

$PYTHON $LAUNCHSPACE/bin/Launcher.py -i 14 -Y -l

Run the Tracker

The Tracker is the tool that tracks submitted and running SampleApp entries, updating their status. It executes the following set of steps:

Like the Launcher, the Tracker is designed to be run from cron and to write its output only to a log file. Also like the Launcher, it has arguments for manual intervention:

Examples:

Show in detail what the Tracker would do, without actually doing it:

$PYTHON $LAUNCHSPACE/bin/Tracker.py -s -L DEBUG -l

Run the QCChecker

The QCChecker is the tool that pulls down a specific metrics file from an app result and evaluates whether those metrics are within specified thresholds. It goes through the following steps:

The QCChecker has the same manual options as the Tracker - individual SampleApps, safe mode and debugging output.
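To illustrate the kind of comparison the QCChecker performs, here is a minimal sketch of checking metrics against thresholds. The threshold layout (metric name mapped to a comparison and a limit) and the metric name used in the usage note are hypothetical; the real files under data/thresholds/ and the logic in LaunchSpace may be structured differently.

```python
import operator

# Hypothetical threshold layout: metric name -> (comparison, limit).
# This is an assumption for illustration, not the LaunchSpace file format.
COMPARISONS = {">=": operator.ge, "<=": operator.le}


def check_qc(metrics, thresholds):
    """Return (passed, failures) for a dict of metric values."""
    failures = []
    for name, (comparison, limit) in thresholds.items():
        if name not in metrics:
            failures.append("%s: missing from metrics file" % name)
            continue
        if not COMPARISONS[comparison](metrics[name], limit):
            failures.append(
                "%s: %s is not %s %s" % (name, metrics[name], comparison, limit)
            )
    return (not failures, failures)
```

With thresholds of {"coverage": (">=", 30.0)}, a result with coverage 35.0 would pass and one with coverage 20.0 would fail, with the reason collected in the failures list.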

Run the Downloader

The Downloader is the tool that pulls down the “deliverable” for an app result - all the files with any of a set of extensions. It goes through the following steps:

The Downloader has the same manual options as the Tracker - individual SampleApps, safe mode and debugging output.
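Selecting the deliverable amounts to filtering file names by a set of extensions. The sketch below assumes the extensions are given as a comma-separated string (as in the -d vcf,report.pdf argument of the CreateApp.py example) and that matching is done on the end of the file name; both points are assumptions, and select_deliverables is a hypothetical name.

```python
def select_deliverables(filenames, extensions):
    """Return the file names that end in any of the comma-separated extensions."""
    suffixes = tuple(ext.strip() for ext in extensions.split(",") if ext.strip())
    # str.endswith accepts a tuple and is True if any suffix matches.
    return [name for name in filenames if name.endswith(suffixes)]
```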

AUTOMATING THE WORKFLOW

Installing a crontab

Before installing the LaunchSpace scripts under a crontab, we recommend running each of the cron tools (Launcher, Tracker, QCChecker and Downloader) manually for a few samples to understand how they operate. This will help the scripts integrate smoothly.

A crontab is included in the root of the LaunchSpace repository. It includes all the entries necessary to have a fully automated system, where all the user needs to do is accession samples and point their sequencing devices at BaseSpace and everything else should happen without intervention. In practice, some proportion of samples always need manual intervention; these are described in more detail below.

Before the crontab can be installed, it should be edited to include the path to the Python executable and the LaunchSpace source code. The crontab can then be installed by running:

crontab $LAUNCHSPACE/crontab

checked by running:

crontab -l

and edited by running:

crontab -e

Note that cron runs as a specific user, and this user must have the proper BaseSpace credentials set up in their .basespacepy.cfg file.

Monitoring progress

The happy path for samples handled by LaunchSpace should be as follows:

During each of these stages, there are several ways to keep track of the SampleApp run. While the app is running, within BaseSpace itself you can find the AppSession for the SampleApp in the UI and monitor progress there, including checking any logging output.

For bulk checking, and to track the status after the app run has finished, LaunchSpace also provides the tool ListSampleApps.py to query the status of SampleApps and, if necessary, manually adjust their status. It has the following options:

Get SampleApp entries by:

These options use substring matching by default. Exact matching can be switched on by using -x
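The difference between the two matching modes can be sketched as follows. This is an illustration of the behaviour described above, not the ListSampleApps.py code; the function name and the case-sensitive comparison are assumptions.

```python
def filter_names(names, query, exact=False):
    """Return the names that match query, by substring unless exact is set."""
    if exact:
        return [name for name in names if name == query]
    return [name for name in names if query in name]
```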

You can report the status details field of the SampleApp entry by adding -e. These details might include the reason a SampleApp is waiting (for example, No data or Not enough yield), why a sample failed QC, or the error message provided when an app failed to download.

Finally, you can also opt to apply an operation to all the SampleApps selected by the other arguments:

The typical pattern for applying these options is to first write and test a set of options that select the SampleApps of interest, then add -S to set their status.

For example, get all the SampleApps from a particular project with status qc-failed:

$PYTHON $LAUNCHSPACE/bin/ListSampleApps.py -p "Project Test" -u qc-failed

Then set the status of those SampleApps to qc-passed:

$PYTHON $LAUNCHSPACE/bin/ListSampleApps.py -p "Project Test" -u qc-failed -S qc-passed

Intervening in problem samples

Below is a list of possible deviations from the happy path and actions that can be taken to correct them. In many cases, these suggestions involve using ListSampleApps.py to find the problem cases and manually set their status to move them on or force them to repeat a step.

Data for sample is not found in BaseSpace
SampleApp is marked as launch-failed
SampleApp is marked as run-failed
SampleApp is marked as qc-failed
SampleApp is marked as download-failed

OTHER MONITORING TOOLS

Tool | Purpose
---- | -------
ListProjects.py | List accessioned projects
ListSamples.py | List accessioned samples with their associated project name
ListApps.py | List details of all the accessioned apps

FURTHER NOTES AND KNOWN LIMITATIONS

QC metrics business logic

The logic to unpack the metrics for an app has been tested against the Isaac V2 and tumour/normal apps. For other apps this might need to be extended or modified to extract the metrics properly. This logic is found in AppServices.py in the _ReadQCResult() method.

Project Sample Limit

For projects with more than 1000 items, the BaseSpace API requires the use of paginated requests, which are not supported by LaunchSpace. Therefore, if your project contains more than 1000 samples, LaunchSpace will not function properly. Extending LaunchSpace to support pagination would be possible in a future release. As a workaround, we recommend creating separate projects in these cases. This can be done by service period, for example myproject_jan15, myproject_feb15 or myproject_q1, myproject_q2. Separate projects can also be more convenient for large numbers of samples.
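For readers considering such an extension, the general shape of a paginated fetch is sketched below. The fetch_page callable is hypothetical: it stands in for whatever request returns up to page_size items starting at a given offset, and does not correspond to a specific BaseSpace SDK call.

```python
def iter_all_items(fetch_page, page_size=1000):
    """Yield every item by requesting successive pages until one comes back short.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of at most `limit` items starting at position `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        for item in page:
            yield item
        # A page shorter than page_size (including an empty one) is the last.
        if len(page) < page_size:
            break
        offset += len(page)
```

The loop stops on the first short page, so a collection whose size is an exact multiple of page_size costs one extra, empty request.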

Direct Database Manipulation

LaunchSpace is based on sqlite and the database created has triggers and foreign keys set up to implement business logic about updating columns and keeping tables consistent. For example, if you delete a Sample, it will also delete any associated SampleApp entries. This means you should be able to use standard sqlite tools to work with the database (found by default in $LAUNCHSPACE/data/db.sqlite) without interfering with the operation of the LaunchSpace scripts. This might be desirable if you need to perform operations that are not directly supported by the scripts or for other customisation.

During LaunchSpace development, we used sqlitebrowser:

http://sqlitebrowser.org/

as well as the command line tool sqlite3 to inspect and alter the database.

The only caveat with these tools is that you need to ensure the PRAGMA for foreign keys is switched on or these will not be properly enforced. In sqlitebrowser this can be set in the “Edit Pragmas” tab. For the command line tool, create a .sqliterc file in your home directory containing this line:

PRAGMA foreign_keys=1;
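The same caveat applies when opening the database from Python with the standard sqlite3 module: foreign key enforcement is off by default for each new connection. The sketch below shows how to switch it on, and demonstrates the effect on an in-memory database. The two-table schema here is a simplified, hypothetical stand-in and not the actual LaunchSpace schema.

```python
import sqlite3


def open_db(path):
    """Open a SQLite database with foreign key enforcement switched on."""
    conn = sqlite3.connect(path)
    # Must be set per connection; it is not stored in the database file.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def demo_cascade():
    """Delete a sample and return how many SampleApp rows remain."""
    conn = open_db(":memory:")
    # Hypothetical, simplified schema for illustration only.
    conn.executescript("""
        CREATE TABLE Sample (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE SampleApp (
            id INTEGER PRIMARY KEY,
            sample_id INTEGER REFERENCES Sample(id) ON DELETE CASCADE,
            status TEXT
        );
        INSERT INTO Sample (id, name) VALUES (1, 'NA12878');
        INSERT INTO SampleApp (sample_id, status) VALUES (1, 'waiting');
    """)
    conn.execute("DELETE FROM Sample WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM SampleApp").fetchone()[0]
    conn.close()
    return remaining
```

With the pragma enabled, deleting the sample also removes its SampleApp row; without it, the orphaned row would be left behind.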

GLOSSARY

SampleApp status table:

Status | Description
------------- | -----------
waiting | Sample has been accessioned but not yet checked by Launcher OR sample has been checked but does not yet meet launch conditions
submitted | App has been submitted to BaseSpace
pending | App run is pending execution in BaseSpace
running | App is running in BaseSpace
launch-failed | App failed to launch. The error message can be seen with ListSampleApps.py -e
run-failed | App failed whilst running
app-finished | The app finished successfully
qc-failed | The app result failed QC. Some details on the failure can be seen with ListSampleApps.py -e
qc-passed | The app result passed QC
downloading | The app result deliverable is currently being downloaded
download-failed | The app result failed while downloading. Some details on the failure can be seen with ListSampleApps.py -e
downloaded | The app result has downloaded


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.