broadinstitute/widdler

Name: widdler

Owner: Broad Institute

Description: A command-line tool for executing, managing, and querying WDL workflows on Cromwell servers.

Created: 2017-07-13 11:45:40.0

Updated: 2018-04-11 11:40:47.0

Pushed: 2018-05-11 15:41:19.0

Size: 91011

Language: Python

README

widdler

Introduction

Widdler is a command-line tool for executing, managing, and querying WDL workflows on Cromwell servers.

Dependencies

Widdler requires Python 2.7 and Java-1.8 to be loaded in your environment for full functionality.

Usage

Below is widdler's basic help text. Widdler expects a usage mode to be indicated as its first argument, for example run, query, or abort; the help text lists every available mode.

usage: widdler.py <positional argument> [<args>]

Description: A tool for executing and monitoring WDLs to Cromwell instances.

positional arguments:
  {restart,explain,log,abort,monitor,query,run,validate,label,email}

optional arguments:
  -h, --help            show this help message and exit

widdler.py run

Below is widdler's run help text. It expects the user to provide a WDL file and a JSON file, and to indicate one of the available servers for execution. The validate option validates both the WDL and the JSON file submitted; it is off by default and is enabled with the -v flag.

usage: widdler.py run <wdl file> <json file> [<args>]

Submit a WDL & JSON for execution on a Cromwell VM.

positional arguments:
  wdl                   Path to the WDL to be executed.
  json                  Path the json inputs file.

optional arguments:
  -h, --help            show this help message and exit
  -v, --validate        Validate WDL inputs in json file. (default: False)
 LABEL, --label LABEL
                    A key:value pair to assign. May be used multiple
                    times. (default: None)
, --monitor         Monitor the workflow and receive an e-mail
                    notification when it terminates. (default: False)
 INTERVAL, --interval INTERVAL
                    If --monitor is selected, the amount of time in
                    seconds to elapse between status checks. (default: 30)
 EXTRA_OPTIONS, --extra_options EXTRA_OPTIONS
                    Additional workflow options to pass to Cromwell.
                    Specify as k:v pairs. May be specified multipletimes
                    for multiple options. See
                    https://github.com/broadinstitute/cromwell#workflow-
                    optionsfor available options. (default: None)
, --verbose         If selected, widdler will write the current status to
                    STDOUT until completion while monitoring. (default:
                    False)
, --no_notify       When selected, disable widdler e-mail notification of
                    workflow completion. (default: False)
 DEPENDENCIES, --dependencies DEPENDENCIES
                    A zip file containing one or more WDL files that the
                    main WDL imports. (default: None)
, --disable_caching
                    Don't used cached data. (default: False)
 {ale1,btl-cromwell,localhost,gscid-cromwell}, --server {ale1,btl-cromwell,localhost,gscid-cromwell}
                    Choose a cromwell server from ['ale1', 'btl-cromwell',
                    'localhost', 'gscid-cromwell'] (default: None)

A successful submission will return a workflow ID and status.

A workflow that uses subworkflows can be submitted by also supplying, via --dependencies, a zip file containing the WDL files that the main WDL imports.

Users may also invoke Widdler's monitoring capabilities when initiating a workflow. See below for an explanation of monitoring options.
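As a rough illustration of how the run options fit together, the sketch below assembles an argument list for widdler.py run using only long options that appear in the help text above (--server, --validate, --label, --dependencies). The helper name build_run_command and the file names hello.wdl and hello.json are hypothetical and are not part of widdler.

```python
def build_run_command(wdl, json_inputs, server, validate=False,
                      labels=None, dependencies=None):
    """Assemble an argument list for `widdler.py run` (illustrative helper)."""
    cmd = ["python", "widdler.py", "run", wdl, json_inputs, "--server", server]
    if validate:
        # Validation is off unless requested explicitly.
        cmd.append("--validate")
    # --label takes key:value pairs and may be repeated.
    for key, value in sorted((labels or {}).items()):
        cmd.extend(["--label", "{}:{}".format(key, value)])
    if dependencies:
        # A zip file holding the WDL files that the main WDL imports.
        cmd.extend(["--dependencies", dependencies])
    return cmd


# Hypothetical file names, used only for illustration.
cmd = build_run_command("hello.wdl", "hello.json", "localhost",
                        validate=True, labels={"project": "demo"})
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.call, which avoids shell quoting problems with paths that contain spaces.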

widdler.py restart

If a workflow has been previously executed to a Cromwell server, it is possible to restart the workflow after it has completed and run it again with the same inputs simply by providing the workflow ID and server of the original run. The usage for performing this action is as follows:

usage: widdler.py restart

Restart a submitted workflow.

positional arguments:
  workflow_id           workflow id of workflow to restart.

optional arguments:
  -h, --help            show this help message and exit
  -S {ale,btl-cromwell}, --server {ale,btl-cromwell}
                        Choose a cromwell server from ['ale', 'btl-cromwell']
                        (default: None)

For example, restarting workflow b931xxx will return the new workflow id like so:

Workflow restarted successfully; new workflow-id: 164678b8-2a52-40f3-976c-417c777c78ef

Finally, any restarted workflows will inherit the labels of their originating workflow.

widdler.py query

Below is widdler's query help text. Aside from the workflow ID, it expects one or more optional arguments to request basic status, metadata, and/or logs.

usage: widdler.py query <workflow id> [<args>]

Query cromwell for information on the submitted workflow.

positional arguments:

 workflow_id           workflow id for workflow execution of interest.
                       (default: None)

optional arguments:

 -h, --help            show this help message and exit
 -s, --status          Print status for workflow to stdout (default: False)
 -m, --metadata        Print metadata for workflow to stdout (default: False)
 -l, --logs            Print logs for workflow to stdout (default: False)
 -u USERNAME, --username USERNAME
                       Owner of workflows to monitor. (default: amr)
 -L LABEL, --label LABEL
                       Query status of all workflows with specific label(s).
                       (default: None)
 -d DAYS, --days DAYS  Last n days to query. (default: 7)
 -S {ale1,btl-cromwell,localhost,gscid-cromwell}, --server {ale1,btl-cromwell,localhost,gscid-cromwell}
                       Choose a cromwell server from ['ale1', 'btl-cromwell',
                       'localhost', 'gscid-cromwell'] (default: None)
 -f {Running,Submitted,QueuedInCromwell,Failed,Aborted,Succeeded}, --filter {Running,Submitted,QueuedInCromwell,Failed,Aborted,Succeeded}
                       Filter by a workflow status from those listed above.
                       May be specified more than once. (default: None)
 -a, --all             Query for all users. (default: False)

For example, a status query (-s) returns the workflow ID and its current status, while a metadata query (-m) will return a ton of information like so (truncated for viewability):

{'status': 'Running', 'submittedFiles': {'workflow': '# GATK WDL\r\n# import "hc_scatter.wdl" as sub\r\n\r\ntask VersionCheck {\r\n    String gatk\r\n    command {\r\n        source /broad/software/scripts/useuse\r\n        use Java-1.8\r\n        use Python-2.7\r\n... 'ref': '/cil/shed/sandboxes/amr/dev/gatk_pipeline/output/pfal_5/Plasmodium_falciparum_3D7.fasta'}}]}, 'submission': '2017-07-14T11:26:05.931-04:00', 'workflowName': 'gatk', 'outputs': {}, 'id': '2f8bb5c6-8254-4d38-b010-620913dd325e'}]

and a logs query (-l) will return the stdout and stderr paths of each call, like so:

[{'id': '2f8bb5c6-8254-4d38-b010-620913dd325e', 'calls': {'gatk.MakeSampleDir': [{'shardIndex': 0, 'attempt': 1, 'stderr': '/cil/shed/apps/internal/cromwell_new/cromwell-executions/gatk/2f8bb5c6-8254-4d38-b010-620913dd325e/call-MakeSampleDir/shard-0/execution/stderr', 'stdout': '/cil/shed/apps/internal/cromwell_new/cromwell-executions/gatk/2f8bb5c6-8254-4d38-b010-620913dd325e/call-MakeSampleDir/shard-0/execution/stdout'}

widdler.py abort

Below is widdler's abort usage. Simply provide the workflow ID and server of the workflow to abort.

Abort a submitted workflow.

positional arguments:

 workflow_id           workflow id of workflow to abort.

optional arguments:

 -h, --help            show this help message and exit
 -S {ale,btl-cromwell}, --server {ale,btl-cromwell}
                       Choose a cromwell server from ['ale', 'btl-cromwell']
                       (default: None)

For example, a successful abort will return:

{'status': 'Aborted', 'id': '2f8bb5c6-8254-4d38-b010-620913dd325e'}

widdler.py explain

Running widdler.py explain will provide information at command line similar to the monitor e-mail, including workflow status, root directory, stdout and stderr information, and useful links. Usage is as follows:

usage: widdler.py explain <workflowid>

Explain the status of a workflow.

positional arguments:
  workflow_id           workflow id of workflow to abort.

optional arguments:
  -h, --help            show this help message and exit
  -S {ale,btl-cromwell}, --server {ale,btl-cromwell}
                        Choose a cromwell server from ['ale', 'btl-cromwell']
                        (default: None)

This example:

python widdler.py explain b931c639-e73d-4b59-9333-be5ede4ae2cb -S ale

will return:

{'id': 'b931c639-e73d-4b59-9333-be5ede4ae2cb',
 'status': 'Failed',
 'workflowRoot': '/cil/shed/apps/internal/cromwell_gaag/cromwell-executions/gatk/b931c639-e73d-4b59-9333-be5ede4ae2cb'}
-------------Failed Stdout-------------
/cil/shed/apps/internal/cromwell_gaag/cromwell-executions/gatk/b931c639-e73d-4b59-9333-be5ede4ae2cb/call-ApplySnpRecalibration/execution/stdout:
[Errno 2] No such file or directory: u'/cil/shed/apps/internal/cromwell_gaag/cromwell-executions/gatk/b931c639-e73d-4b59-9333-be5ede4ae2cb/call-ApplySnpRecalibration/execution/stdout'
-------------Failed Stderr-------------
/cil/shed/apps/internal/cromwell_gaag/cromwell-executions/gatk/b931c639-e73d-4b59-9333-be5ede4ae2cb/call-ApplySnpRecalibration/execution/stderr:
[Errno 2] No such file or directory: u'/cil/shed/apps/internal/cromwell_gaag/cromwell-executions/gatk/b931c639-e73d-4b59-9333-be5ede4ae2cb/call-ApplySnpRecalibration/execution/stderr'
-------------Cromwell Links-------------
http://ale:9000/api/workflows/v1/b931c639-e73d-4b59-9333-be5ede4ae2cb/metadata
http://ale:9000/api/workflows/v1/b931c639-e73d-4b59-9333-be5ede4ae2cb/timing

Note that in this case, there was no stdout or stderr file for the step that failed in the workflow.
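The Cromwell links printed by explain, together with the status request recorded in widdler's log, suggest a regular URL pattern. The sketch below builds such URLs under that assumption; the function name workflow_url is hypothetical, port 9000 is simply the port seen in this README's examples, and only the three endpoints that appear in this document (status, metadata, timing) are accepted.

```python
def workflow_url(server, workflow_id, endpoint, port=9000):
    """Build a Cromwell workflow URL in the shape seen in widdler's output."""
    # Only endpoints that appear in this README are accepted here.
    allowed = ("status", "metadata", "timing")
    if endpoint not in allowed:
        raise ValueError("unsupported endpoint: {}".format(endpoint))
    return "http://{}:{}/api/workflows/v1/{}/{}".format(
        server, port, workflow_id, endpoint)


print(workflow_url("ale", "b931c639-e73d-4b59-9333-be5ede4ae2cb", "metadata"))
```

Whether a given server actually listens on that port depends on how the Cromwell instance is deployed.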

Validation

(Requires Java-1.8, so make sure to 'use Java-1.8' before trying validation)

Widdler validation attempts to validate the inputs in the user's supplied json file against the WDL arguments in the supplied WDL file. Validation is OFF by default, so users must enable it with the -v flag when using widdler.py run. Validation can also be performed using widdler.py validate if you wish to validate inputs without executing the workflow.

It will validate the following:

It will NOT validate the following:

A note on validating WDL files with dependencies: due to the limitations of the current implementation of dependency validation, WDL file dependencies must be present in the same directory as the main WDL file and must be unzipped. Otherwise validation may not work.

Validation may also be run as a stand-alone operation using widdler.py validate. Usage is as follows:

usage: widdler.py validate <wdl_file> <json_file>

Validate (but do not run) a json for a specific WDL file.

positional arguments:
  wdl         Path to the WDL associated with the json file.
  json        Path the json inputs file to validate.

optional arguments:
  -h, --help  show this help message and exit

If the json file has errors, a list of errors will be reported in the same way that the runtime validation reports. For example:

bad.json input file contains the following errors:
gatk.ts_filter_snp: 99 is not a valid Float.
gatk.tcir: False is not a valid Boolean. Note that JSON boolean values must not be quoted.
gatk.ploidy: 2.0 is not a valid Int.
Required parameter gatk.snp_annotation is missing from input json.
Required parameter gatk.ref_file is missing from input json.
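The error messages above imply strict type checks: an integer literal is rejected for a Float, a quoted boolean is rejected for a Boolean, and a decimal is rejected for an Int. The sketch below reproduces that behaviour on a small scale. It is not widdler's implementation; the function names and the hand-written expected-type table are assumptions made for illustration, whereas widdler derives the expected arguments from the WDL file.

```python
import json


def is_valid(wdl_type, value):
    """Strict check of a parsed JSON value against a WDL type name."""
    if wdl_type == "Boolean":
        return isinstance(value, bool)
    if wdl_type == "Int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if wdl_type == "Float":
        return isinstance(value, float)
    return True  # other types are not checked in this sketch


def validate_inputs(expected, inputs):
    """Return error strings for missing or mistyped inputs."""
    errors = []
    for name, wdl_type in sorted(expected.items()):
        if name not in inputs:
            errors.append(
                "Required parameter {} is missing from input json.".format(name))
        elif not is_valid(wdl_type, inputs[name]):
            errors.append(
                "{}: {} is not a valid {}.".format(name, inputs[name], wdl_type))
    return errors


# Hand-written type table standing in for what would come from the WDL.
expected = {"gatk.ts_filter_snp": "Float", "gatk.tcir": "Boolean",
            "gatk.ploidy": "Int", "gatk.ref_file": "File"}
inputs = json.loads(
    '{"gatk.ts_filter_snp": 99, "gatk.tcir": "False", "gatk.ploidy": 2.0}')
errors = validate_inputs(expected, inputs)
for line in errors:
    print(line)
```

Note that the quoted "False" fails the Boolean check because JSON parses it as a string, which matches the hint in the error report about unquoted boolean values.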

widdler.py log

Running 'widdler.py log' will print to screen the commands used by each task of a workflow. For example, running:

widdler.py log becb307f-4718-4d8b-836f-5780d64c4a82 -S btl-cromwell

results in the following:

{u'hello.helloWorld': [{u'attempt': 1, u'shardIndex': -1, u'stderr': u'/btl/store/cromwell_executions/hello/becb307f-4718-4d8b-836f-5780d64c4a82/call-helloWorld/execution/stderr', u'stdout': u'/btl/store/cromwell_executions/hello/becb307f-4718-4d8b-836f-5780d64c4a82/call-helloWorld/execution/stdout'}]}

hello.helloWorld:

#!/bin/bash
tmpDir=$(mktemp -d /cil/shed/apps/internal/cromwell_new/cromwell-executions/hello/d90bf4f3-d9fb-4f07-92d9-0d46c40355f1/call-helloWorld/execution/tmp.XXXXXX)
chmod 777 $tmpDir
export _JAVA_OPTIONS=-Djava.io.tmpdir=$tmpDir
export TMPDIR=$tmpDir

(
cd /cil/shed/apps/internal/cromwell_new/cromwell-executions/hello/d90bf4f3-d9fb-4f07-92d9-0d46c40355f1/call-helloWorld/execution
echo Hello, amr
)
echo $? > /cil/shed/apps/internal/cromwell_new/cromwell-executions/hello/d90bf4f3-d9fb-4f07-92d9-0d46c40355f1/call-helloWorld/execution/rc.tmp
(
cd /cil/shed/apps/internal/cromwell_new/cromwell-executions/hello/d90bf4f3-d9fb-4f07-92d9-0d46c40355f1/call-helloWorld/execution

)
sync
mv /cil/shed/apps/internal/cromwell_new/cromwell-executions/hello/d90bf4f3-d9fb-4f07-92d9-0d46c40355f1/call-helloWorld/execution/rc.tmp /cil/shed/apps/internal/cromwell_new/cromwell-executions/hello/d90bf4f3-d9fb-4f07-92d9-0d46c40355f1/call-helloWorld/execution/rc

widdler.py monitor

Widdler allows the monitoring of workflow(s). Unlike the query options, monitoring persists until a workflow reaches a terminal state (any state besides 'Running' or 'Submitted'). While monitoring, it can optionally print the status of a workflow to the screen, and when a terminal state is reached, it can optionally e-mail the user (users are assumed to be of the broadinstitute.org domain).

Monitoring usage is as follows:

usage: widdler.py monitor <workflow_id> [<args>]

Monitor a particular workflow and notify user via e-mail upon completion. If a
workflow ID is not provided, user-level monitoring is assumed.

positional arguments:
  workflow_id           workflow id for workflow to monitor. Do not specify if
                        user-level monitoring is desired. (default: None)

optional arguments:
  -h, --help            show this help message and exit
  -u USERNAME, --username USERNAME
                        Owner of workflows to monitor. (default: <your user name>)
  -i INTERVAL, --interval INTERVAL
                        Amount of time in seconds to elapse between status
                        checks. (default: 30)
  -V, --verbose         When selected, widdler will write the current status
                        to STDOUT until completion. (default: False)
  -n, --no_notify       When selected, disable widdler e-mail notification of
                        workflow completion. (default: False)
  -S {ale,btl-cromwell}, --server {ale,btl-cromwell}
                        Choose a cromwell server from ['ale', 'btl-cromwell']
                        (default: None)

Single Workflow Monitoring

Aside from monitoring of a single workflow with widdler's run command, you can also execute a monitor as in the following example:

widdler.py monitor 7ff17cb3-12f1-4bf0-8754-e3a0d39178ea -S btl-cromwell

In this case, widdler will continue to silently monitor this workflow until it detects a terminal status. An e-mail will be sent to <user>@broadinstitute.org when a terminal status is detected, which will include the metadata of the workflow.

If --verbose were selected, the user would have seen a STDOUT message indicating the workflow's status at intervals defined by the --interval parameter, which has a default of 30 seconds.

If --no_notify were selected, an e-mail would not be sent.
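The monitoring behaviour described above amounts to a polling loop: check the status, optionally print it, and stop once the status is anything other than 'Running' or 'Submitted'. A minimal sketch follows, assuming a caller-supplied get_status function (hypothetical) in place of a real Cromwell status request. The sleep function is injectable so the loop can be exercised without waiting, and e-mail notification is left out.

```python
import time

# Per the description above, every other status counts as terminal.
NON_TERMINAL = ("Running", "Submitted")


def monitor(get_status, interval=30, verbose=False, sleep=time.sleep):
    """Poll get_status() until it reports a terminal status, then return it."""
    while True:
        status = get_status()
        if verbose:
            print("Workflow status: {}".format(status))
        if status not in NON_TERMINAL:
            return status
        sleep(interval)


# Simulated status sequence standing in for real server responses.
statuses = iter(["Submitted", "Running", "Succeeded"])
naps = []
final = monitor(lambda: next(statuses), interval=30, sleep=naps.append)
print(final, len(naps))
```

Taken literally, this definition treats a status such as QueuedInCromwell as terminal; a real implementation may handle that state differently.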

User Workflow Monitoring

(Note this feature is still under active development and is currently quite primitive)

Users may also monitor all workflows for a given user name by omitting the workflow_id parameter and specifying the --username (-u) parameter like so:

widdler.py monitor -u amr -n -S btl-cromwell

Here, the user 'amr' is monitoring all workflows they have ever executed using widdler. Any workflows not executed by widdler will not be monitored. Workflows in a terminal state prior to execution will have an e-mail sent immediately regarding their status, and any running workflows will result in an e-mail once they terminate (e-mail is suppressed when --no_notify is given, as in the command above). Using the --verbose option here would result in STDOUT output for each workflow that is monitored at intervals specified by --interval.

Logging

Widdler logs information in the application's logs directory in a file called widdler.log. This can be useful to find information on widdler executions including workflow id and query results and can help users locate workflow IDs if they've been lost. Each execution in the log is presented like so, with the user's username indicated in the start/stop separators for convenient identification.

2017-07-14 12:10:44,746 - widdler - INFO - Parameters chosen: {'logs': False, 'func': , 'status': True, 'workflow_id': '7ff17cb3-12f1-4bf0-8754-e3a0d39178ea', 'server': 'btl-cromwell', 'metadata': False}
2017-07-14 12:10:44,746 - widdler.cromwell.Cromwell - INFO - URL:http://btl-cromwell:9000/api/workflows/v1
2017-07-14 12:10:44,746 - widdler.cromwell.Cromwell - INFO - Querying status for workflow 7ff17cb3-12f1-4bf0-8754-e3a0d39178ea
2017-07-14 12:10:44,747 - widdler.cromwell.Cromwell - INFO - GET REQUEST:http://btl-cromwell:9000/api/workflows/v1/7ff17cb3-12f1-4bf0-8754-e3a0d39178ea/status
2017-07-14 12:10:44,812 - widdler - INFO - Result: [{'id': '7ff17cb3-12f1-4bf0-8754-e3a0d39178ea', 'status': 'Running'}]
2017-07-14 12:10:44,813 - widdler - INFO - -------------End Widdler Execution by amr-------------
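Since the log is suggested as a way to recover lost workflow IDs, one possible approach is to scan the log text for UUID-shaped strings. The sketch below does this with a regular expression; the function name workflow_ids is hypothetical, and the sample text is taken from the log excerpt above rather than read from a real widdler.log.

```python
import re

# Workflow IDs in this README are lowercase hexadecimal UUIDs.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def workflow_ids(log_text):
    """Return the distinct workflow IDs found in log_text, in first-seen order."""
    seen = []
    for match in UUID_RE.findall(log_text):
        if match not in seen:
            seen.append(match)
    return seen


sample = (
    "2017-07-14 12:10:44,746 - widdler.cromwell.Cromwell - INFO - "
    "Querying status for workflow 7ff17cb3-12f1-4bf0-8754-e3a0d39178ea\n"
    "2017-07-14 12:10:44,812 - widdler - INFO - Result: "
    "[{'id': '7ff17cb3-12f1-4bf0-8754-e3a0d39178ea', 'status': 'Running'}]\n"
)
ids = workflow_ids(sample)
print(ids)
```

To search a real log, the same function could be given the contents of widdler.log read from the application's logs directory.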

Known Issues

Widdler will sometimes print 'null' to stdout. This does not impact proper operation of widdler.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.