CD2H gitForager

broadinstitute/pywdl

Name: pywdl

Owner: Broad Institute

Description: Python bindings for WDL

Created: 2015-11-24 14:06:37.0

Updated: 2018-01-15 00:50:36.0

Pushed: 2017-11-09 19:51:35.0

Homepage: null

Size: 127

Language: Python

README

READ THIS FIRST!

There is a very low probability that you're in the right place. If you're looking for a Python WDL parser, see the WDL repo.

This repository is deprecated. Its intention was to provide a Python object model on top of parsed WDL. It is out of date, but we're leaving it here in case someone wants an example of how to do such a thing. If you'd like to pick up the torch, please let us know.

NOTE AGAIN: If you're reading below this, you're almost certainly in the wrong place!

PyWDL

A Python implementation of a WDL parser and language bindings.

For Scala language bindings, use WDL4S.

PyWDL
Installation
Language Bindings
Abstract syntax trees (ASTs)
Working with expressions
Command Line Usage
Converting to DOT

Installation

PyWDL works with Python 2 or Python 3. Install via setup.py:

$ python setup.py install

Or via pip:

$ pip install wdl

Language Bindings

The main wdl package provides an interface to turn WDL source code into native Python objects. This means that a workflow {} block in WDL would become a Workflow object in Python and a task {} block becomes a Task object.

To parse WDL source code into a WdlDocument object, import the wdl package and load a WDL string with wdl.loads("wdl code") or WDL from a file-like object using wdl.load(fp, resource_name).

For example:

import wdl
import wdl.values

wdl_code = """
task my_task {
  File file
  command {
    ./my_binary --input=${file} > results
  }
  output {
    File results = "results"
  }
}

workflow my_wf {
  call my_task
}
"""

# Use the language bindings to parse WDL into Python objects
wdl_namespace = wdl.loads(wdl_code)

for workflow in wdl_namespace.workflows:
    print('Workflow "{}":'.format(workflow.name))
    for call in workflow.calls():
        print('    Call: {} (task {})'.format(call.name, call.task.name))

for task in wdl_namespace.tasks:
    name = task.name
    abstract_command = task.command
    # Supply a value for each input referenced by the command
    def lookup(name):
        if name == 'file': return wdl.values.WdlFile('/path/to/file.txt')
    instantiated_command = task.command.instantiate(lookup)
    print('Task "{}":'.format(name))
    print('    Abstract Command: {}'.format(abstract_command))
    print('    Instantiated Command: {}'.format(instantiated_command))
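For intuition about what task.command.instantiate(lookup) does with the lookup callback, here is a minimal, self-contained sketch of placeholder substitution. It is not PyWDL's implementation: the name instantiate_command and the regular expression are assumptions made for illustration, and only plain ${name} placeholders are handled.

```python
import re

# Matches simple ${name} placeholders; real WDL placeholders can be richer.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def instantiate_command(template, lookup):
    """Replace each ${name} in template with str(lookup(name))."""
    def substitute(match):
        value = lookup(match.group(1))
        if value is None:
            raise KeyError("no value for " + match.group(1))
        return str(value)
    return PLACEHOLDER.sub(substitute, template)

def lookup(name):
    # Hypothetical input value, mirroring the 'file' input of my_task.
    if name == 'file':
        return '/path/to/file.txt'

print(instantiate_command("./my_binary --input=${file} > results", lookup))
```

Raising on a missing value is a design choice of this sketch; it makes an unresolved input fail loudly instead of leaving a placeholder in the command.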

Using the language bindings as shown above is the recommended way to use PyWDL. One can also directly access the parser to parse WDL source code into an abstract syntax tree using the wdl.parser package:

import wdl.parser

wdl_code = """
task my_task {
  File file
  command {
    ./my_binary --input=${file} > results
  }
  output {
    File results = "results"
  }
}

workflow my_wf {
  call my_task
}
"""

# Parse source code into abstract syntax tree
ast = wdl.parser.parse(wdl_code).ast()

# Print out abstract syntax tree
print(ast.dumps(indent=2))

# Access the first task definition, print out its name
print(ast.attr('definitions')[0].attr('name').source_string)

# Find all 'Task' ASTs
task_asts = wdl.find_asts(ast, 'Task')
for task_ast in task_asts:
    print(task_ast.dumps(indent=2))

# Find all 'Workflow' ASTs
workflow_asts = wdl.find_asts(ast, 'Workflow')
for workflow_ast in workflow_asts:
    print(workflow_ast.dumps(indent=2))

Abstract syntax trees (ASTs)

An AST is the output of the parsing algorithm. It is a tree structure in which the root node is always a Document AST.

The best way to get started working with ASTs is to visualize them by using the wdl parse subcommand to see the AST as text. For example, consider the following WDL file:

example.wdl

task a {
  command {./foo_bin}
}
task b {
  command {./bar_bin}
}
task c {
  command {./baz_bin}
}
workflow w {}

Then, use the command line to parse and output the AST:

$ wdl parse example.wdl
(Document:
  imports=[],
  definitions=[
    (Task:
      name=<string:1:6 identifier "YQ==">,
      declarations=[],
      sections=[
        (RawCommand:
          parts=[
            <string:2:12 cmd_part "Li9mb29fYmlu">
          ]
        )
      ]
    ),
    (Task:
      name=<string:4:6 identifier "Yg==">,
      declarations=[],
      sections=[
        (RawCommand:
          parts=[
            <string:5:12 cmd_part "Li9iYXJfYmlu">
          ]
        )
      ]
    ),
    (Task:
      name=<string:7:6 identifier "Yw==">,
      declarations=[],
      sections=[
        (RawCommand:
          parts=[
            <string:8:12 cmd_part "Li9iYXpfYmlu">
          ]
        )
      ]
    ),
    (Workflow:
      name=<string:10:10 identifier "dw==">,
      body=[]
    )
  ]
)

Programmatically, if one wanted to traverse this AST to pull out data:

import wdl.parser
import wdl

with open('example.wdl') as fp:
    ast = wdl.parser.parse(fp.read()).ast()

task_a = ast.attr('definitions')[0]
task_b = ast.attr('definitions')[1]
task_c = ast.attr('definitions')[2]

# Find the command section of task a
for section in task_a.attr('sections'):
    if section.name == 'RawCommand':
        task_a_command = section

# Terminals are literal command text; anything else is a command parameter
for part in task_a_command.attr('parts'):
    if isinstance(part, wdl.parser.Terminal):
        print('command string: ' + part.source_string)
    else:
        print('command parameter: ' + part.dumps())

wdl.parser.Ast

The Ast class is a syntax tree with a name and children nodes.

Attributes:

name is a string that refers to the type of AST (e.g. Workflow, Task, Document, RawCommand).
attributes is a dictionary in which the keys are attribute names and each value is one of three types: Ast, AstList, or Terminal.

Methods:

def attr(self, name) - returns the named attribute. ast.attr('name') is the same as ast.attributes['name'].
def dumps(self, indent=None, b64_source=True) - returns a string representation of this Ast. The indent parameter takes an integer for the indent level; omitting it produces a string with no newlines. b64_source is passed on to recursive invocations of dumps.

wdl.parser.Terminal

The wdl.parser.Terminal object represents a literal piece of the original source code. Terminals always appear as leaf nodes of Ast objects.

Attributes:

source_string - String segment from the source code.
line - Line number where source_string was in source code.
col - Column number where source_string was in source code.
resource - Name of the location for the source code. Usually a file system path or perhaps URI.
id - Numeric identifier, unique to the top level Ast. Used mostly internally.
str - String identifier of this terminal. Used mostly internally.

Methods:

def dumps(self, b64_source=True, **kwargs) - returns a string representation of this terminal. When b64_source is true, the source code is base64 encoded, because the source sometimes contains newlines or special characters that make a string-ified AST difficult to read.
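Because dumps base64-encodes source text by default, the quoted strings in a dump such as the example.wdl output are not directly readable. The following sketch decodes a few of them with Python's standard base64 module; the helper name decode_source is an assumption for illustration and is not part of PyWDL.

```python
import base64

def decode_source(encoded):
    """Decode a base64 string from an AST dump back into source text."""
    return base64.b64decode(encoded).decode("utf-8")

# Values copied from the example.wdl dump: a task name, a command part,
# and the workflow name.
for encoded in ["YQ==", "Li9mb29fYmlu", "dw=="]:
    print(encoded, "->", decode_source(encoded))
```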

wdl.parser.AstList

class AstList(list) represents a sequence of Ast, AstList, and Terminal objects.

Methods:

def dumps(self, indent=None, b64_source=True) - returns a string representation of this AstList. The indent parameter takes an integer for the indent level; omitting it produces a string with no newlines. b64_source is passed on to recursive invocations of dumps.
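To show how the three node types fit together, here is a small, self-contained model of the tree with a recursive search in the spirit of wdl.find_asts. These classes are simplified stand-ins written for this example, not the wdl.parser classes, and whether the real find_asts also descends into nodes that already matched is an assumption.

```python
class Terminal:
    """Leaf node holding a literal piece of source text (stand-in)."""
    def __init__(self, kind, source_string, line, col):
        self.str = kind
        self.source_string = source_string
        self.line = line
        self.col = col

class Ast:
    """Named node whose attributes are Ast, AstList, or Terminal values."""
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes

    def attr(self, name):
        return self.attributes[name]

class AstList(list):
    """Sequence of Ast, AstList, and Terminal objects."""

def find_asts(node, name):
    """Collect every Ast called `name` at or beneath `node`, depth first."""
    found = []
    if isinstance(node, Ast):
        if node.name == name:
            found.append(node)
        for child in node.attributes.values():
            found.extend(find_asts(child, name))
    elif isinstance(node, list):
        for child in node:
            found.extend(find_asts(child, name))
    return found

# A hand-built tree shaped like the example.wdl dump (two tasks only).
document = Ast('Document', {
    'imports': AstList(),
    'definitions': AstList([
        Ast('Task', {'name': Terminal('identifier', 'a', 1, 6)}),
        Ast('Task', {'name': Terminal('identifier', 'b', 4, 6)}),
        Ast('Workflow', {'name': Terminal('identifier', 'w', 10, 10),
                         'body': AstList()}),
    ]),
})

for task in find_asts(document, 'Task'):
    print(task.attr('name').source_string)
```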

Working with expressions

Parsing a WDL file will result in unevaluated expressions. For example:

workflow test {
  Int a = (1 + 2) * 3
  call my_task {
    input: var=a*2, var2="file"+".txt"
  }
}

This workflow definition has three expressions in it: (1 + 2) * 3, a*2, and "file"+".txt".

Each expression is stored in a wdl.binding.Expression object, which holds the AST for that expression.

Expressions can be evaluated with the eval() method on the Expression class.

import wdl

# Manually parse expression into wdl.binding.Expression
expression = wdl.parse_expr("(1 + 2) * 3")

# Evaluate the expression.
# Returns a WdlValue, specifically a WdlIntegerValue(9)
evaluated = expression.eval()

# Get the Python value
print(evaluated.value)

Sometimes expressions contain references to variables or functions. In order for these to be resolved, one must pass a lookup function and an implementation of the functions that you want to support:

import wdl
from wdl.values import WdlInteger, WdlUndefined

# Resolve variable references that appear in an expression
def test_lookup(identifier):
    if identifier == 'var':
        return WdlInteger(4)
    else:
        return WdlUndefined

# Return a resolver that maps a function name to its implementation
def test_functions():
    def add_one(parameters):
        # assume at least one parameter exists, for simplicity
        return WdlInteger(parameters[0].value + 1)
    def get_function(name):
        if name == 'add_one': return add_one
        # NOTE: EvalException is not imported in this example; import it
        # from the PyWDL module that defines it before running.
        else: raise EvalException("Function {} not defined".format(name))
    return get_function

# WdlInteger(12)
print(wdl.parse_expr("var * 3").eval(test_lookup))

# WdlInteger(8)
print(wdl.parse_expr("var + var").eval(test_lookup))

# WdlInteger(9)
print(wdl.parse_expr("add_one(var + var)").eval(test_lookup, test_functions()))
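The callback pattern used by eval (one function to resolve variables, another to resolve function names) can be tried without PyWDL installed. The sketch below is a stand-in built on Python's own ast module rather than the WDL parser, works on plain Python integers instead of WdlValue objects, and supports only +, -, and *; the name evaluate is an assumption for illustration.

```python
import ast
import operator

# Binary operators supported by this toy evaluator.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def evaluate(source, lookup=None, functions=None):
    """Evaluate a small arithmetic expression, resolving names via callbacks."""
    def visit(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if lookup is None:
                raise NameError("no lookup supplied for " + node.id)
            return lookup(node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if functions is None:
                raise NameError("no functions supplied for " + node.func.id)
            implementation = functions(node.func.id)
            # Implementations receive their arguments as a list.
            return implementation([visit(arg) for arg in node.args])
        raise ValueError("unsupported expression: " + ast.dump(node))
    return visit(ast.parse(source, mode="eval").body)

def lookup(identifier):
    # Hypothetical variable binding: var = 4
    if identifier == 'var':
        return 4
    raise NameError(identifier)

def get_function(name):
    if name == 'add_one':
        return lambda parameters: parameters[0] + 1
    raise NameError("Function {} not defined".format(name))

print(evaluate("(1 + 2) * 3"))
print(evaluate("var * 3", lookup))
print(evaluate("add_one(var + var)", lookup, get_function))
```

Keeping name resolution outside the evaluator is what lets the same parsed expression be evaluated against different inputs.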

Command Line Usage

$ wdl --help
usage: wdl [-h] [--version] [--debug] [--no-color] {run,parse} ...

Workflow Description Language (WDL)

positional arguments:
  {run,parse}  WDL Actions
    run        Run you a WDL
    parse      Parse a WDL file, print parse tree

optional arguments:
  -h, --help   show this help message and exit
  --version    show program's version number and exit
  --debug      Open the floodgates
  --no-color   Don't colorize output
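The help text above has the shape that Python's argparse produces for a parser with subcommands. As a rough sketch of how such an interface could be declared: the positional wdl_file argument, the dest name action, and the placeholder version string are assumptions, since the source does not show the actual definition.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        prog='wdl', description='Workflow Description Language (WDL)')
    # Placeholder version text; the real version string is not known here.
    parser.add_argument('--version', action='version',
                        version='%(prog)s (version placeholder)')
    parser.add_argument('--debug', action='store_true',
                        help='Open the floodgates')
    parser.add_argument('--no-color', action='store_true',
                        help="Don't colorize output")
    subparsers = parser.add_subparsers(dest='action', help='WDL Actions')
    run = subparsers.add_parser('run', help='Run you a WDL')
    run.add_argument('wdl_file')  # assumed argument name
    parse = subparsers.add_parser(
        'parse', help='Parse a WDL file, print parse tree')
    parse.add_argument('wdl_file')  # assumed argument name
    return parser

args = build_parser().parse_args(['--no-color', 'parse', 'example.wdl'])
print(args.action, args.wdl_file, args.no_color)
```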

Parse a WDL file:

$ wdl parse examples/ex2.wdl
(Document:
  definitions=[
    (Task:
      name=<ex2.wdl:1:6 identifier "c2NhdHRlcl90YXNr">,
      declarations=[],
      sections=[
        (RawCommand:

Converting To DOT

A WDL file can be converted to the DOT format so that the pipeline can be visualized as a graph. For example:

$ wdl2dot -i hello.wdl -o hello.dot

Then view the graph with the interactive renderer xdot, or save it as an image:

$ xdot hello.dot
$ dot -Tsvg hello.dot -o hello.svg
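For readers unfamiliar with DOT, the sketch below shows the kind of text a workflow graph could be written as. It does not reproduce wdl2dot's actual output, which is not shown here; the function to_dot, the call names, and the dependency edge are all hypothetical.

```python
def to_dot(workflow_name, calls, edges):
    """Render call names and (upstream, downstream) pairs as DOT text."""
    lines = ['digraph "{}" {{'.format(workflow_name)]
    for call in calls:
        lines.append('  "{}";'.format(call))
    for upstream, downstream in edges:
        lines.append('  "{}" -> "{}";'.format(upstream, downstream))
    lines.append('}')
    return '\n'.join(lines)

# Hypothetical two-call workflow in which analyze consumes prepare's output.
print(to_dot('hello_wf', ['prepare', 'analyze'], [('prepare', 'analyze')]))
```

Text of this form can be saved to a .dot file and rendered with the dot command shown above.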

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.