wtsi-hgi/gatk-cwl-generator

Name: gatk-cwl-generator

Owner: Wellcome Trust Sanger Institute - Human Genetics Informatics

Description: Generates CWL files from the GATK documentation

Created: 2017-06-23 15:21:02.0

Updated: 2017-12-12 08:40:29.0

Pushed: 2018-01-15 16:59:44.0

Size: 44429

Language: Python

README

gatk-cwl-generator

Generates CWL files from the GATK documentation

Installation

First, install the module:

git clone https://github.com/wtsi-hgi/gatk-cwl-generator
cd gatk-cwl-generator
python setup.py install

You may also want to install cwltool to run the generated CWL files.

Requirements

Usage

usage: gatkcwlgenerator [-h] [--version VERSION] [--verbose] [--out OUTPUT_DIR]
                        [--include INCLUDE] [--dev] [--use_cache [CACHE_LOCATION]]
                        [--no_docker] [--docker_image_name DOCKER_IMAGE_NAME]
                        [--gatk_command GATK_COMMAND]

Generates CWL files from the GATK documentation

optional arguments:
  -h, --help            show this help message and exit
  --version VERSION, -v VERSION
                        Sets the version of GATK to parse documentation for.
                        Default is 3.5-0
  --verbose             Set the logging to be verbose. Default is False.
  --out OUTPUT_DIR, -o OUTPUT_DIR
                        Sets the output directory for generated files. Default
                        is ./gatk_cmdline_tools/<VERSION>/
  --include INCLUDE     Only generate this file (note, CommandLineGATK has to
                        be generated for v3.x)
  --dev                 Enable --use_cache and overwriting of the generated
                        files (for development purposes). Requires
                        requests_cache to be installed
  --use_cache [CACHE_LOCATION]
                        Use requests_cache, using the cache at CACHE_LOCATION,
                        or 'cache' if not specified. Default is False.
  --no_docker           Make the generated CWL files not use docker
                        containers. Default is False.
  --docker_image_name DOCKER_IMAGE_NAME, -c DOCKER_IMAGE_NAME
                        Docker image name for generated cwl files. Default is
                        'broadinstitute/gatk3:<VERSION>' for version 3.x and
                        'broadinstitute/gatk:<VERSION>' for 4.x
  --gatk_command GATK_COMMAND, -l GATK_COMMAND
                        Command to launch GATK. Default is 'java -jar
                        /usr/GenomeAnalysisTK.jar' for gatk 3.x and 'java -jar
                        /gatk/gatk.jar' for gatk 4.x

This has been tested on versions 3.5-0 to 3.8-0 and 4.beta.6.
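
The version-dependent defaults listed in the help text can be summarised in a short Python sketch. The function name default_settings and the dictionary keys are hypothetical and are not part of gatkcwlgenerator; only the default values themselves are taken from the help output. The sketch assumes that the major version is the part of the version string before the first dot.

```python
def default_settings(version="3.5-0"):
    """Return the defaults described in the help text for a GATK version.

    Hypothetical helper for illustration; not the generator's own code.
    """
    # Assumption: the major version is everything before the first dot,
    # so "3.5-0" gives "3" and "4.beta.6" gives "4".
    major = version.split(".")[0]
    if major == "3":
        image = "broadinstitute/gatk3:" + version
        command = "java -jar /usr/GenomeAnalysisTK.jar"
    elif major == "4":
        image = "broadinstitute/gatk:" + version
        command = "java -jar /gatk/gatk.jar"
    else:
        raise ValueError("unsupported GATK version: " + version)
    return {
        "docker_image_name": image,
        "gatk_command": command,
        "output_dir": "./gatk_cmdline_tools/{}/".format(version),
    }


settings = default_settings("4.beta.6")
print(settings["docker_image_name"])
print(settings["gatk_command"])
```

Passing --docker_image_name, --gatk_command or --out on the command line replaces the corresponding default.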

The parameters generated are the same as you would need to specify on the command line, with "--" stripped from the beginning.

To add tags to an argument that has a file type, set the additional parameter <NAME>_tags. For example, to produce the command-line argument --variant:vcf path\to\file, use the input:

variant:
  class: File
  path: path\to\file

variant_tags: [vcf]
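
As a rough illustration of how a file parameter and its <NAME>_tags companion combine into one command-line argument, here is a minimal Python sketch. The function render_file_argument is hypothetical; in practice the generated CWL files perform this mapping, and their implementation may differ. Joining several tags with commas is an assumption based on the GATK 3 tag syntax.

```python
def render_file_argument(name, inputs):
    """Build the command-line tokens for a file input, applying <NAME>_tags.

    Illustrative only: not the code used by the generated CWL files.
    """
    path = inputs[name]["path"]
    tags = inputs.get(name + "_tags", [])
    flag = "--" + name
    if tags:
        # Assumption: multiple tags are comma-separated after the colon.
        flag += ":" + ",".join(tags)
    return [flag, path]


inputs = {
    "variant": {"class": "File", "path": "path\\to\\file"},
    "variant_tags": ["vcf"],
}
print(" ".join(render_file_argument("variant", inputs)))
# prints: --variant:vcf path\to\file
```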

For convenience, you can also specify any array input argument as a single element.
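
The effect of this convenience can be pictured with a small Python sketch: a single value is treated as a one-element array. The helper as_list is hypothetical and only mimics the behaviour; how the generated CWL files accept both forms is not shown here. The parameter name annotation serves only as an example of an array-valued argument.

```python
def as_list(value):
    """Wrap a single value in a list; copy lists and tuples into a new list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# Both input forms describe the same one-element array.
single = {"annotation": "Coverage"}
multiple = {"annotation": ["Coverage"]}
assert as_list(single["annotation"]) == as_list(multiple["annotation"])
```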

The CWL files are written to gatk_cmdline_tools/<VERSION>/cwl, and the JSON files provided by the documentation are written to gatk_cmdline_tools/<VERSION>/json.

Generated CWL files

Examples

Example inputs for the HaplotypeCaller tool are provided for testing the generated CWL files. Assuming you have used the default options and installed everything as above, run:

cwl-runner gatk_cmdline_tools/3.5/HaplotypeCaller.cwl examples/HaplotypeCaller_inputs.yml

The generated CWL files can also be found in the releases.

Tests

Install the test requirements, then run the tests. Note: Docker must be installed to run the tests, because the generated CWL files are executed as part of them:

pip install -r test_requirements.txt
pytest gatkcwlgenerator

You can also run the tests in parallel with -n to improve performance.

Limitations:

Creating a new version

To create gatk_cmdline_tools.zip, a zip file containing all the generated CWL files for GATK versions 3.5, 3.6, 3.7, 3.8 and 4.0.0.0, run bash build.sh. This file is uploaded to GitHub with every new release of this package.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.