hammerlab/dataproc

Name: dataproc

Owner: Hammer Lab

Description: Simple python script for running a Spark job on an ephemeral Google Cloud dataproc cluster

Created: 2017-03-19 05:15:22.0

Updated: 2017-03-19 05:22:24.0

Pushed: 2018-01-13 21:03:11.0

Homepage: null

Size: 10

Language: Python

README

dataproc

Simple Python script for running a Spark job on an ephemeral Google Cloud Dataproc cluster.

dataproc -h
usage: dataproc [-h] [--cluster CLUSTER] [--timestamp-cluster-name]
                [--cores CORES] [--properties PROPS_FILES] [--jar JAR]
                [--main MAIN] [--machine-type MACHINE_TYPE] [--dry-run]

Run a Spark job on an ephemeral dataproc cluster

optional arguments:
  -h, --help            show this help message and exit
  --cluster CLUSTER     Name of the dataproc cluster to use; defaults to
                        $CLUSTER env var
  --timestamp-cluster-name, -t
                        When true, append "-<TIMESTAMP>" to the dataproc
                        cluster name
  --cores CORES         Number of CPU cores to use (default: 200)
  --properties PROPS_FILES, -p PROPS_FILES
                        Comma-separated list of Spark properties files; merged
                        with $SPARK_PROPS_FILES env var
  --jar JAR             URI of main app JAR; defaults to JAR env var
  --main MAIN, -m MAIN  JAR main class; defaults to MAIN env var
  --machine-type MACHINE_TYPE
                        Machine type to use (default: n1-standard-4)
  --dry-run, -n         When set, print some of the parsed and inferred
                        arguments and exit without running any dataproc
                        commands
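
For readers who want to see how an interface like the one in the help text could be declared, here is a minimal argparse sketch. It is a reconstruction from the help output only, not the script's actual source: the `build_parser` function, its `env` parameter, the `props_files` destination name, and the integer type for `--cores` are all assumptions made for illustration.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    # Assumption: env-var defaults are read when the parser is built.
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(
        prog="dataproc",
        description="Run a Spark job on an ephemeral dataproc cluster",
    )
    parser.add_argument("--cluster", default=env.get("CLUSTER"),
                        help="Name of the dataproc cluster to use")
    parser.add_argument("--timestamp-cluster-name", "-t", action="store_true",
                        help='Append "-<TIMESTAMP>" to the cluster name')
    # Assumption: the core count is parsed as an integer.
    parser.add_argument("--cores", type=int, default=200,
                        help="Number of CPU cores to use")
    parser.add_argument("--properties", "-p", dest="props_files", default="",
                        help="Comma-separated list of Spark properties files")
    parser.add_argument("--jar", default=env.get("JAR"),
                        help="URI of main app JAR")
    parser.add_argument("--main", "-m", default=env.get("MAIN"),
                        help="JAR main class")
    parser.add_argument("--machine-type", default="n1-standard-4",
                        help="Machine type to use")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Print parsed arguments and exit")
    return parser
```

Passing a plain dictionary as `env` makes the defaults easy to exercise without touching the real environment, e.g. `build_parser(env={"CLUSTER": "my-cluster"}).parse_args(["-n"])`, where `my-cluster` is a placeholder name.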

See scripts/run-on-gcloud in hammerlab/pageant for an example use: it simply sets the MAIN, CLUSTER, and JAR env vars and delegates to this script.
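
The delegation pattern described above can be sketched in Python as follows. This is not the pageant script itself; `delegate_env` and `run_on_dataproc` are hypothetical helper names, and the sketch assumes the `dataproc` script is on the `PATH`.

```python
import os
import subprocess


def delegate_env(main, cluster, jar, base_env=None):
    """Return a copy of the environment with MAIN, CLUSTER, and JAR set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({"MAIN": main, "CLUSTER": cluster, "JAR": jar})
    return env


def run_on_dataproc(main, cluster, jar, extra_args=()):
    """Run `dataproc` (assumed to be on PATH) with the three env vars set."""
    cmd = ["dataproc", *extra_args]
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(cmd, env=delegate_env(main, cluster, jar), check=True)
```

A wrapper would then call something like `run_on_dataproc("org.example.Main", "my-cluster", "gs://my-bucket/app.jar", ["--dry-run"])`, where the class name, cluster name, and bucket URI are placeholders. Passing `--dry-run` first is a cheap way to confirm the values were picked up before any cluster is created.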


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.