Name: dataproc
Owner: Hammer Lab
Description: Simple python script for running a Spark job on an ephemeral Google Cloud dataproc cluster
Created: 2017-03-19 05:15:22.0
Updated: 2017-03-19 05:22:24.0
Pushed: 2018-01-13 21:03:11.0
Homepage: null
Size: 10
Language: Python
Simple python script for running a Spark job on an ephemeral Google Cloud dataproc cluster.
dataproc -h
usage: dataproc [-h] [--cluster CLUSTER] [--timestamp-cluster-name]
                [--cores CORES] [--properties PROPS_FILES] [--jar JAR]
                [--main MAIN] [--machine-type MACHINE_TYPE] [--dry-run]

Run a Spark job on an ephemeral dataproc cluster

optional arguments:
  -h, --help            show this help message and exit
  --cluster CLUSTER     Name of the dataproc cluster to use; defaults to
                        $CLUSTER env var
  --timestamp-cluster-name, -t
                        When true, append "-<TIMESTAMP>" to the dataproc
                        cluster name
  --cores CORES         Number of CPU cores to use (default: 200)
  --properties PROPS_FILES, -p PROPS_FILES
                        Comma-separated list of Spark properties files; merged
                        with $SPARK_PROPS_FILES env var
  --jar JAR             URI of main app JAR; defaults to JAR env var
  --main MAIN, -m MAIN  JAR main class; defaults to MAIN env var
  --machine-type MACHINE_TYPE
                        Machine type to use (default: n1-standard-4)
  --dry-run, -n         When set, print some of the parsed and inferred
                        arguments and exit without running any dataproc
                        commands
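Because the script falls back to the $CLUSTER, $JAR, and $MAIN env vars, a minimal invocation can set the job parameters once in the environment and pass only per-run flags. A sketch, with placeholder cluster, bucket, and class names throughout:

```shell
# Placeholders throughout; dataproc falls back to $CLUSTER, $JAR, and
# $MAIN when the corresponding flags are omitted.
export CLUSTER=my-cluster
export JAR=gs://my-bucket/my-app.jar
export MAIN=com.example.MyApp

# --dry-run prints some of the parsed and inferred arguments and exits
# without running any dataproc commands, a cheap way to sanity-check
# the setup before paying for a cluster:
dataproc --timestamp-cluster-name --cores 100 --dry-run
```

The `--timestamp-cluster-name` flag keeps repeated runs from colliding on the same cluster name.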
See hammerlab/pageant's scripts/run-on-gcloud for an example use that simply sets the MAIN, CLUSTER, and JAR env vars and delegates to this script.
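A wrapper in that spirit can be sketched as follows; this is not the actual pageant script, and every name below is a placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper modeled on the set-env-vars-and-delegate pattern:
# pin the env vars dataproc reads, then hand all remaining flags through.
export MAIN=com.example.MyApp        # placeholder: JAR main class
export CLUSTER=my-cluster            # placeholder: cluster name
export JAR=gs://my-bucket/my-app.jar # placeholder: main app JAR URI

# Delegate; callers can still override per-run flags like --cores:
dataproc "$@"
```

Keeping the env vars in a small checked-in wrapper means the project's job configuration lives in version control while per-run knobs stay on the command line.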