Name: fusion-spark-xfer
Owner: Lucidworks
Description: Spark submit app for transferring data from one collection to another, potentially across clusters
Created: 2018-01-19 15:47:46.0
Updated: 2018-05-09 21:15:19.0
Pushed: 2018-05-09 21:15:18.0
Homepage: null
Size: 14
Language: Scala
Spark submit app for transferring data from one collection to another, potentially across clusters.
You'll need Fusion 3.1.3+.
Configure the parameters for submitting the job to Fusion's Spark master in the fusion-xfer.sh
script, for example:
#!/bin/bash
FUSION_HOME=/opt/fusion/3.1.3
SPARK_MASTER=local[*]
APP_JAR=/opt/fusion-spark-xfer/target/fusion-spark-xfer-1.0-shaded.jar
$FUSION_HOME/apps/spark-dist/bin/spark-submit --master $SPARK_MASTER \
 --class com.lucidworks.spark.CollectionTransferApp $APP_JAR \
 --destinationSolrClusterZk localhost:9983/lwfusion/3.1.3/solr \
 --destinationCollection dest_signals \
 --sourceSolrClusterZk localhost:9983/lwfusion/3.1.3/solr \
 --sourceCollection source_signals
Configure job resource allocation using the standard Spark submit options. To list them, run: $FUSION_HOME/apps/spark-dist/bin/spark-submit --help
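As a rough illustration of resource allocation, the sketch below adds three standard spark-submit options (--driver-memory, --executor-memory, --total-executor-cores) to the transfer command. The values 2g, 4g and 8 are arbitrary assumptions, not recommendations, and the executor settings only matter when the master is a real cluster rather than local[*]. The function build_submit_cmd is a hypothetical helper that prints the command instead of running it, so you can inspect it first.

```shell
#!/bin/bash
FUSION_HOME=/opt/fusion/3.1.3
SPARK_MASTER='local[*]'
APP_JAR=/opt/fusion-spark-xfer/target/fusion-spark-xfer-1.0-shaded.jar

# Hypothetical helper: print the spark-submit command line (dry run only).
# Resource values below are illustrative assumptions; tune them for your cluster.
build_submit_cmd() {
  printf '%s ' \
    "$FUSION_HOME/apps/spark-dist/bin/spark-submit" \
    --master "$SPARK_MASTER" \
    --driver-memory 2g \
    --executor-memory 4g \
    --total-executor-cores 8 \
    --class com.lucidworks.spark.CollectionTransferApp "$APP_JAR" \
    --destinationSolrClusterZk localhost:9983/lwfusion/3.1.3/solr \
    --destinationCollection dest_signals \
    --sourceSolrClusterZk localhost:9983/lwfusion/3.1.3/solr \
    --sourceCollection source_signals
  printf '\n'
}

build_submit_cmd
```

Once the printed command looks right, paste it into fusion-xfer.sh or run it directly.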
To get the active Spark master for Fusion, do: curl http://localhost:8765/api/v1/spark/master
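To feed that result into the script, something like the following may work. It is only a sketch: it assumes the response body contains the master as a spark://host:port URL somewhere in its text, which you should verify against your Fusion version. The function extract_spark_master is a hypothetical helper, not part of Fusion.

```shell
# Hypothetical helper: read a response body on stdin and print the first
# spark:// URL found in it. Assumes the master appears as spark://host:port.
extract_spark_master() {
  grep -o 'spark://[A-Za-z0-9._:-]*' | head -n 1
}

# Usage (requires a running Fusion instance):
#   SPARK_MASTER=$(curl -s http://localhost:8765/api/v1/spark/master | extract_spark_master)
```

If the helper prints nothing, inspect the raw curl output and set SPARK_MASTER by hand.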
CollectionTransferApp Options:
 --batchSize <arg>                 Batch size for writing docs to the destination cluster; defaults to 10000
 --destinationCollection <arg>     Name of the Solr collection on the destination cluster to write data to; uses source name if not provided
 --destinationSolrClusterZk <arg>  ZooKeeper connection string for the Solr cluster this app transfers data to
 --findNewOnly true|false          Flag to indicate if this app should look for new docs in the source using the latest timestamp in the
                                   destination; defaults to true, set to false to skip this check and pull all docs that match the source query
 --sourceCollection <arg>          Name of the Solr collection on the source cluster to read data from
 --sourceQuery <arg>               Query to source collection for docs to transfer; uses *:* if not provided
 --sourceSolrClusterZk <arg>       ZooKeeper connection string for the Solr cluster this app transfers data from
 --sparkConf <arg>                 Additional Spark configuration properties file
 --timestampField <arg>            Timestamp field name on docs; defaults to 'timestamp_tdt'
 --verbose                         Generate verbose log messages
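The optional flags can be combined with the required ones. The sketch below appends --sourceQuery, --findNewOnly and --batchSize only when the matching variable is set, and leaves out --destinationCollection so the destination takes the source collection name. The function transfer_with_options, the variable names, and the query type_s:click are all hypothetical. Whatever you pass to the function is used as the command prefix, so passing echo gives a dry run.

```shell
#!/bin/bash
# Hypothetical values for illustration; replace with your own clusters and collections.
SOURCE_ZK=localhost:9983/lwfusion/3.1.3/solr
DEST_ZK=localhost:9983/lwfusion/3.1.3/solr
SOURCE_COLLECTION=source_signals
SOURCE_QUERY='type_s:click'   # hypothetical field and value
FIND_NEW_ONLY=false           # pull every doc matching the query, not just new ones
BATCH_SIZE=5000

# Hypothetical helper: "$@" on entry is the command prefix to run.
# Required flags are always appended; optional flags only when their variable is set.
transfer_with_options() {
  set -- "$@" \
    --sourceSolrClusterZk "$SOURCE_ZK" \
    --sourceCollection "$SOURCE_COLLECTION" \
    --destinationSolrClusterZk "$DEST_ZK"
  if [ -n "${SOURCE_QUERY:-}" ]; then set -- "$@" --sourceQuery "$SOURCE_QUERY"; fi
  if [ -n "${FIND_NEW_ONLY:-}" ]; then set -- "$@" --findNewOnly "$FIND_NEW_ONLY"; fi
  if [ -n "${BATCH_SIZE:-}" ]; then set -- "$@" --batchSize "$BATCH_SIZE"; fi
  "$@"
}

# Dry run: print the app arguments instead of submitting the job.
transfer_with_options echo
```

For a real run, replace echo with the spark-submit prefix, for example: transfer_with_options "$FUSION_HOME/apps/spark-dist/bin/spark-submit" --master "$SPARK_MASTER" --class com.lucidworks.spark.CollectionTransferApp "$APP_JAR"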