Name: spark-tests
Owner: Hammer Lab
Description: Utilities for writing tests that use Apache Spark.
Created: 2016-11-13 17:28:38.0
Updated: 2017-11-30 18:39:36.0
Pushed: 2018-01-13 21:06:07.0
Homepage: null
Size: 89
Language: Scala
Utilities for writing tests that use Apache Spark.
SparkSuite: a SparkContext for each test suite

Add configuration options in subclasses using sparkConf(…), cf. KryoSparkSuite:

    sparkConf(
      // Register this class as its own KryoRegistrator
      "spark.kryo.registrator" → getClass.getCanonicalName,
      "spark.serializer" → "org.apache.spark.serializer.KryoSerializer",
      "spark.kryo.referenceTracking" → referenceTracking.toString,
      "spark.kryo.registrationRequired" → registrationRequired.toString
    )
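A minimal sketch of a test suite built on SparkSuite (the import path, the `sc` handle, and the ScalaTest-style `test` method are assumptions here, not verified against the library):

```scala
import org.hammerlab.spark.test.suite.SparkSuite  // assumed import path

// Hypothetical suite: all test cases share the suite-wide SparkContext,
// assumed to be exposed by SparkSuite as `sc`.
class SumSuite extends SparkSuite {
  test("sum of 1..4") {
    val rdd = sc.parallelize(1 to 4)
    assert(rdd.sum == 10)  // runs on the shared context
  }
}
```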
PerCaseSuite: SparkContext for each test case

KryoSparkSuite: SparkSuite implementation that provides hooks for Kryo registration:

    register(
      classOf[Foo],
      "org.foo.Bar",
      classOf[Bar] → new BarSerializer
    )
Also useful for subclassing once per project, filling in that project's default Kryo registrar, and having concrete tests subclass that; cf. hammerlab/guacamole and hammerlab/pageant for examples.
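The per-project pattern described above might look roughly like the following (all names other than KryoSparkSuite are hypothetical, with Foo, Bar, and BarSerializer standing in for project types as in the snippet above):

```scala
import org.hammerlab.spark.test.suite.KryoSparkSuite  // assumed import path

// Hypothetical project-wide base suite: register the project's common
// types once, so every concrete test suite inherits the registrations.
abstract class ProjectKryoSuite extends KryoSparkSuite {
  register(
    classOf[Foo],
    classOf[Bar] → new BarSerializer
  )
}

// Concrete tests subclass the project base and get Kryo registration for free.
class SomeProjectTest extends ProjectKryoSuite {
  // test cases here run with Foo and Bar Kryo-registered
}
```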
rdd.Util: make an RDD with specific elements in specific partitions.

NumJobsUtil: verify the number of Spark jobs that have been run.

RDDSerialization: interface for verifying that a serialization+deserialization round-trip on an RDD results in the same RDD.
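The round-trip that RDDSerialization verifies can be sketched directly against plain Spark APIs (this uses saveAsObjectFile/objectFile as stand-ins; the library's actual interface may differ):

```scala
import org.apache.spark.SparkContext

// Sketch of a serialize→deserialize round-trip check, assuming an
// in-scope SparkContext `sc` and a scratch directory `tmpDir`.
def checkRoundTrip(sc: SparkContext, tmpDir: String): Unit = {
  val rdd = sc.parallelize(Seq(1, 2, 3, 4), numSlices = 2)
  val path = s"$tmpDir/rdd"
  rdd.saveAsObjectFile(path)               // serialize to disk
  val restored = sc.objectFile[Int](path)  // read it back
  // the round-trip should preserve the RDD's elements
  assert(restored.collect().sorted.sameElements(rdd.collect().sorted))
}
```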