Name: StatisticsOnSpark
Owner: intel-analytics
Description: Assembly of fundamental statistics implemented based on Apache Spark
Created: 2015-12-29 06:17:04.0
Updated: 2017-12-19 14:40:40.0
Pushed: 2016-02-11 16:00:25.0
Homepage: null
Size: 19
Language: Scala
Assembly of fundamental statistics implemented based on Apache Spark
This documentation is for Spark 1.3+. Other versions will probably work but have not been tested.
Spark.statistics intends to provide fundamental statistics functions on top of Apache Spark.
Currently we support:
- two-sample independent t-test (TwoSampleIndependentTTest)
Hopefully more features will come in quickly.
Example: running a two-sample independent t-test on two RDDs of doubles.

val sample1 = Array(100d, 200d, 300d, 400d)
val sample2 = Array(101d, 205d, 300d, 400d)
val rdd1 = sc.parallelize(sample1)
val rdd2 = sc.parallelize(sample2)

// Perform the t-test at a given significance level (0.05)
new TwoSampleIndependentTTest().tTest(rdd1, rdd2, 0.05)
// Or perform the t-test without specifying a significance level
new TwoSampleIndependentTTest().tTest(rdd1, rdd2)
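For intuition, here is a hedged plain-Scala sketch of the Welch two-sample t statistic, which is the usual statistic behind an independent two-sample t-test on samples with possibly unequal variances. The object `WelchTTest` and its methods are hypothetical names for illustration, not part of this repository's API; the repository's `TwoSampleIndependentTTest` presumably distributes an equivalent computation over RDDs.

```scala
// Hypothetical local (non-Spark) sketch of Welch's two-sample t statistic.
object WelchTTest {

  // Sample mean and unbiased sample variance (divide by n - 1).
  def meanAndVariance(xs: Array[Double]): (Double, Double) = {
    val n = xs.length
    val mean = xs.sum / n
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    (mean, variance)
  }

  // Welch's t statistic: difference of means over the pooled standard error.
  def tStatistic(a: Array[Double], b: Array[Double]): Double = {
    val (meanA, varA) = meanAndVariance(a)
    val (meanB, varB) = meanAndVariance(b)
    (meanA - meanB) / math.sqrt(varA / a.length + varB / b.length)
  }

  def main(args: Array[String]): Unit = {
    // Same samples as the RDD example above.
    val sample1 = Array(100d, 200d, 300d, 400d)
    val sample2 = Array(101d, 205d, 300d, 400d)
    println(tStatistic(sample1, sample2))
  }
}
```

On the sample data above the means differ by only 1.5 while the within-sample spread is large, so the statistic is close to zero, i.e. no evidence of a difference between the two samples.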