futurice/dcontext

Name: dcontext

Owner: Futurice

Forked from: VeikkausOy/dcontext

Created: 2017-06-12 13:19:12

Updated: 2017-06-12 13:19:14

Pushed: 2018-02-13 08:01:28

Size: 122 KB

Language: Scala

README

Acknowledgements

This tool was created by Veikkaus (https://veikkaus.fi) and Futurice (https://futurice.com), and contributed to the open source community by Veikkaus under the BSD-3 license.

The dtesttoys and dcolossus dcontext ports were developed by Futurice (https://futurice.com).

The tool was authored by Antti Rauhala; the dynamic code loading is based in part on Li Jang's DynaCode example: http://www.javaworld.com/article/2071777/design-patterns/add-dynamic-java-code-to-your-application.html?page=2

Veikkaus dcontext

dcontext is a tool that offers a rapid iteration cycle when working with huge amounts of data and/or heavy infrastructure. For example, dcontext can be used together with Spark to keep the Spark infrastructure and data in the JVM's and the cluster's memory, while the code is modified and tested in a rapid iteration cycle.
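
To make that concrete, here is a minimal sketch of the caching pattern dcontext enables, with a plain mutable map standing in for dcontext's state store (the map-based store and all names here are illustrative assumptions, not the actual dcontext API):

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

// Illustration only: a mutable map plays the role of the dcontext store,
// which keeps named values alive in the JVM across code reloads.
object ContextCache {
  private val state = mutable.Map[String, Any]()

  def cached[T](key: String)(make: => T): T =
    state.getOrElseUpdate(key, make).asInstanceOf[T]
}

// The first call constructs the SparkContext; later calls, including those
// made after the test code has been recompiled and reloaded, reuse the
// instance already sitting in the heap.
def sparkContext(): SparkContext =
  ContextCache.cached("sparktest.sparkContext") {
    new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("fast-spark-test"))
  }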

Example use:

Preparations

The 'examples' folder contains a project named fast-spark-test, which uses dcontext and Spark for analyzing moderately sized stock market data sets.

To run it, first clone testtoys from GitHub and publish it locally, then run 'sbt publishLocal' in both the dcontext and ports/dtesttoys projects. Finally, run './get_data.sh' and 'sbt testsh' in the fast-spark-test project folder.

Running the dcontext console

'sbt testsh' launches the dcontext console with the example test suite, which includes an example test case called spyDf. Executing spyDf produces the following results:

test spyDf

...lots of Spark log...

columns:
  symbol
  date
  time
  open
  high
  low
  close
  volume

rows: 1128951

... ms.

Running the test case takes several seconds, because the task also sets up the SparkContext and loads and parses about 50 MB of data.
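
For reference, the loading itself is ordinary Spark work. The sketch below shows roughly what parsing such minute-bar CSV data into a DataFrame can look like; the file path, the field order, and the use of cached sc and sqlContext values (the sparktest entries listed later by '-l') are assumptions, not the project's exact code:

// Sketch only: parse minute bars into a DataFrame (Spark 1.x style).
case class Bar(symbol: String, date: String, time: String,
               open: Double, high: Double, low: Double,
               close: Double, volume: Double)

import sqlContext.implicits._                    // assumed: the cached SQLContext

val bars = sc.textFile("data/spy.csv")           // assumed path; sc is the cached SparkContext
  .map(_.split(","))
  .map(a => Bar(a(0), a(1), a(2), a(3).toDouble, a(4).toDouble,
                a(5).toDouble, a(6).toDouble, a(7).toDouble))

val spyDf = bars.toDF()                          // the kind of DataFrame cached as sparktest.spyDf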

Modifying dynamically reloaded code

Now, let's launch a new terminal and run 'sbt ~;package;test:package' in it, in order to compile the fast-spark-test sources in the background.
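
Under the hood, dcontext picks up those freshly compiled classes DynaCode-style: the test classes are loaded through a new classloader on each run, while the values stored in the context stay in the heap. A conceptual sketch of that reload step, with an assumed class directory (this is the general technique, not dcontext's exact code):

import java.io.File
import java.net.URLClassLoader

// Build a fresh classloader over the newest compiled classes; for the
// reload to take effect, these classes must not be on the parent
// loader's classpath.
def freshLoader(classDir: String): ClassLoader =
  new URLClassLoader(
    Array(new File(classDir).toURI.toURL),
    getClass.getClassLoader)

// Each run loads the test through a new loader and so sees the new code,
// while previously created objects in the context remain usable.
val loader = freshLoader("target/scala-2.11/classes")   // assumed path
val test = loader
  .loadClass("fi.futurice.fastsparktest.ExampleTest")
  .newInstance()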

Now, let's examine the fast-spark-test sources. The 'spyDf' test is located in ExampleTest.scala and looks like this:

("spyDf")((c, t) => {
 df : DataFrame = spyDf(c)
 count = df.count
ln("columns:")
columns.foreach { c => t.tln("  " + c) }
ln
ln("rows: " + count)

Let's test how dynamic class loading works by adding three new lines that print the SPY dataframe's contents:

("spyDf")((c, t) => {
 df : DataFrame = spyDf(c)
 count = df.count
ln("columns:")
columns.foreach { c => t.tln("  " + c) }
ln
ln("rows: " + count)
ln // new
ln("the data frame:") // new
(t, df) // prints first lines of the data frame

Then let's wait a moment for the sbt compilation to finish in the background. Once it has, we can rerun the 'spyDf' test:

test spyDf

...lots of Spark log...

columns:
  symbol
  date
  time
  open
  high
  low
  close
  volume

rows: 1128951

the data frame:
1128951 entries
8 columns

symbol        |date          |time          |open          |high          |low           |close         |volume        
...lots of Spark log...
SPY           |20040701      |0931          |114.25        |114.32        |114.24        |114.27        |216400.0      
SPY           |20040701      |0932          |114.26        |114.33        |114.24        |114.31        |207200.0      
SPY           |20040701      |0933          |114.3         |114.34        |114.28        |114.3         |83900.0       
SPY           |20040701      |0934          |114.3         |114.32        |114.29        |114.32        |245500.0      
SPY           |20040701      |0935          |114.29        |114.31        |114.29        |114.3         |69400.0       
SPY           |20040701      |0936          |114.31        |114.34        |114.31        |114.32        |218200.0      
SPY           |20040701      |0937          |114.33        |114.36        |114.32        |114.34        |59600.0       
SPY           |20040701      |0938          |114.34        |114.34        |114.26        |114.28        |143300.0      
...

... ms. 15 errors! [d]iff, [c]ontinue or [f]reeze?

The testtoys behavioral test suite notices that the test case results have changed according to our modifications. What's more, the test ran faster, because the SparkContext and the SPY dataframe remained loaded in the JVM's heap.
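
That freeze/diff prompt is the essence of testtoys-style behavioral testing: a test's output is recorded and diffed against a previously frozen run, and you decide whether to accept ("freeze") the new behavior. The gist, sketched with plain file I/O (this illustrates the concept only, not the testtoys implementation):

import java.nio.file.{Files, Paths}

// Concept sketch: compare output against a frozen snapshot file.
// A first run freezes the current output as the baseline.
def checkAgainstSnapshot(name: String, actual: String): Boolean = {
  val frozen = Paths.get("io", name + ".txt")    // hypothetical snapshot location
  if (!Files.exists(frozen)) {
    Files.createDirectories(frozen.getParent)
    Files.write(frozen, actual.getBytes("UTF-8"))
    true
  } else {
    new String(Files.readAllBytes(frozen), "UTF-8") == actual
  }
}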

The Spark infrastructure and the various data frames can be inspected by running '-l' in the dcontext console:


sparktest.sparkContext     org.apache.spark.SparkContext
sparktest.spy              org.apache.spark.rdd.MapPartitionsRDD
sparktest.spyDf            org.apache.spark.sql.DataFrame
sparktest.sqlContext       org.apache.spark.sql.SQLContext
                           fi.futurice.fastsparktest.ExampleTest
