Name: dstream
Owner: Hortonworks Inc
Description: null
Created: 2015-02-08 18:16:50.0
Updated: 2015-11-27 10:30:09.0
Pushed: 2015-10-06 23:14:37.0
Homepage: null
Size: 3797
Language: Java
==========
IMPORTANT: At the moment this is a research project whose primary goal is to investigate the feasibility of the approach.
The primary focus of the DStream API is to provide a unified, Stream-based programming model for building ETL-style distributable data processes that run in compatible execution environments. While agnostic to any specific type of execution environment, the API aims to provide an extensible integration/delegation model to support a variety of execution environments.
The key distinction between Java 8 Stream and DStream is the notion of distributable data: the actual data may or may not be distributed. This makes DStream a fairly universal strategy for building ETL-style processes regardless of the location and/or type of the data, or of the execution environment in which it will be processed.
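For contrast, here is a minimal sketch of a word count over a local, in-memory list using only the standard Java 8 `java.util.stream` API. It is not part of DStream; the class name `LocalWordCount` and the method `count` are hypothetical names chosen for illustration. The point is that the data here must already sit in the local JVM, which is the constraint DStream aims to lift.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; plain JDK only, no DStream dependency.
public class LocalWordCount {

    // Counts words in an in-memory list of lines and returns them sorted by word.
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // split each line on whitespace, one stream element per token
                .flatMap(line -> Stream.of(line.split("\\s+")))
                // a line with leading whitespace yields an empty first token; drop it
                .filter(word -> !word.isEmpty())
                // key = word, initial value = 1, duplicates merged by summing
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do is to be");
        count(lines).forEach((word, n) -> System.out.println(word + "=" + n));
    }
}
```

The `flatMap` step and the `Integer::sum` merge function play the same roles here as in a DStream pipeline; what differs is where the data lives and where the work runs.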
The following code snippet shows a quintessential WordCount written with DStream:
    Future<Stream<Stream<Entry<String, Integer>>>> resultFuture = DStream.ofType(String.class, "wc")
        .flatMap(line -> Stream.of(line.split("\\s+")))
        .reduceValues(word -> word, word -> 1, Integer::sum)
        .executeAs("WordCount");

    // each stream within a stream represents a partition, essentially giving you access
    // to each result partition
    Stream<Stream<Entry<String, Integer>>> result = resultFuture.get();
    result.forEach(resultPartitionStream -> {
        resultPartitionStream.forEach(System.out::println);
    });
    result.close();
Producing output similar to this:
ot=2
ted=3
1
lems=2
=1
e=1
4
.
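When a single combined view is more convenient than per-partition output, the nested `Stream<Stream<Entry<String, Integer>>>` shape can be flattened with standard JDK calls. The sketch below is an assumption-laden illustration, not DStream API: `PartitionMerge`, `merge`, and `entry` are hypothetical names, and the partitions are built by hand to stand in for the value returned by `resultFuture.get()`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; uses only java.util and java.util.stream.
public class PartitionMerge {

    // Flattens all partition streams into one map sorted by key.
    // Integer::sum is a defensive merge in case a key shows up in more than one partition.
    static Map<String, Integer> merge(Stream<Stream<Entry<String, Integer>>> partitions) {
        // try-with-resources closes the outer stream, mirroring result.close()
        try (Stream<Stream<Entry<String, Integer>>> p = partitions) {
            return p.flatMap(partition -> partition)
                    .collect(Collectors.toMap(Entry::getKey, Entry::getValue, Integer::sum, TreeMap::new));
        }
    }

    // Small factory so the entry type is stated explicitly.
    static Entry<String, Integer> entry(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the partitioned result of a job (assumption).
        Stream<Entry<String, Integer>> partition0 = Stream.of(entry("alpha", 2), entry("beta", 1));
        Stream<Entry<String, Integer>> partition1 = Stream.of(entry("gamma", 4));
        System.out.println(merge(Stream.of(partition0, partition1)));
    }
}
```

Collecting into a map pulls every entry into local memory, so this only suits results small enough to fit there; iterating partition by partition, as in the snippet above, avoids that.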
======
For an overview of features, see Core Features Overview.
To get started, see Getting Started.
=======