hortonworks/dstream

Name: dstream

Owner: Hortonworks Inc

Description: null

Created: 2015-02-08 18:16:50.0

Updated: 2015-11-27 10:30:09.0

Pushed: 2015-10-06 23:14:37.0

Homepage: null

Size: 3797

Language: Java

README

DStream - Distributable Streams

==========

IMPORTANT: At the moment this is a research project with the primary goal of investigating the feasibility of the approach.

The primary focus of the DStream API is to provide a Stream-based, unified programming model for building ETL-style distributable data processes that run in compatible execution environments. While agnostic to any specific type of execution environment, the API aims to provide an extensible integration/delegation model to support a variety of execution environments.

The key distinction between the Java 8 Stream and DStream is the notion of distributable data: the actual data may or may not be distributed. This makes DStream something of a universal strategy for building ETL-style processes, regardless of the location or type of the data and of the execution environment in which it is processed.

The following code snippet shows an example of a quintessential WordCount:

Future<Stream<Stream<Entry<String, Integer>>>> resultFuture = DStream.ofType(String.class, "wc")
    .flatMap(line -> Stream.of(line.split("\\s+")))
    .reduceValues(word -> word, word -> 1, Integer::sum)
    .executeAs("WordCount");

// Each stream within a stream represents a partition, essentially giving you access
// to each result partition
Stream<Stream<Entry<String, Integer>>> result = resultFuture.get();
result.forEach(resultPartitionStream -> {
    resultPartitionStream.forEach(System.out::println);
});
result.close();

Producing output similar to this:


ot=2
ted=3
1
lems=2
=1
e=1
4
.
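For comparison, the same word count can be sketched with the plain Java 8 Stream API over in-memory data. This is not DStream code: it only uses `java.util.stream`, it runs in a single JVM, and the class name `LocalWordCount`, the `countWords` helper and the sample input lines are assumptions made for illustration. It may help show which parts of the pipeline (`flatMap`, the key/value/merge functions) carry over unchanged and which parts (source, execution, partitioned result) DStream delegates to the execution environment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, local-only counterpart of the WordCount example above.
class LocalWordCount {

    // Counts word occurrences in the given lines using only java.util.stream.
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // split each line into words, same lambda as the flatMap step above
                .flatMap(line -> Stream.of(line.split("\\s+")))
                // drop the empty token that split() yields for leading whitespace
                .filter(word -> !word.isEmpty())
                // key = word, value = 1, duplicates merged with Integer::sum;
                // a TreeMap keeps the result sorted by word
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // Illustrative input; a real job would read from a file or another source.
        List<String> lines = Arrays.asList(
                "we cannot solve our problems",
                "with the same thinking we used");
        // Each entry prints in key=value form
        countWords(lines).entrySet().forEach(System.out::println);
    }
}
```

Unlike the DStream version, the result here is a single in-memory map rather than a stream of partition streams, since nothing is distributed.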

======

For a features overview, please see Core Features Overview.

To get started, please see Getting Started.

=======

