CD2H gitForager

GoogleCloudPlatform/dataproc-pubsub-spark-streaming

Name: dataproc-pubsub-spark-streaming

Owner: Google Cloud Platform

Description: null

Created: 2018-05-09 15:35:35.0

Updated: 2018-05-21 18:31:50.0

Pushed: 2018-05-21 18:31:49.0

Homepage: null

Size: 30

Language: Scala


README

In this tutorial, you learn how to deploy an Apache Spark streaming application on Cloud Dataproc and process messages from Cloud Pub/Sub in near real time. The system you build in this scenario generates thousands of random tweets, identifies trending hashtags over a sliding window, saves the results in Cloud Datastore, and displays them on a web page.
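
To make the "trending hashtags over a sliding window" idea concrete, here is a minimal plain-Python sketch of the technique. It is not the repository's Spark code: the class name `SlidingWindowCounter`, the helper `extract_hashtags`, and the 60-second window are all hypothetical and chosen only for illustration. It also assumes tweets arrive in non-decreasing timestamp order.

```python
import re
from collections import Counter, deque

# Hypothetical helper: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


class SlidingWindowCounter:
    """Counts hashtags seen during the last `window_seconds` seconds.

    Assumes add() is called with non-decreasing timestamps.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, hashtag), oldest first
        self.counts = Counter()  # hashtag -> occurrences inside the window

    def add(self, timestamp, text):
        """Record every hashtag in `text` as seen at `timestamp`."""
        for tag in extract_hashtags(text):
            self.events.append((timestamp, tag))
            self.counts[tag] += 1

    def _evict(self, now):
        """Drop events that are `window_seconds` old or older."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, now, n=10):
        """Return up to `n` (hashtag, count) pairs, most frequent first."""
        self._evict(now)
        return self.counts.most_common(n)


window = SlidingWindowCounter(window_seconds=60)
window.add(0, "Loving #Spark and #dataproc")
window.add(30, "More #spark streaming")
print(window.top(now=40, n=1))  # → [('spark', 2)]
```

In a Spark streaming job the same bookkeeping is typically delegated to the framework's windowed operations rather than written by hand; the sketch only shows what such a window computes.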

Please refer to the related article for all the steps to follow in this tutorial: [INSERT LINK WHEN PUBLISHED]

Contents of this repository:

http_function: JavaScript code for the HTTP function deployed on Cloud Functions.
spark: Scala code for the Apache Spark streaming application.
tweet-generator: Python code for the randomized tweet generator.

Running the tests

To run the tests:

    cd spark
    mvn test

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.