yahoo/TensorFlowOnSpark

Name: TensorFlowOnSpark

Owner: Yahoo Inc.

Description: TensorFlowOnSpark brings TensorFlow programs onto Apache Spark clusters

Created: 2017-01-20 18:15:57.0

Updated: 2018-01-18 06:48:30.0

Pushed: 2018-01-12 19:08:30.0

Homepage:

Size: 1443

Language: Python

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

TensorFlowOnSpark

What's TensorFlowOnSpark?

TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from deep learning framework TensorFlow and big-data frameworks Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers.

TensorFlowOnSpark enables distributed TensorFlow training and inference on Apache Spark clusters. It seeks to minimize the amount of code changes required to run existing TensorFlow programs on a shared grid. Its Spark-compatible API helps manage the TensorFlow cluster with the following steps:

  1. Startup - launches the Tensorflow main function on the executors, along with listeners for data/control messages.
  2. Data ingestion
  3. Readers & QueueRunners - leverages TensorFlow's Reader mechanism to read data files directly from HDFS.
  4. Feeding - sends Spark RDD data into the TensorFlow nodes using the feed_dict mechanism. Note that we leverage the Hadoop Input/Output Format for access to TFRecords on HDFS.
  5. Shutdown - shuts down the Tensorflow workers and PS nodes on the executors.

TensorFlowOnSpark was developed by Yahoo for large-scale distributed deep learning on our Hadoop clusters in Yahoo's private cloud.

Why TensorFlowOnSpark?

TensorFlowOnSpark provides some important benefits (see our blog) over alternative deep learning solutions.

Using TensorFlowOnSpark

Please check TensorFlowOnSpark wiki site for detailed documentations such as getting started guides for YARN cluster and AWS EC2 cluster. A Conversion Guide has been provided to help you convert your TensorFlow programs.

Mailing List

Please join TensorFlowOnSpark user group for discussions and questions.

License

The use and distribution terms for this software are covered by the Apache 2.0 license. See LICENSE file for terms.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.