Name: TensorFlowOnYARN
Owner: Intel-bigdata
Description: Support TensorFlow on YARN
Created: 2017-03-13 06:02:01.0
Updated: 2018-05-23 08:24:20.0
Pushed: 2017-06-19 07:27:04.0
Size: 141
Language: Java
TensorFlow on YARN (TOY) is a toolkit that gives Hadoop users an easy way to run TensorFlow applications in distributed mode and to accomplish tasks such as model management and serving inference.
Note that the current project is a prototype with limitations and is still under development.
Figure 1. TOY Architecture
Prepare the build environment following the instructions from https://www.tensorflow.org/install/install_sources
Clone the TensorFlowOnYARN repository.
git clone --recursive https://github.com/Intel-bigdata/TensorFlowOnYARN
Build the assembly.
cd TensorFlowOnYARN/tensorflow-parent
mvn package -Pnative -Pdist
The build produces tensorflow-yarn-${VERSION}.tar.gz and tensorflow-yarn-${VERSION}.zip in the tensorflow-parent/tensorflow-yarn-dist/target directory. Distribute the assembly to the client node of a YARN cluster and extract it.
Run the between-graph mnist example.
cd tensorflow-yarn-${VERSION}
ydl-tf launch --num_worker 2 --num_ps 2
This launches a YARN application that creates a tf.train.Server instance for each task. A ClusterSpec is printed to the console so that you can submit the training script to the cluster, e.g.
ClusterSpec: {"ps":["node1:22257","node2:22222"],"worker":["node3:22253","node2:22255"]}
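To make the structure of the printed ClusterSpec concrete, here is a minimal Python sketch that parses such a console line and lists every task it describes. It assumes the line has the form `ClusterSpec: <JSON object>` as in the example above. The helper names `parse_cluster_spec` and `list_tasks` are hypothetical and are not part of TOY or TensorFlow.

```python
import json


def parse_cluster_spec(console_line):
    """Parse a console line of the assumed form 'ClusterSpec: {...}' into a dict."""
    prefix = "ClusterSpec:"
    line = console_line.strip()
    if not line.startswith(prefix):
        raise ValueError("not a ClusterSpec line: %r" % console_line)
    # Everything after the prefix is expected to be a JSON object
    # mapping a job name ("ps", "worker") to a list of "host:port" strings.
    return json.loads(line[len(prefix):])


def list_tasks(cluster):
    """Return (job_name, task_index, address) for every task in the cluster."""
    return [
        (job, index, address)
        for job in sorted(cluster)
        for index, address in enumerate(cluster[job])
    ]


line = 'ClusterSpec: {"ps":["node1:22257","node2:22222"],"worker":["node3:22253","node2:22255"]}'
cluster = parse_cluster_spec(line)
for job, index, address in list_tasks(cluster):
    print(job, index, address)
```

Each printed row corresponds to one task, that is, one tf.train.Server instance in the launched application: two parameter servers and two workers in this example.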
Then submit the training script for each worker task:
python examples/between-graph/mnist_feed.py \
  --ps_hosts="ps0.hostname:ps0.port,ps1.hostname:ps1.port" \
  --worker_hosts="worker0.hostname:worker0.port,worker1.hostname:worker1.port" \
  --task_index=0

python examples/between-graph/mnist_feed.py \
  --ps_hosts="ps0.hostname:ps0.port,ps1.hostname:ps1.port" \
  --worker_hosts="worker0.hostname:worker0.port,worker1.hostname:worker1.port" \
  --task_index=1
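The two submissions above differ only in the task index, so the command lines can be generated from the ClusterSpec instead of being typed by hand. The following Python sketch shows one way to do that. It assumes the script accepts the `--ps_hosts`, `--worker_hosts` and `--task_index` flags and that one command is issued per worker. The function name `worker_commands` is hypothetical and is not part of TOY.

```python
import shlex


def worker_commands(cluster, script="examples/between-graph/mnist_feed.py"):
    """Build one shell command string per worker task from a ClusterSpec dict."""
    ps_hosts = ",".join(cluster["ps"])
    worker_hosts = ",".join(cluster["worker"])
    commands = []
    for task_index in range(len(cluster["worker"])):
        argv = [
            "python",
            script,
            "--ps_hosts=" + ps_hosts,
            "--worker_hosts=" + worker_hosts,
            "--task_index=%d" % task_index,
        ]
        # Quote each argument so the result can be pasted into a shell safely.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


# Example ClusterSpec, copied from the console output shown earlier.
example_cluster = {
    "ps": ["node1:22257", "node2:22222"],
    "worker": ["node3:22253", "node2:22255"],
}
for command in worker_commands(example_cluster):
    print(command)
```

The sketch only builds the command strings. Whether they are run locally or on the worker nodes is left to the caller.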
To get the ClusterSpec of an existing TensorFlow cluster launched by a previous YARN application:
ydl-tf cluster --app_id <Application ID>
You may also use YARN commands through ydl-tf.
For example, to list the running applications:
ydl-tf application --list
or to kill an existing YARN application (TensorFlow cluster):
ydl-tf kill --application <Application ID>