NVIDIA/dl-inference-server

Name: dl-inference-server

Owner: NVIDIA Corporation

Description: Deep Learning Inference Server Clients

Created: 2018-04-09 18:21:27.0

Updated: 2018-05-23 18:07:22.0

Pushed: 2018-05-23 20:58:50.0

Homepage:

Size: 4737

Language: C++

README

Deep Learning Inference Server Clients

The NVIDIA Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. It exposes an inference service via an HTTP endpoint, allowing remote clients to request inferencing for any model managed by the server.

This repo contains C++ and Python client libraries that make it easy to communicate with the inference server. Also included are C++ and Python versions of image_client, an example application that uses the C++ or Python client library to execute image classification models on the inference server.

The inference server itself is delivered as a containerized solution from the NVIDIA GPU Cloud. See the Inference Container User Guide for information on how to install and configure the inference server.

Branches

master: Active development branch. Typically compatible with the currently released NVIDIA Inference Server container, but compatibility is not guaranteed.

yy.mm: Branch compatible with NVIDIA Inference Server yy.mm, for example 18.05.

Building the Clients

Before building the client libraries and applications, you must install some prerequisites. The following instructions assume Ubuntu 16.04. OpenCV is used by image_client to preprocess images before sending them to the inference server for inferencing. The python-pil package is required by the Python image_client example.

sudo apt-get update
sudo apt-get install build-essential libcurl3-dev libopencv-dev libopencv-core-dev python-pil software-properties-common

Protobuf3 support is required. On Ubuntu 16.04 it must be installed from a PPA; on more recent distributions this step might not be necessary.

sudo add-apt-repository ppa:maarten-fonville/protobuf
sudo apt-get update
sudo apt-get install protobuf-compiler libprotobuf-dev

Creating the whl file for the Python client library requires setuptools.

pip install --no-cache-dir --upgrade setuptools

With those prerequisites installed, the C++ and Python client libraries and example image_client application can be built:

make -f Makefile.clients all pip

Build artifacts are in build/. The Python whl file is generated in build/dist/dist/ and can be installed with a command like the following:

pip install --no-cache-dir --upgrade build/dist/dist/inference_server-1.0.0-cp27-cp27mu-linux_x86_64.whl
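
To verify the wheel installed correctly, a quick import check can be run. This is a minimal sketch: the package and class names below are the ones used in the Python API section of this README, and the exact wheel filename and version may differ on your system.

# Minimal post-install import check for the Python client library.
# Uses the inference_server.api package and the InferContext class
# shown in the Python API section below.
from inference_server.api import InferContext
print(InferContext)
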
Image Classification Example

The image classification example that uses the C++ client API is available at src/clients/image_classification/image_client.cc. After building, the executable is available at build/image_client. The python version of the image classification client is available at src/clients/python/image_client.py.

To use image_client (or image_client.py) you must first have an inference server that is serving one or more image classification models. The image_client example requires that the model have a single image input and produce a single classification output.
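
For reference, the client-side preprocessing can be sketched as follows. This is not the image_client implementation itself; it assumes a single-channel 28x28 input like the example MNIST model, and uses PIL and numpy as the Python client does. The exact resize, scaling, and data type depend on the model being served.

# Sketch of client-side image preprocessing (assumes a 28x28 grayscale input).
import numpy as np
from PIL import Image

img = Image.open("examples/data/3.pgm").convert("L").resize((28, 28))
arr = np.asarray(img).astype(np.float32)   # HxW array usable as a Python client run() input
raw = arr.tobytes()                        # raw bytes usable with the C++ client's Input::SetRaw()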

A simple TensorRT MNIST model that can be used to demonstrate image_client is provided in the examples/models directory. Following the instructions in the Inference Container User Guide, launch the inference server container pointing to that model store. For example:

nvidia-docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 --mount type=bind,source=/path/to/dl-inference-server/examples/models,target=/tmp/models nvcr.io/nvidia/inferenceserver:18.05 /opt/inference_server/bin/inference_server --model-store=/tmp/models

Replace /path/to/dl-inference-server/examples/models with the corresponding path in your local clone of this repo. Once the server is running you can use image_client (or image_client.py) to send inference requests to the server.

image_client -m mnist_plan -s VGG examples/data/3.pgm

By default the client prints the most probable classification for the image.

Prediction totals:
        cnt=1   (3) three

Use the -c flag to see more classifications.

image_client -m mnist_plan -s VGG -c 3 examples/data/3.pgm
Output probabilities:
batch 0: 3 ("three") = 0.996513
batch 0: 5 ("five") = 0.00348471
batch 0: 4 ("four") = 2.07097e-06
Prediction totals:
        cnt=1   (3) three

The -b flag allows you to send a batch of images for inferencing. Currently image_client simply sends the same image multiple times, so you will see the same classification result repeated (as indicated by the 'cnt' value).

image_client -m mnist_plan -s VGG -b 2 examples/data/3.pgm
Prediction totals:
        cnt=2   (3) three

C++ API

The C++ client API exposes a class-based interface for querying server and model status and for performing inference. The commented interface is available at src/clients/common/request.h.

The following shows an example of the basic steps required for inferencing (error checking is omitted for clarity; see image_client.cc for full error checking):

// Create the context object for inferencing using the latest version
// of the 'mnist' model.
std::unique_ptr<InferContext> ctx;
InferContext::Create(&ctx, "localhost:8000", "mnist");

// Get handle to model input and output.
std::shared_ptr<InferContext::Input> input;
ctx->GetInput(input_name, &input);

std::shared_ptr<InferContext::Output> output;
ctx->GetOutput(output_name, &output);

// Set options so that subsequent inference runs are for a given batch_size
// and return a result for 'output'. The 'output' result is returned as a
// classification result of the 'k' most probable classes.
std::unique_ptr<InferContext::Options> options;
InferContext::Options::Create(&options);
options->SetBatchSize(batch_size);
options->AddClassResult(output, k);
ctx->SetRunOptions(*options);

// Provide input data for each batch.
input->Reset();
for (size_t i = 0; i < batch_size; ++i) {
  input->SetRaw(input_data[i]);
}

// Run inference and get the results. When the Run() call returns the ctx
// can be used for another inference run. Results are owned by the caller
// and can be retained as long as necessary.
std::vector<std::unique_ptr<InferContext::Result>> results;
ctx->Run(&results);

// For each entry in the batch print the top prediction.
for (size_t i = 0; i < batch_size; ++i) {
  InferContext::Result::ClassResult cls;
  results[0]->GetClassAtCursor(i, &cls);
  std::cout << "batch " << i << ": " << cls.label << std::endl;
}

Python API

The Python client API provides capabilities similar to the C++ API. The commented interface for the StatusContext and InferContext classes is available at src/clients/python/__init__.py.

The following shows an example of the basic steps required for inferencing (error checking is omitted for clarity):

from inference_server.api import *
import numpy as np

# Create input with random data
input_list = list()
for b in range(batch_size):
    in0 = np.random.randint(0, 255, size=input_size, dtype=input_dtype)
    input_list.append(in0)

# Run inferencing and get the top-3 classes
ctx = InferContext("localhost:8000", "mnist")
results = ctx.run(
    { "data" : input_list },
    { "prob" : (InferContext.ResultFormat.CLASS, 3) },
    batch_size)

# Print results
for (result_name, result_val) in results.items():
    for b in range(batch_size):
        print("output {}, batch {}: {}".format(result_name, b, result_val[b]))
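
The random data above is only a placeholder. As a sketch of classifying a real image with the same API, the following reuses the 'mnist' model name and the 'data'/'prob' tensor names from the example above; the 28x28 grayscale, float32 preprocessing is an assumption about the example MNIST model rather than a documented requirement.

# Sketch: classify examples/data/3.pgm with the Python client API.
# The model and tensor names follow the example above; the 28x28 float32
# preprocessing is an assumption about the example MNIST model.
import numpy as np
from PIL import Image
from inference_server.api import *

img = Image.open("examples/data/3.pgm").convert("L").resize((28, 28))
input_list = [np.asarray(img).astype(np.float32)]

ctx = InferContext("localhost:8000", "mnist")
results = ctx.run(
    { "data" : input_list },
    { "prob" : (InferContext.ResultFormat.CLASS, 3) },
    1)  # batch size of one

for (result_name, result_val) in results.items():
    print("output {}, batch 0: {}".format(result_name, result_val[0]))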
