intel-analytics/analytics-zoo

Name: analytics-zoo

Owner: intel-analytics

Description: Analytics + AI Platform for Apache Spark and BigDL

Created: 2017-05-05 02:27:30.0

Updated: 2018-05-24 11:45:36.0

Pushed: 2018-05-24 11:45:31.0

Homepage:

Size: 79205

Language: Jupyter Notebook

README

Analytics Zoo

Analytics + AI Platform for Apache Spark and BigDL

Analytics Zoo makes it easy to build deep learning applications on Spark and BigDL by providing an end-to-end analytics + AI platform (including high level pipeline APIs, built-in deep learning models, reference use cases, etc.).

High level pipeline APIs

Analytics Zoo provides a set of easy-to-use, high level pipeline APIs that natively support Spark DataFrames and ML Pipelines, autograd and custom layers/losses, transfer learning, etc.

nnframes

nnframes provides native deep learning support in Spark DataFrames and ML Pipelines, so that you can easily build complex deep learning pipelines in just a few lines, as illustrated below.

  1. Load images into DataFrames using NNImageReader

    from zoo.common.nncontext import *
    from zoo.pipeline.nnframes import *
    sc = get_nncontext()
    imageDF = NNImageReader.readImages(image_path, sc)
    
  2. Process loaded data using DataFrames transformations

    from pyspark.sql.functions import col, udf
    getName = udf(lambda row: ...)
    getLabel = udf(lambda name: ...)
    df = imageDF.withColumn("name", getName(col("image"))).withColumn("label", getLabel(col('name')))
    
  3. Process images using built-in feature engineering operations

    from zoo.feature.common import *
    from zoo.feature.image import *
    transformer = ChainedPreprocessing(
        [RowToImageFeature(), ImageResize(64, 64), ImageChannelNormalize(123.0, 117.0, 104.0),
         ImageMatToTensor(), ImageFeatureToTensor()])
    
  4. Define model using Keras-style APIs

    from zoo.pipeline.api.keras.layers import *
    from zoo.pipeline.api.keras.models import *
    model = Sequential().add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, 28, 28))) \
                    .add(MaxPooling2D(pool_size=(2, 2))).add(Flatten()).add(Dense(10, activation='softmax'))
    
  5. Train model using Spark ML Pipelines

    from bigdl.nn.criterion import CrossEntropyCriterion
    classifier = NNClassifier(model, CrossEntropyCriterion(), transformer).setLearningRate(0.003) \
                    .setBatchSize(40).setMaxEpoch(1).setFeaturesCol("image").setCachingSample(False)
    nnModel = classifier.fit(df)
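
To make the layer stack in step 4 concrete, here is a minimal, library-free sketch of its shape arithmetic. It assumes the convolution uses no padding with stride 1 and that the pooling stride equals the pool size; the source does not state these defaults, so treat them as assumptions. The helper names `conv2d_shape`, `pool2d_shape`, and `flatten_size` are hypothetical and are not part of Analytics Zoo.

```python
# Hypothetical helpers for illustration only; they are not Analytics Zoo APIs.
# Shapes are (channels, height, width), matching input_shape=(1, 28, 28) above.


def conv2d_shape(shape, filters, kernel_h, kernel_w):
    """Output shape of an unpadded, stride-1 convolution (assumed defaults)."""
    _, height, width = shape
    return (filters, height - kernel_h + 1, width - kernel_w + 1)


def pool2d_shape(shape, pool_h, pool_w):
    """Output shape of pooling whose stride equals the pool size (assumed)."""
    channels, height, width = shape
    return (channels, height // pool_h, width // pool_w)


def flatten_size(shape):
    """Number of features after flattening a (channels, height, width) tensor."""
    channels, height, width = shape
    return channels * height * width


input_shape = (1, 28, 28)
conv_out = conv2d_shape(input_shape, 32, 3, 3)   # Convolution2D(32, 3, 3)
pool_out = pool2d_shape(conv_out, 2, 2)          # MaxPooling2D(pool_size=(2, 2))
features = flatten_size(pool_out)                # Flatten()

print(conv_out, pool_out, features)
```

Under these assumptions the final `Dense(10)` layer receives 5408 input features per image.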
    
autograd

autograd provides automatic differentiation for math operations, so that you can easily build your own custom losses and layers (in both Python and Scala), as illustrated below.

  1. Define custom functions using autograd

    from zoo.pipeline.api.autograd import *

    def mean_absolute_error(y_true, y_pred):
        return mean(abs(y_true - y_pred), axis=1)

    def add_one_func(x):
        return x + 1.0
    
  2. Define model using Keras-style API and custom Lambda layer

    from zoo.pipeline.api.keras.layers import *
    from zoo.pipeline.api.keras.models import *
    model = Sequential().add(Dense(1, input_shape=(2,))) \
                        .add(Lambda(function=add_one_func))
    
  3. Train model with custom loss function

    from bigdl.optim.optimizer import SGD
    model.compile(optimizer=SGD(), loss=mean_absolute_error)
    model.fit(x=..., y=...)
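
As a sanity check on what the custom loss and Lambda function compute, the following sketch reproduces the same arithmetic on plain nested lists. It is not autograd: no gradients are tracked, and the list-based functions only mirror the numeric result of `mean(abs(y_true - y_pred), axis=1)` and `x + 1.0` for a small batch. The sample values are made up for illustration.

```python
# Plain-Python mirror of the autograd functions above (no differentiation).
# Each inner list is one sample; axis=1 therefore means "per sample".


def mean_absolute_error(y_true, y_pred):
    """Per-sample mean of |y_true - y_pred| over the feature axis."""
    return [
        sum(abs(t - p) for t, p in zip(true_row, pred_row)) / len(true_row)
        for true_row, pred_row in zip(y_true, y_pred)
    ]


def add_one_func(x):
    """Element-wise x + 1.0, as applied by the Lambda layer."""
    return [[value + 1.0 for value in row] for row in x]


# Illustrative values only.
y_true = [[1.0, 2.0], [3.0, 5.0]]
y_pred = [[1.5, 2.5], [1.0, 5.0]]

losses = mean_absolute_error(y_true, y_pred)
shifted = add_one_func([[0.0], [2.5]])
print(losses, shifted)
```

The loss yields one value per sample, which is why the reduction runs over `axis=1` rather than over the whole batch.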
    
Transfer learning

Using the high level transfer learning APIs, you can easily customize pretrained models for feature extraction or fine-tuning.

  1. Load an existing model (pretrained in Caffe)

    from zoo.pipeline.api.net import *
    full_model = Net.load_caffe(model_path)
    
  2. Remove last few layers

    # create a new model by removing the layers after pool5/drop_7x7_s1
    model = full_model.new_graph(["pool5/drop_7x7_s1"])
    
  3. Freeze first few layers

    # freeze layers from input to pool4/3x3_s2 inclusive
    model.freeze_up_to(["pool4/3x3_s2"])
    
  4. Add a few new layers

    from zoo.pipeline.api.keras.layers import *
    from zoo.pipeline.api.keras.models import *
    input = Input(name="input", shape=(3, 224, 224))
    inception = model.to_keras()(input)
    flatten = Flatten()(inception)
    logits = Dense(2)(flatten)
    newModel = Model(input, logits)
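
The effect of steps 2 and 3 can be pictured with a toy model. The sketch below is a simplification under stated assumptions: it treats the network as a linear list of layer names (a real Caffe model is a graph), and `ToyNet` is a hypothetical stand-in, not an Analytics Zoo class. Only `pool4/3x3_s2` and `pool5/drop_7x7_s1` come from the steps above; the other layer names are placeholders.

```python
class ToyNet:
    """Hypothetical linear model used only to illustrate truncation and freezing."""

    def __init__(self, layer_names):
        self.layer_names = list(layer_names)
        self.frozen = set()

    def new_graph(self, output_name):
        """Return a new model that ends at output_name (later layers are dropped)."""
        end = self.layer_names.index(output_name) + 1
        return ToyNet(self.layer_names[:end])

    def freeze_up_to(self, name):
        """Mark every layer from the input to name, inclusive, as non-trainable."""
        end = self.layer_names.index(name) + 1
        self.frozen.update(self.layer_names[:end])

    def trainable_layers(self):
        """Layers whose weights would still be updated during fine-tuning."""
        return [n for n in self.layer_names if n not in self.frozen]


# "conv1", "inception_5a" and "classifier" are placeholder names.
full_model = ToyNet(
    ["conv1", "pool4/3x3_s2", "inception_5a", "pool5/drop_7x7_s1", "classifier"]
)
model = full_model.new_graph("pool5/drop_7x7_s1")
model.freeze_up_to("pool4/3x3_s2")

print(model.layer_names)
print(model.trainable_layers())
```

In this picture, the new layers added in step 4 sit on top of the truncated model and remain trainable along with the unfrozen layers.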
    
Built-in deep learning models

Analytics Zoo provides several built-in deep learning models that you can use for a variety of problem types, such as object detection, image classification, text classification, recommendation, etc.

Object detection API

Using Analytics Zoo Object Detection API (including a set of pretrained detection models such as SSD and Faster-RCNN), you can easily build your object detection applications (e.g., localizing and identifying multiple objects in images and videos), as illustrated below.

  1. Download object detection models in Analytics Zoo

    You can download a collection of detection models (pretrained on the PASCAL VOC dataset and COCO dataset) from the detection model zoo.

  2. Use Zoo Object Detection API for off-the-shelf inference

    from zoo.models.image.objectdetection import *
    model = ObjectDetector.load_model(model_path)
    image_set = ImageSet.read(img_path, sc)
    output = model.predict_image_set(image_set)
    
Image classification API

Using Analytics Zoo Image Classification API (including a set of pretrained classification models such as VGG, Inception, ResNet, MobileNet, etc.), you can easily build your image classification applications, as illustrated below.

  1. Download image classification models in Analytics Zoo

    You can download a collection of image classification models (pretrained on the ImageNet dataset) from the image classification model zoo.

  2. Use Image Classification API for off-the-shelf inference

    from zoo.models.image.imageclassification import *
    model = ImageClassifier.load_model(model_path)
    image_set = ImageSet.read(img_path, sc)
    output = model.predict_image_set(image_set)
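
A common follow-up to classification inference is picking the most likely classes. The source does not document the structure of the value returned by `predict_image_set`, so this sketch makes no claim about it; it only assumes that, for a single image, you have already extracted a flat list of per-class scores. The helper `top_k` and the sample scores are hypothetical.

```python
def top_k(scores, k=3):
    """Return the k (class_index, score) pairs with the highest scores."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for a four-class problem.
scores = [0.05, 0.70, 0.10, 0.15]
best = top_k(scores, k=2)
print(best)
```

Mapping the returned class indices to human-readable labels depends on the label file shipped with the chosen model.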
    
Text classification API

Analytics Zoo Text Classification API provides a set of pre-defined models (using CNN, LSTM, etc.) for text classification.

Recommendation API

Analytics Zoo Recommendation API provides a set of pre-defined models (such as Neural Collaborative Filtering, Wide and Deep Learning, etc.) for recommendation.

Reference use cases

Analytics Zoo provides a collection of end-to-end reference use cases, including anomaly detection (for time series data), sentiment analysis, fraud detection, image augmentation, object detection, variational autoencoder, etc.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.