h2oai/qcon2015

Name: qcon2015

Owner: H2O.ai

Description: Repository for SF QConf 2015 Workshop

Created: 2015-11-10 20:10:04.0

Updated: 2018-04-23 13:46:27.0

Pushed: 2015-11-20 22:21:32.0

Homepage: null

Size: 36208

Language: Java

README

QCon 2015

Materials for the QCon San Francisco 2015 workshop. The goal for the day is to learn how to use Spark, H2O, and Sparkling Water to build smart applications driven by machine learning models. The tutorials cover the following topics:

Outline
  1. Spark & Sparkling Water Introduction
    • H2O and Spark intro
    • Sparkling Water intro
    • Installation and setup of Spark
    • Running Spark shell
    • Installation and setup of Sparkling Water
    • Basic architecture and overview of functionalities
    • Hands-on demonstration of Sparkling Water
    • Running Sparkling Shell
  2. Simple Spam Detector
    • Use Spark to tokenize text
    • Use MLlib's TF-IDF model to transform the data into a table
    • Build a GBM model to label incoming text as spam or not spam (ham)
  3. Ask Craig(list) Application
    • Build a classifier to label job descriptions with the appropriate industry category
    • Deploy it as a Spark application
  4. Standalone application concepts
    • Deploy the classification model inside Spark Streaming
  5. Spark Streaming and Model Deployment
    • Loading a saved H2O binary model
    • Exposing the model via a Spark stream
  6. Spark Streaming and Model Deployment #2
    • Using an exported POJO model in a Spark stream
  7. Final Application
    • Assembling the final application: combining the front end and back end
  8. Lending Club Example
    • A smart app predicting loan interest
    • Off-line training pipeline driven from R
    • POJO models exposed via a REST API
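
To make the first two steps of the Simple Spam Detector tutorial concrete (tokenizing text, then turning tokens into TF-IDF weights), here is a minimal sketch in plain Java. It is not the workshop's code and does not use Spark or MLlib: the class name `TfIdfSketch` and its methods are hypothetical, terms are kept as strings rather than hashed to column indices, and the smoothed weighting `ln((N + 1) / (df + 1))` is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a plain-Java stand-in for the tokenize + TF-IDF steps.
// The workshop itself performs these steps with Spark and MLlib.
class TfIdfSketch {

    // Lowercase the message and split on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Document frequency: the number of documents that contain each term.
    static Map<String, Integer> documentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // TF-IDF weights for one document: raw term count times a smoothed IDF,
    // idf = ln((numDocs + 1) / (df + 1)) (an assumed weighting for this sketch).
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (String term : doc) {
            weights.merge(term, 1.0, Double::sum); // accumulate raw term counts
        }
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            double idf = Math.log((numDocs + 1.0) / (d + 1.0));
            e.setValue(e.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Two made-up messages, used only to exercise the methods.
        List<List<String>> docs = new ArrayList<>();
        docs.add(tokenize("Free money, FREE!"));
        docs.add(tokenize("Lunch money?"));
        Map<String, Integer> df = documentFrequency(docs);
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + tfIdf(doc, df, docs.size()));
        }
    }
}
```

A term that appears in every message (here, "money") gets a weight of zero, while a term concentrated in one message gets a positive weight. Those per-term weights form the table of features that a classifier such as GBM is then trained on.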
Requirements
Goals
