Name: online-inferencing-blog-application
Owner: Confluent Inc.
Description: Source code and application accompanying the online inferencing blog
Created: 2017-08-31 19:53:07.0
Updated: 2018-05-18 17:15:04.0
Pushed: 2017-11-14 15:11:55.0
Homepage: null
Size: 6902
Language: Java
This Kafka Streams application demonstrates using the embedded ML library Apache Mahout to perform online logistic regression (OnlineLogisticRegression) on flight data from the Bureau of Transportation Statistics.
Specifically, this application aims to do two things:
1. Demonstrate online inferencing by joining a KStream with a GlobalKTable (keyed by airport id) that holds the model coefficients, and using those coefficients with the flight data in each record to predict whether the flight will arrive on time.
2. Update the model through a separate stream (Processor API) that collects flight data and, once enough data has been collected, retrains the model and publishes the updated coefficients to the Kafka topic backing the GlobalKTable. This keeps the model, and therefore the predictions, up to date in a streaming manner and should improve the prediction rate.
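The first goal can be illustrated with a minimal plain-Java sketch. It is not the application's code and does not use the Kafka Streams or Mahout APIs: a HashMap stands in for the GlobalKTable, and the class name, method names, and coefficient layout (intercept first) are assumptions made for illustration only. In a real topology the lookup would be done by the KStream-to-GlobalKTable join.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the join-and-predict step; not the project's code.
final class OnlinePredictionSketch {
    // Plays the role of the GlobalKTable: airport id -> coefficients,
    // with the intercept stored at index 0 (an assumed layout).
    private final Map<String, double[]> modelByAirport = new HashMap<>();

    // Equivalent in spirit to a new record arriving on the topic backing the table.
    void publishModel(String airportId, double[] coefficients) {
        modelByAirport.put(airportId, coefficients.clone());
    }

    // Logistic function mapping a linear score to a probability in (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Returns the predicted probability of an on-time arrival, or empty when no
    // usable model exists for the airport (an inner join would drop such records).
    Optional<Double> predictOnTime(String airportId, double[] features) {
        double[] w = modelByAirport.get(airportId);
        if (w == null || w.length != features.length + 1) {
            return Optional.empty();
        }
        double z = w[0];
        for (int i = 0; i < features.length; i++) {
            z += w[i + 1] * features[i];
        }
        return Optional.of(sigmoid(z));
    }
}
```

With all-zero coefficients the sketch returns 0.5 for any input. Because predictions always read the latest entry for the airport, publishing new coefficients changes the results immediately, without restarting anything.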
Initially we'll observe a poor prediction rate of around 50%, basically a coin flip. But as we collect more data we are able
to build a better model and publish the updated model to the GlobalKTable, resulting in much better prediction rates,
somewhere between 80% and 90%.
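The second goal, collecting data and retraining once enough has arrived, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the application's implementation: plain stochastic gradient descent is swapped in for Mahout's OnlineLogisticRegression, the publisher callback stands in for producing to the topic backing the GlobalKTable, and the threshold, learning rate, epoch count, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch: buffers labelled flight records per airport and, once a
// threshold is reached, retrains and hands the new coefficients to a publisher.
final class RetrainBufferSketch {
    private static final class Example {
        final double[] features;
        final int label; // 1 = arrived on time, 0 = late

        Example(double[] features, int label) {
            this.features = features.clone();
            this.label = label;
        }
    }

    private final int threshold;
    private final int numFeatures;
    private final BiConsumer<String, double[]> publisher;
    private final Map<String, List<Example>> buffers = new HashMap<>();

    RetrainBufferSketch(int threshold, int numFeatures, BiConsumer<String, double[]> publisher) {
        this.threshold = threshold;
        this.numFeatures = numFeatures;
        this.publisher = publisher;
    }

    // Called once per incoming flight record.
    void collect(String airportId, double[] features, boolean onTime) {
        if (features.length != numFeatures) {
            throw new IllegalArgumentException("unexpected feature count");
        }
        List<Example> buffer = buffers.computeIfAbsent(airportId, k -> new ArrayList<>());
        buffer.add(new Example(features, onTime ? 1 : 0));
        if (buffer.size() >= threshold) {
            publisher.accept(airportId, train(buffer));
            buffer.clear(); // start collecting the next batch
        }
    }

    // Logistic regression fitted with plain SGD; intercept is stored at index 0.
    private double[] train(List<Example> data) {
        double[] w = new double[numFeatures + 1];
        final double learningRate = 0.1;
        for (int epoch = 0; epoch < 100; epoch++) {
            for (Example e : data) {
                double z = w[0];
                for (int i = 0; i < numFeatures; i++) {
                    z += w[i + 1] * e.features[i];
                }
                double error = e.label - 1.0 / (1.0 + Math.exp(-z));
                w[0] += learningRate * error;
                for (int i = 0; i < numFeatures; i++) {
                    w[i + 1] += learningRate * error * e.features[i];
                }
            }
        }
        return w;
    }
}
```

Buffers are kept per airport because the model table is keyed by airport id, so each airport's model can be retrained and republished independently of the others.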
Again, the point of this application is not machine learning algorithms per se, or how to build better machine learning models, but that we can leverage the GlobalKTable to publish an updated model (coefficients) and improve our online inferencing in a streaming manner without having to run a batch job.
This project uses Gradle. After cloning or downloading it, it is recommended to first run the gradle
command.
It is assumed that a Kafka instance is already installed and running.
To run this application:
1. Create the topics onlineRegression-by-airport, raw-airline-data, ml-data-input, and predictions.
2. Run ./gradlew populateGlobalKTable from a terminal window.
3. Start the KStreamsOnLinePredictions application with ./gradlew runOnlinePredictions
4. Start the data feed with ./gradlew runDataFeed