IBM/db2-event-store-taxi-trips

Name: db2-event-store-taxi-trips

Owner: International Business Machines

Description: Stream data from a Java program and use a Jupyter notebook to demonstrate charting of statistics based on historical and live events. IBM Db2 Event Store is used as the event database.

Created: 2018-03-08 17:33:02.0

Updated: 2018-04-22 14:59:56.0

Pushed: 2018-04-22 14:59:55.0

Homepage: https://developer.ibm.com/code/patterns/ingest-and-analyze-event-data-streams-for-timely-insights/

Size: 7951

Language: Java

README

Analyze event streams with taxi cab data

In this code pattern, we will stream data from a Java program and use a Jupyter notebook to demonstrate charting of statistics based on historical and live events. IBM Db2 Event Store is used as the event database.

IBM® Db2 Event Store (formerly IBM Project EventStore) is an in-memory database designed for massive structured data volumes and real-time analytics, built on Apache Spark and the Apache Parquet data format. The solution is optimized for event-driven data processing and analysis. It can support emerging applications that are driven by events, such as IoT solutions, payments, logistics, and web commerce. It is flexible and scalable, and can adapt quickly to your changing business needs over time. It is available in a free developer edition and an enterprise edition that you can download now. The enterprise edition is free for pre-production and test.

Credit goes to Jacques Roy for the original Java code and Jupyter notebook.

When the reader has completed this code pattern, they will understand how to:

Flow
  1. User runs Jupyter notebook in DSX Local
  2. Notebook connects to Db2 Event Store to analyze live event stream
  3. External Java program sends live events
Included components
Featured technologies

Steps

Run locally
  1. Install IBM Db2 Event Store Developer Edition
  2. Clone the repo
  3. Build and run the Java event loader
  4. Create the Jupyter notebook in DSX Local
  5. Run the notebook

1. Install IBM Db2 Event Store Developer Edition

Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the instructions here.

Note: This code pattern was developed with EventStore-DeveloperEdition 1.1.4

2. Clone the repo

Clone the db2-event-store-taxi-trips repo locally. In a terminal, run:

git clone https://github.com/IBM/db2-event-store-taxi-trips

3. Build and run the Java event loader

Prerequisite

Maven >= 3.5 is used to build, test, and run. Check your Maven version using the following command:

mvn -v

To download and install Maven, refer to the Maven website.

Download dependencies

Use Maven to download the dependencies with the following commands:

cd db2-event-store-taxi-trips
mvn clean
mvn install

Compile and run the event loader daemon

The event loader runs as a daemon and waits for the notebook to tell it to start and stop the event stream.

The args string contains "port host user password", in that order, and must match the settings in the notebook.

mvn compile exec:java -Dexec.mainClass=com.ibm.developer.code.patterns.db2eventstoretaxitrips.StartLoader -Dexec.args="9292 0.0.0.0 admin password"

Note: mvn compile can be done separately, but including it before exec gives you a recompile as needed if the code has changed.
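
Because the args string has to agree with the notebook settings, one option is to assemble it from the same four values. The sketch below is illustrative only: the constant names PORT, USER, and PASSWORD and both helper functions are hypothetical (the notebook is only known to define a HOST constant), and the values are copied from the example command.

```python
import shlex

# Hypothetical settings; mirror whatever the notebook's first code cell uses.
PORT = 9292
HOST = "0.0.0.0"
USER = "admin"
PASSWORD = "password"

MAIN_CLASS = "com.ibm.developer.code.patterns.db2eventstoretaxitrips.StartLoader"


def loader_args(port, host, user, password):
    """Join the four settings in the "port host user password" order."""
    return f"{port} {host} {user} {password}"


def maven_command(port, host, user, password):
    """Return the mvn invocation as an argument list (no shell quoting yet)."""
    return [
        "mvn",
        "compile",
        "exec:java",
        f"-Dexec.mainClass={MAIN_CLASS}",
        f"-Dexec.args={loader_args(port, host, user, password)}",
    ]


command = maven_command(PORT, HOST, USER, PASSWORD)
# shlex.join adds the quoting needed to paste the command into a POSIX shell.
print(shlex.join(command))
```

Keeping the command as a list means it could also be passed directly to subprocess.run without any shell quoting.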

Killing the daemon

Use CTRL-C to kill the event loader daemon when you are done with it.

4. Create the Jupyter notebook

Note: Db2 Event Store is built with Data Science Experience (DSX) Local

The git repo includes a Jupyter notebook which demonstrates interacting with Db2 Event Store with Spark SQL and matplotlib.

The notebook also demonstrates basics such as:

Importing the Notebook

Use the Db2 Event Store / DSX Local UI to create and run the notebook.

  1. From the drop-down menu (three horizontal lines in the upper left corner), select My Notebooks.
  2. Click on add notebooks.
  3. Select the From File tab.
  4. Provide a name.
  5. Click Choose File and navigate to the notebooks directory in your cloned repo. Select the file taxi_trips.ipynb.
  6. Scroll down and click on Create Notebook. The new notebook is now open and ready for execution.

5. Run the notebook

  1. Edit the HOST constant in the first code cell. You will need to enter your host's IP address here.
  2. Run the notebook using the menu Cell > Run all or run the cells individually with the play button.

Sample output

Java event loader

The Java daemon waits until the notebook tells it to start loading events. When you run the notebook code after "3.3 Start the insertion program", the cell output will say "Insert process started", and the Java program will begin (or restart) printing messages like these:

Number of records inserted: 400, total time: 286ms
Number of records inserted: 800, total time: 348ms
Number of records inserted: 1200, total time: 391ms
Number of records inserted: 1600, total time: 434ms
Number of records inserted: 2000, total time: 473ms

The provided JSON file has 50,000 events. The loader continues until it runs out of events or until the final cell of the notebook signals for it to stop.
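
To get a rough insert rate from the loader's progress messages, the lines can be parsed with a regular expression. This is a minimal sketch that assumes each line reads "Number of records inserted: N, total time: Tms" and that both figures are cumulative, which the sample suggests but the source does not state. The function names are hypothetical.

```python
import re

# Assumed line format, based on the sample loader output.
LINE_PATTERN = re.compile(
    r"Number of records inserted: (\d+), total time: (\d+)ms"
)

SAMPLE_OUTPUT = """\
Number of records inserted: 400, total time: 286ms
Number of records inserted: 800, total time: 348ms
Number of records inserted: 1200, total time: 391ms
Number of records inserted: 1600, total time: 434ms
Number of records inserted: 2000, total time: 473ms
"""


def parse_progress(text):
    """Return a list of (records, milliseconds) pairs found in the text."""
    return [
        (int(match.group(1)), int(match.group(2)))
        for match in LINE_PATTERN.finditer(text)
    ]


def overall_rate(points):
    """Records per second, using the last cumulative data point."""
    records, millis = points[-1]
    return records * 1000.0 / millis


points = parse_progress(SAMPLE_OUTPUT)
print(points)
print(f"{overall_rate(points):.0f} records/second")
```

For the five sample lines this works out to roughly 4,228 records per second (2,000 records in 473 ms), a figure that includes startup cost in the first batch.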

Run SQL queries

Count(*)

The first query to try is a simple count(*) query. Using the show() function you will see that you have successfully inserted and queried events. You can run this cell over and over to see the count increase.
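
The effect of re-running the count while events arrive can be sketched without an Event Store instance. The example below substitutes Python's built-in sqlite3 module for Db2 Event Store and Spark SQL, so it shows only the shape of the query, not the notebook's actual API. The table name, columns, and batch size are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the event database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxi_trips (trip_id INTEGER PRIMARY KEY, distance REAL)"
)


def insert_batch(start_id, size):
    """Simulate the loader inserting one batch of events."""
    rows = [(start_id + offset, 1.0) for offset in range(size)]
    conn.executemany("INSERT INTO taxi_trips VALUES (?, ?)", rows)


def count_rows():
    """Run the count(*) query and return the single integer result."""
    return conn.execute("SELECT count(*) FROM taxi_trips").fetchone()[0]


counts = []
for batch_number in range(3):
    insert_batch(batch_number * 400, 400)
    counts.append(count_rows())  # same query, re-run after each batch

print(counts)
```

Each re-run of the same query returns a larger number, which mirrors what the notebook cell shows while the loader is running.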

Group by time

The next query uses a GROUP BY to aggregate by time. With this query you will see the counts and average stats for each 15-minute interval. By running this repeatedly (the notebook includes a short query loop), you will see that, as the events come in, the latest time interval has a growing count and changing averages.
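
One way to express a 15-minute GROUP BY is to floor each timestamp to the start of its interval with integer division. The sketch below again uses sqlite3 as a stand-in, not Spark SQL on Db2 Event Store. The table, column names, and data values are invented for illustration, and the notebook's real query may bucket time differently.

```python
import sqlite3

INTERVAL_SECONDS = 15 * 60

conn = sqlite3.connect(":memory:")
# Hypothetical schema: pickup time as epoch seconds plus one numeric stat.
conn.execute("CREATE TABLE trips (pickup_epoch INTEGER, distance REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(0, 1.0), (600, 3.0), (900, 2.0), (1000, 4.0), (1799, 6.0), (1800, 5.0)],
)

# Integer division floors each timestamp to the start of its interval.
QUERY = """
SELECT (pickup_epoch / ?) * ? AS interval_start,
       count(*)               AS trip_count,
       avg(distance)          AS avg_distance
FROM trips
GROUP BY interval_start
ORDER BY interval_start
"""


def aggregate():
    """Return one (interval_start, trip_count, avg_distance) row per bucket."""
    return conn.execute(QUERY, (INTERVAL_SECONDS, INTERVAL_SECONDS)).fetchall()


before = aggregate()

# A new event lands in the latest interval and changes only that bucket.
conn.execute("INSERT INTO trips VALUES (?, ?)", (1900, 7.0))
after = aggregate()

print(before)
print(after)
```

Comparing the two results shows the earlier intervals unchanged while the last one gains a row and a new average, which is the behavior the query loop makes visible.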

Animated charting

Using the same aggregation query inside an animated matplotlib loop, we can watch as the time slices fill in with events. In this simple example, the last time slice shows a changing count and average as it responds to the events as they come in.

License

Apache 2.0


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.