Name: db2-event-store-taxi-trips
Owner: International Business Machines
Description: Stream data from a Java program and use a Jupyter notebook to demonstrate charting of statistics based on historical and live events. IBM Db2 Event Store is used as the event database.
Created: 2018-03-08 17:33:02.0
Updated: 2018-04-22 14:59:56.0
Pushed: 2018-04-22 14:59:55.0
Homepage: https://developer.ibm.com/code/patterns/ingest-and-analyze-event-data-streams-for-timely-insights/
Size: 7951
Language: Java
In this code pattern, we will stream data from a Java program and use a Jupyter notebook to demonstrate charting of statistics based on historical and live events. IBM Db2 Event Store is used as the event database.
IBM® Db2 Event Store (formerly IBM Project EventStore) is an in-memory database designed for massive structured data volumes and real-time analytics, built on Apache Spark and the Apache Parquet data format. The solution is optimized for event-driven data processing and analysis. It can support emerging applications that are driven by events, such as IoT solutions, payments, logistics, and web commerce. It is flexible and scalable, and it can adapt quickly to your changing business needs over time. It is available in a free developer edition and an enterprise edition that you can download now. The enterprise edition is free for pre-production and test.
Credit goes to Jacques Roy for the original Java code and Jupyter notebook.
When the reader has completed this code pattern, they will understand how to:
Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the instructions here.
Note: This code pattern was developed with EventStore-DeveloperEdition 1.1.4
Clone the db2-event-store-taxi-trips repository locally. In a terminal, run:
git clone https://github.com/IBM/db2-event-store-taxi-trips
Maven >= 3.5 is used to build, test, and run. Check your Maven version using the following command:
mvn -v
To download and install Maven, refer to Maven.
Use Maven to download the dependencies with the following commands:
cd db2-event-store-taxi-trips
mvn clean
mvn install
The event loader runs as a daemon and waits for the notebook to tell it to start and stop the event stream.
The args string contains "port host user password", matching the settings in the notebook.
mvn compile exec:java -Dexec.mainClass=com.ibm.developer.code.patterns.db2eventstoretaxitrips.StartLoader -Dexec.args="9292 0.0.0.0 admin password"
Note: mvn compile can be done separately, but including it before exec gives you a recompile as needed if the code has changed.
Use CTRL-C to kill the event loader daemon when you are done with it.
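To illustrate how the four values in the args string map to settings, here is a minimal sketch. LoaderArgs and fromArgs are hypothetical names used only for illustration; this is not the repo's StartLoader code, and it assumes the args string reaches main as four whitespace-separated arguments.

```java
// Hypothetical holder for the "port host user password" arguments.
final class LoaderArgs {
    final int port;
    final String host;
    final String user;
    final String password;

    private LoaderArgs(int port, String host, String user, String password) {
        this.port = port;
        this.host = host;
        this.user = user;
        this.password = password;
    }

    // Validates and converts the arguments in the order: port host user password.
    static LoaderArgs fromArgs(String... args) {
        if (args.length != 4) {
            throw new IllegalArgumentException("expected: port host user password");
        }
        int port = Integer.parseInt(args[0]);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new LoaderArgs(port, args[1], args[2], args[3]);
    }

    public static void main(String[] args) {
        // Same values as the example command line above.
        LoaderArgs parsed = fromArgs("9292", "0.0.0.0", "admin", "password");
        System.out.println("listen on " + parsed.host + ":" + parsed.port + " as " + parsed.user);
    }
}
```

Failing fast on a wrong argument count or an out-of-range port makes a mismatch with the notebook settings easier to spot before any events are streamed.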
Note: Db2 Event Store is built with Data Science Experience (DSX) Local
The git repo includes a Jupyter notebook which demonstrates interacting with Db2 Event Store with Spark SQL and matplotlib.
The notebook also demonstrates basics such as:
Use the Db2 Event Store / DSX Local UI to create and run the notebook.
Select My Notebooks.
Click add notebooks.
Select the From File tab.
Click Choose File and navigate to the notebooks directory in your cloned repo. Select the file taxi_trips.ipynb.
Click Create Notebook.
The new notebook is now open and ready for execution.
Edit the HOST constant in the first code cell. You will need to enter your host's IP address here.
Run the notebook using Cell > Run all, or run the cells individually with the play button.
The Java daemon waits until the notebook tells it to start loading events. When you run the notebook code after 3.3 Start the insertion program, the cell output will say Insert process started and the Java program output will begin (or restart) to print messages like these:
Number of records inserted: 400, total time: 286ms
Number of records inserted: 800, total time: 348ms
Number of records inserted: 1200, total time: 391ms
Number of records inserted: 1600, total time: 434ms
Number of records inserted: 2000, total time: 473ms
The provided JSON file has 50,000 events. The loader continues until it runs out of events or until the final cell of the notebook signals for it to stop.
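The two stop conditions described above can be sketched as a simple loop. This is an illustrative sketch only, not the repo's loader. BatchLoaderSketch, requestStop, and the afterBatch callback are hypothetical names. The batch size of 400 is inferred from the sample output above. The actual insert into Db2 Event Store is replaced by a counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical loader loop: runs until the events are exhausted or a stop is requested.
final class BatchLoaderSketch {
    // Set from another thread (for example, a handler for the notebook's stop signal).
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    void requestStop() {
        stopRequested.set(true);
    }

    // Returns the number of events processed before the loop ended.
    int run(int totalEvents, int batchSize, IntConsumer afterBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inserted = 0;
        while (inserted < totalEvents && !stopRequested.get()) {
            int batch = Math.min(batchSize, totalEvents - inserted);
            // A real loader would insert this batch into the event table here.
            inserted += batch;
            afterBatch.accept(inserted);
        }
        return inserted;
    }

    public static void main(String[] args) {
        BatchLoaderSketch loader = new BatchLoaderSketch();
        // Simulate a stop signal arriving once 2,000 events have been processed.
        int total = loader.run(50_000, 400, count -> {
            if (count >= 2_000) {
                loader.requestStop();
            }
        });
        System.out.println("Stopped after " + total + " events");
    }
}
```

Checking the flag once per batch keeps the loop responsive to a stop request without interrupting a batch that is already in progress.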
The first query to try is a simple count(*) query. Using the show() function you will see that you have successfully inserted and queried events. You can run this cell over and over to see the count increase.
The next query uses a GROUP BY to aggregate by time. With this query you will see the counts and average stats for each 15-minute interval. By running this repeatedly (the notebook includes a short query loop), you will see that, as the events come in, the latest time interval has a growing count and changing averages.
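To make the 15-minute grouping concrete, the sketch below performs the same kind of aggregation in plain Java: each timestamp is rounded down to the start of its interval, and a count and average are kept per interval. It is not the notebook's Spark SQL query. IntervalStatsSketch, bucketStart, and the sample values are hypothetical, and timestamps are assumed to be epoch milliseconds.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory equivalent of a GROUP BY over 15-minute intervals.
final class IntervalStatsSketch {
    static final long INTERVAL_MS = 15 * 60 * 1000L;

    // Running count and sum for one interval.
    static final class Stats {
        long count;
        double sum;

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // Rounds a timestamp down to the start of its 15-minute interval.
    static long bucketStart(long epochMillis) {
        return Math.floorDiv(epochMillis, INTERVAL_MS) * INTERVAL_MS;
    }

    // Groups values by interval start, keeping intervals in time order.
    static SortedMap<Long, Stats> aggregate(long[] timestamps, double[] values) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have the same length");
        }
        SortedMap<Long, Stats> result = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            Stats stats = result.computeIfAbsent(bucketStart(timestamps[i]), k -> new Stats());
            stats.count++;
            stats.sum += values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: two events in the first interval, two in the second.
        long[] timestamps = {0L, 60_000L, 900_000L, 1_000_000L};
        double[] values = {2.0, 4.0, 10.0, 20.0};
        aggregate(timestamps, values).forEach((start, stats) ->
            System.out.println(start + " count=" + stats.count + " avg=" + stats.average()));
    }
}
```

Re-running the aggregation as new events arrive changes only the most recent interval, which is the behavior the repeated query shows.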
Using the same aggregation query inside an animated matplotlib loop, we can watch as the time slices fill in with events. In this simple example, the last time slice shows a changing count and average as it responds to the events as they come in.