Name: db2-event-store-clickstream
Owner: International Business Machines
Description: Sample notebooks demonstrate a use case of clickstream analysis with IBM Db2 Event Store using Scala APIs to ingest and analyze web event data.
Created: 2018-05-04 16:17:49.0
Updated: 2018-05-23 22:07:00.0
Pushed: 2018-05-23 22:07:01.0
Size: 771
Language: Jupyter Notebook
IBM Db2 Event Store offers high-speed ingestion and real-time analytics for large volumes of streaming data. The platform enables event-driven applications to persist event data at scale and powers high-performance Spark analytics on all data for quick insights. In this Code Pattern, we will see how a retail business uses IBM Db2 Event Store to capture and analyze clickstream data from its web channels. Clickstream analysis helps the business track customer browsing patterns closely and better understand customers' changing interests. Acting on these insights, the business can personalize each customer's experience with targeted offers that drive sales.
Sample notebooks demonstrate the use case of clickstream analysis with IBM Db2 Event Store using Scala APIs to ingest and analyze web event data. Credit goes to Siva Anne of the IBM Data Science Elite Team for the original Jupyter Notebooks.
When the reader has completed this code pattern, they will understand how to:
Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the Developer Edition installation instructions.
Note: This code pattern was developed with EventStore-DeveloperEdition 1.1.4.
Clone the db2-event-store-clickstream repo locally. In a terminal, run:
git clone https://github.com/IBM/db2-event-store-clickstream
Use the Db2 Event Store UI to add the CSV input file as a data asset.
From the drop-down menu (three horizontal lines in the upper-left corner), select My Notebooks.
Click add data assets.
Click browse and navigate to the data directory in your cloned repo. Select the file clickstream_data.csv.
Use the Db2 Event Store UI to create the notebook.
From the drop-down menu (three horizontal lines in the upper-left corner), select My Notebooks.
Click add notebooks.
Select the From File tab.
Provide a name.
Click Choose File and navigate to the notebooks directory in your cloned repo. Select the file ingest_clickstream_events.ipynb.
Scroll down and click Create Notebook.
Edit the HOST constant in the first code cell. You will need to enter your host's IP address in place of the XXX.XXX.XXX.XXX value.
Run the notebook using the menu Cell > Run all, or run the cells individually with the play button.
This notebook demonstrates how to:
Use the Db2 Event Store UI to create a second notebook, following the same steps as for the first notebook, but select the file analyze_clickstream_events.ipynb from your repo's notebooks directory.
Edit the HOST constant in the first code cell. You will need to enter your host's IP address in place of the XXX.XXX.XXX.XXX value.
Run the notebook using the menu Cell > Run all, or run the cells individually with the play button.
This notebook demonstrates how to:
Code cells that prepare DataFrames with calculated and aggregated fields include show() output to give you a peek at the data as it is being processed.
The first Brunel charts use aggregated web metrics for product lines. Four charts help you compare page views with time spent on web pages.
The bar charts use the same order (sorted by page hits) and the same color for each product line. The charts are placed one directly below the other so that differences are easy to spot.
The charts show that web pages in the smart phones product line are the most popular, both in page views and in time spent on pages.
The videogames product line stands out, with significantly higher total time relative to its page hits.
Notice the tooltips when you hover over the bars.
Click the videogames bar.
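The product-line comparison above comes down to a group-by: count page hits and sum time spent per product line, sort by hits so every chart shares one order, and compare time per hit to spot outliers. The notebook does this with Spark DataFrames in Scala; the following is only a minimal plain-Python sketch of that logic, assuming each row is one page hit with hypothetical `product_line` and `seconds` fields. The rows and the extra `laptops` line are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up rows for illustration only; field names are hypothetical, not the CSV's real schema.
# Each row represents one page hit and the seconds spent on that page.
ROWS = [
    {"product_line": "smart phones", "seconds": 40},
    {"product_line": "smart phones", "seconds": 20},
    {"product_line": "smart phones", "seconds": 30},
    {"product_line": "videogames", "seconds": 120},
    {"product_line": "laptops", "seconds": 15},
    {"product_line": "laptops", "seconds": 25},
]


def product_line_metrics(rows: List[dict]) -> List[Tuple[str, int, int, float]]:
    """Return (product_line, page_hits, total_seconds, seconds_per_hit), sorted by page hits."""
    hits: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for row in rows:
        hits[row["product_line"]] += 1
        total[row["product_line"]] += row["seconds"]
    # Sort by page hits (descending) so every chart can reuse the same product-line order.
    order = sorted(hits, key=lambda line: hits[line], reverse=True)
    return [(line, hits[line], total[line], total[line] / hits[line]) for line in order]


for line, n_hits, secs, per_hit in product_line_metrics(ROWS):
    print(f"{line:12} hits={n_hits} total={secs}s per_hit={per_hit:.1f}s")
```

A high `seconds_per_hit` relative to its hit count is the kind of signal that makes a product line stand out in the charts.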
The next Brunel charts show aggregated web metrics for products in the smart phones product line. These four charts are similar to those described above.
They show that the A-phone is the leading smart phone product in terms of both page hits and time spent on a page.
Notice that the X-phone stands out as the phone with higher time spent on its web pages per page view.
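Drilling into one product line is a filter followed by the same aggregation, this time keyed by product. As a hedged sketch (plain Python rather than the notebook's Scala/Spark code), assuming hypothetical `product_line`, `product`, and `seconds` fields and made-up rows, including the invented `Game Q` product:

```python
from collections import Counter
from typing import Dict, List

# Made-up page-hit rows; the field names (product_line, product, seconds) are hypothetical.
ROWS = [
    {"product_line": "smart phones", "product": "A-phone", "seconds": 20},
    {"product_line": "smart phones", "product": "A-phone", "seconds": 30},
    {"product_line": "smart phones", "product": "A-phone", "seconds": 25},
    {"product_line": "smart phones", "product": "X-phone", "seconds": 90},
    {"product_line": "videogames", "product": "Game Q", "seconds": 300},
]


def per_product(rows: List[dict], product_line: str) -> Dict[str, dict]:
    """Aggregate hits, total seconds, and seconds per hit for the products in one product line."""
    selected = [r for r in rows if r["product_line"] == product_line]  # filter first, then aggregate
    hits = Counter(r["product"] for r in selected)
    seconds: Counter = Counter()
    for r in selected:
        seconds[r["product"]] += r["seconds"]
    return {p: {"hits": hits[p], "seconds": seconds[p], "per_hit": seconds[p] / hits[p]} for p in hits}


metrics = per_product(ROWS, "smart phones")
most_viewed = max(metrics, key=lambda p: metrics[p]["hits"])
longest_per_view = max(metrics, key=lambda p: metrics[p]["per_hit"])
print(most_viewed, longest_per_view)
```

Ranking by hits and by time per hit separately is what lets one product lead overall while another leads per view.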
Next, we look at specific features of the A-phone.
A bar chart shows page views by feature, and a pie chart shows time spent on pages by feature.
Clicking a bar highlights the same feature in the pie chart.
The tooltips show additional information when you hover over bars or pie slices.
The color feature was the most important for both page views and time spent on web pages.
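The bar and pie charts draw on two different aggregations: a count of page views per feature, and each feature's share of total time, which is what pie slices represent. Below is a minimal plain-Python sketch of both, assuming hypothetical `product`, `feature`, and `seconds` fields; the rows and the `camera` and `storage` features are made up, and the notebook itself computes these with Spark in Scala.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up page-hit rows; the "feature" and "seconds" fields are hypothetical.
ROWS = [
    {"product": "A-phone", "feature": "color", "seconds": 60},
    {"product": "A-phone", "feature": "color", "seconds": 40},
    {"product": "A-phone", "feature": "camera", "seconds": 50},
    {"product": "A-phone", "feature": "storage", "seconds": 50},
    {"product": "X-phone", "feature": "color", "seconds": 10},
]


def feature_breakdown(rows: List[dict], product: str) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return page views per feature (bar chart) and each feature's share of time (pie chart)."""
    views: Dict[str, int] = defaultdict(int)
    seconds: Dict[str, int] = defaultdict(int)
    for r in rows:
        if r["product"] != product:
            continue
        views[r["feature"]] += 1
        seconds[r["feature"]] += r["seconds"]
    total = sum(seconds.values())
    # Pie slices are proportions of the product's total time, so they sum to 1.
    shares = {f: s / total for f, s in seconds.items()} if total else {}
    return dict(views), shares


views, shares = feature_breakdown(ROWS, "A-phone")
print(views)
print({f: round(s, 2) for f, s in shares.items()})
```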
Finally, after more data manipulation, we look into web metrics for a specific user.
This view could be used by a support agent or in a targeted offer campaign to analyze a user's current interests.
A legend is displayed on the right; colors indicate product lines.
The bar chart shows the user's page views over the past seven days, with each bar stacked by the product lines viewed.
Clicking a bar highlights the pie chart slices for that day and that product line.
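The per-user view needs a filter on one user and a seven-day window, then a count per (day, product line) pair; each pair becomes one segment of that day's stacked bar. This is a hedged plain-Python sketch rather than the notebook's Spark code, assuming hypothetical `user_id`, `day`, and `product_line` fields, a window of seven calendar days ending on (and including) a chosen `today`, and made-up rows.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Made-up page-hit rows; user_id, day, and product_line are hypothetical field names.
ROWS = [
    {"user_id": "u1", "day": date(2018, 5, 20), "product_line": "smart phones"},
    {"user_id": "u1", "day": date(2018, 5, 20), "product_line": "videogames"},
    {"user_id": "u1", "day": date(2018, 5, 22), "product_line": "smart phones"},
    {"user_id": "u1", "day": date(2018, 5, 10), "product_line": "smart phones"},  # older than 7 days
    {"user_id": "u2", "day": date(2018, 5, 22), "product_line": "videogames"},    # different user
]


def user_week(rows: List[dict], user_id: str, today: date) -> Dict[Tuple[date, str], int]:
    """Count one user's page views per (day, product_line) over the seven days ending at `today`."""
    start = today - timedelta(days=6)  # seven calendar days, including today
    counts = Counter(
        (r["day"], r["product_line"])
        for r in rows
        if r["user_id"] == user_id and start <= r["day"] <= today
    )
    return dict(counts)


# Each (day, product_line) count is one segment of that day's stacked bar.
for (day, line), n in sorted(user_week(ROWS, "u1", date(2018, 5, 23)).items()):
    print(day, line, n)
```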
See the notebook with example output and interactive charts here.