Name: detect-timeseriesdata-change
Owner: International Business Machines
Description: This repository contains instructions for data retrieval and statistical analysis using R - Jupyter notebook to analyze and detect change-points in IoT sensor data. It also covers the data acquisition and storage of sensor data in database using node-red.
Created: 2017-07-31 16:37:17.0
Updated: 2018-04-27 18:36:26.0
Pushed: 2018-04-27 18:39:24.0
Homepage: https://developer.ibm.com/code/patterns/detect-change-points-in-iot-sensor-data
Size: 4252
Language: Jupyter Notebook
Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.
This code pattern is intended for any developer who wants to experiment with, learn, enhance and implement a new method for statistically detecting change points in sensor data. Sensors mounted on IoT devices, automated manufacturing equipment such as robot arms, and process monitoring and control equipment collect and transmit time-stamped data on a continuous basis.
This code pattern takes you through the end-to-end flow of steps for collating statistics on such time series data and identifying whether a change point has occurred. The core building blocks compute statistical parameters from the time series data for two windows: a previous series covering a time range in the past and a current series covering a recent time range. A statistical comparison between the two detects any change point. The pattern uses the R statistical software with sample sensor data loaded into the Data Science Experience cloud.
All the intermediary steps are modularized and all code is open sourced, so developers can use or modify the modules and sub-modules as they see fit for their specific application.
This code pattern uses IoT sensor data, and its primary goal is to statistically identify the change point in this sensor data rather than to cover the acquisition and storage of the data itself. For the sake of completeness of the flow, a simulation of the IoT data acquisition is included as a first step.
A detailed pattern of acquisition and storage of IoT sensor data is already covered extensively elsewhere. References to the details of these patterns are also given.
When you have completed this code pattern, you will understand how to
This code pattern can be logically split into 2 major parts:
You will need the following accounts and tools:
Developer can reuse all components that support the above steps like
IBM Node-RED Cloud Foundry App: Develop, deploy, and scale server-side JavaScript® apps with ease. The IBM SDK for Node.js™ provides enhanced performance, security, and serviceability.
IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
DB2 Warehouse on Cloud: IBM Db2 Warehouse on Cloud is a fully-managed, enterprise-class, cloud data warehouse service. Powered by IBM BLU Acceleration.
IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market.
Internet of Things Platform: This service is the hub for IBM Watson IoT and lets you communicate with and consume data from connected devices and gateways. Use the built-in web console dashboards to monitor your IoT data and analyze it in real time.
Follow these steps to set up and run this developer pattern. The steps are described in detail below.
Sign up for IBM's Watson Studio. When you sign up for Watson Studio, two services, Apache Spark and Object Storage, will be created in your IBM Cloud account.
Create the IBM Cloud services by following the links below.
Service Name and choose Free Pricing Plan. Click on Create.
region and create a Container unit using Add a container link.
DB2 Warehouse on Cloud service instance on IBM Cloud Dashboard. Click Open to launch the Dashboard.
Explore from the panel, choose schema and then create a New Table.
CHANGEPOINTIOT with following schema:
SENSORID VARCHAR(20)
TIMESTAMP VARCHAR(100)
SENSORVALUE DECIMAL(8,5)
SENSORUNITS VARCHAR(100)
Internet of Things Platform service instance on IBM Cloud Dashboard. Launch the Watson IoT Platform Dashboard.
Create the Node-RED Starter application by following the link. Choose an appropriate name for the Node-RED application - App name. Click on Create.
On the newly created Node-RED application page, click on Visit App URL to launch the Node-RED editor once the application is in Running state.
On the Welcome to your new Node-RED instance on IBM Cloud screen, click on Next.
On the Secure your Node-RED editor screen, enter a username and password to secure the Node-RED editor and click on Next.
On the Browse available IBM Cloud nodes screen, click on Next.
On the Finish the install screen, click on Finish.
Click on Go to your Node-RED flow editor.
Install the following nodes before importing the flow. To do this, select "Manage Palette" from the menu (top right), and then select the Install tab in the palette. Search for each node name below and click on Install.
node-red-contrib-objectstore
node-red-contrib-ibm-watson-iot
The flow json for Node-RED can be found under the configuration directory.
Download the configuration/node-red.json file.
Open the file with a text editor and copy the contents to the clipboard.
On the Node-RED flow editor, click the Menu, select Import -> Clipboard and paste the contents.
Object Storage node (getFileData_in_buffer): Provide your Object Storage service credentials. Service credentials are available in IBM Cloud service instance. Configure node in buffer mode to read the file from your object storage service. Ensure the sample data is loaded into Object storage as explained in Create IBM Cloud Services section above.
Watson-IoT node (TemperatureSensor): Configure this with a registered device on Watson IoT Platform. To configure Watson IoT node in node-red, refer to : https://developer.ibm.com/recipes/tutorials/simulating-a-device-and-publishing-034messages034-to-ibm-iot-platform-from-a-nodered-034watson-iot034-platform-node/
IBM IoT node: Configure IBM IoT node to receive events from Watson IoT Platform using the API keys generated in Create IBM Cloud Services section. To setup IBM IoT Node in node-red refer to step 5 in https://developer.ibm.com/recipes/tutorials/getting-started-with-watson-iot-platform-using-node-red/
dashDB node (CHANGEPOINTIOT): Use credentials of DB2 Warehouse on Cloud service. Service credentials are available in IBM Cloud service instance. Provide the database table name CHANGEPOINTIOT in which sensor data will get populated.
Click on the Deploy button.
The Node-RED flow is designed as follows:
The CSV file with sample sensor data is uploaded to Object Storage.
A CSV string is prepared from the sample data file and given as input to the csv node.
The csv node acts as a device simulator and triggers a temperature sensor event for each row of data.
The events sent by the temperature sensor are received by the IBM IoT Platform.
The data is prepared and then stored in the database.
Data from the database can be used in the R Jupyter notebook for analytics.
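The device-simulator step in the flow above can be sketched outside Node-RED. This is only an illustration in Python, not the flow shipped in configuration/node-red.json: the sample rows, the sensor ID 3B1, the units and the function name rows_to_events are all hypothetical, and the column names are assumed to match the CHANGEPOINTIOT table.

```python
import csv
import io
import json

# Hypothetical sample rows; the column names are assumed to mirror the
# CHANGEPOINTIOT table, and the values are made up for illustration.
SAMPLE_CSV = """SENSORID,TIMESTAMP,SENSORVALUE,SENSORUNITS
3B1,2016-01-01 00:00:01,21.5,degC
3B1,2016-01-01 00:00:02,21.7,degC
"""


def rows_to_events(csv_text):
    """Yield one JSON event payload per CSV data row, like a simulated device."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        event = {
            "SENSORID": row["SENSORID"],
            "TIMESTAMP": row["TIMESTAMP"],
            "SENSORVALUE": float(row["SENSORVALUE"]),
            "SENSORUNITS": row["SENSORUNITS"],
        }
        yield json.dumps(event)


events = list(rows_to_events(SAMPLE_CSV))
for payload in events:
    print(payload)
```

Each payload corresponds to one event that the simulated temperature sensor would publish; in the real flow the Watson IoT nodes carry the event to the platform and the dashDB node writes it to the table.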
In the Node-RED flow, click on the input of the inject node. This triggers the execution of the Node-RED flow and, on successful execution, the data is stored in the DB2 table CHANGEPOINTIOT.
Use the menu on the left to select My Projects and then Default Project.
Click on Add notebooks (upper right) to create a notebook.
Select the From URL tab.
Click the Create Notebook button.
Once the files have been uploaded into Object Storage, you need to update the variables that refer to the .json configuration files in the R - Jupyter Notebook.
In the notebook, update the Watson Studio configuration .json file name in section 2.1.1
Edit the Watson Studio configuration .json file
Update the paramvalue ONLY to suit your requirements and save the .json file
Retain the rest of the format and composition of the .json file
The descriptions of the parameters that can be configured are as below.
coltimestamp: Name of the column which holds the Time stamp of data recorded by Sensor
colsensorid: Name of the column which holds the Sensor identification
colsensorvalue: Name of the column that stores the values measured by sensor
sensorid: Sensor ID for which the analysis needs to be applied
datatimeformat: Time format of the data in the data frame
intimezone: Time zone for the Time stamps
rangetimeformat: Time format which is used for specifying the time ranges
Pfrom: Start Time for first series Time range
Pto: End Time for first series Time range
Cfrom: Start Time for second series Time range
Cto: End Time for second series Time range
thresholdpercent: Threshold percentage of deviation above which a change point is reported
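To make the parameters above concrete, here is a minimal sketch of reading such a configuration and turning the two time ranges into comparable timestamps. It is written in Python for illustration, whereas the notebook itself uses R. Only the parameter names and the paramvalue key come from this document; the list layout, the paramname key, every value shown and the helper names load_params and parse_ranges are assumptions. The intimezone value is carried in the configuration but not applied in this sketch.

```python
import json
from datetime import datetime

# Hypothetical configuration. Only the parameter names and the "paramvalue"
# key appear in the document; the list layout, the "paramname" key and all
# values shown here are assumptions for illustration.
CONFIG_JSON = """
[
  {"paramname": "coltimestamp",     "paramvalue": "TIMESTAMP"},
  {"paramname": "colsensorid",      "paramvalue": "SENSORID"},
  {"paramname": "colsensorvalue",   "paramvalue": "SENSORVALUE"},
  {"paramname": "sensorid",         "paramvalue": "3B1"},
  {"paramname": "datatimeformat",   "paramvalue": "%Y-%m-%d %H:%M:%S"},
  {"paramname": "intimezone",       "paramvalue": "UTC"},
  {"paramname": "rangetimeformat",  "paramvalue": "%Y-%m-%d %H:%M:%S"},
  {"paramname": "Pfrom",            "paramvalue": "2016-01-01 00:00:00"},
  {"paramname": "Pto",              "paramvalue": "2016-01-01 23:59:59"},
  {"paramname": "Cfrom",            "paramvalue": "2016-01-02 00:00:00"},
  {"paramname": "Cto",              "paramvalue": "2016-01-02 23:59:59"},
  {"paramname": "thresholdpercent", "paramvalue": "25"}
]
"""


def load_params(config_text):
    """Flatten the list of name/value entries into a plain dictionary."""
    entries = json.loads(config_text)
    return {entry["paramname"]: entry["paramvalue"] for entry in entries}


def parse_ranges(params):
    """Return the (start, end) pairs for the previous and current series."""
    fmt = params["rangetimeformat"]
    previous = (datetime.strptime(params["Pfrom"], fmt),
                datetime.strptime(params["Pto"], fmt))
    current = (datetime.strptime(params["Cfrom"], fmt),
               datetime.strptime(params["Cto"], fmt))
    for start, end in (previous, current):
        if start >= end:
            raise ValueError("each time range must start before it ends")
    return previous, current


params = load_params(CONFIG_JSON)
previous, current = parse_ranges(params)
threshold = float(params["thresholdpercent"])
print(previous, current, threshold)
```

Validating the ranges up front means a mistyped Pfrom or Cto fails immediately rather than silently producing an empty series later in the analysis.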
In section 2.1.2 of the Watson Studio notebook, insert (replace) your own Object Storage file credentials to read the .json configuration file
Also replace the function name in the block that reads the json configuration file in section 2.1.3
Use Find and Add Data (look for the 10/01 icon) and its Connections tab. You should see the database connection you created earlier.
From there you can click Insert to Code under the Data connection list and add ibm DBR code with connection credentials to the flow.
Note: If you don't have your own data and configuration files, you can reuse our example in the “Read IoT Sensor data from database” section. The sample data file is data/sensordata2016_1s3dys.csv.
When a notebook is executed, what is actually happening is that each code cell in the notebook is executed, in order, from top to bottom.
Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]: and, depending on the state of the notebook, the x can be:
*, this indicates that the cell is currently executing.
There are several ways to execute the code cells in your notebook:
Play button in the toolbar.
Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, that will start executing from the first cell under the currently selected cell, and then continue executing all cells that follow.
Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.
The notebook outputs the results in the Notebook which can be copied to clipboard
The graphs give a visual indication of how the sensor values behave during the 2 time periods
Statistics such as averages, standard deviations and quartiles are computed for each of the 2 time periods, and the deviation between the periods is computed for each statistic. An overall deviation is then computed and compared against the threshold set earlier in the Watson Studio configuration file
If the overall computed deviation exceeds the threshold configured by the user, the custom R functions report that a change point has been detected
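The comparison described above can be sketched as follows. This is a rough Python illustration, not the notebook's custom R functions: the choice of statistics follows the text, but the percentage-deviation formula, the use of a simple average as the overall deviation, the sample values and the names summarize, percent_deviation and detect_change are all assumptions.

```python
import statistics


def summarize(values):
    """Compute the mean, standard deviation and quartiles of one series."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def percent_deviation(previous, current):
    """Absolute change of one statistic as a percentage of its previous value."""
    if previous == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) / abs(previous) * 100.0


def detect_change(previous_values, current_values, threshold_percent):
    """Return (overall deviation, change detected) for two series.

    Assumption: the overall deviation is the plain average of the
    per-statistic deviations; the notebook may combine them differently.
    """
    prev_stats = summarize(previous_values)
    curr_stats = summarize(current_values)
    deviations = [
        percent_deviation(prev_stats[name], curr_stats[name])
        for name in prev_stats
    ]
    overall = statistics.mean(deviations)
    return overall, overall > threshold_percent


# Made-up sensor values for a previous and a current time range.
previous_series = [20.0, 20.5, 21.0, 20.2, 20.8, 20.4]
current_series = [25.1, 25.6, 24.9, 25.3, 25.8, 25.2]

overall, changed = detect_change(previous_series, current_series, 10.0)
print(f"overall deviation: {overall:.1f}%, change point detected: {changed}")
```

Averaging the per-statistic deviations keeps the sketch simple; a weighted combination would let a shift in the mean count for more than a shift in the spread.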