IBM/detect-timeseriesdata-change

Name: detect-timeseriesdata-change

Owner: International Business Machines

Description: This repository contains instructions for data retrieval and statistical analysis using R - Jupyter notebook to analyze and detect change-points in IoT sensor data. It also covers the data acquisition and storage of sensor data in database using node-red.

Created: 2017-07-31 16:37:17.0

Updated: 2018-04-27 18:36:26.0

Pushed: 2018-04-27 18:39:24.0

Homepage: https://developer.ibm.com/code/patterns/detect-change-points-in-iot-sensor-data

Size: 4252

Language: Jupyter Notebook


README

Change Point Detection in Time Series Sensor data

Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.

Overview and Goal

This code pattern is intended for any developer who wants to experiment with, learn, enhance, and implement a new method for statistically detecting change points in sensor data. Sensors mounted on IoT devices, automated manufacturing equipment such as robot arms, and process monitoring and control equipment collect and transmit time-stamped data on a continuous basis.

This code pattern takes you through the end-to-end flow of steps for collating statistics on such time series data and identifying whether a change point has occurred. The core building blocks include computing statistical parameters from the time series data for a previous dataset covering a time range in the past and for the current series covering a recent time range. A statistical comparison between these two series results in the detection of any change points. R statistical software is used in this pattern, with sample sensor data loaded into the Data Science Experience cloud.

All the intermediary steps are modularized and all code open sourced to enable developers to use / modify the modules / sub-modules as they see fit for their specific application.

This code pattern utilizes IoT sensor data and its primary goal is to statistically identify the change point in this sensor data rather than the acquisition and storage of the data itself. For sake of completeness of the flow, a simulation of the IoT data acquisition is included as a first step.


A detailed pattern of acquisition and storage of IoT sensor data is already covered extensively elsewhere. References to the details of these patterns are also given.

When you have completed this code pattern, you will understand how to

This code pattern can be logically split into 2 major parts:

Prerequisites

You will need the following accounts and tools:

Steps:
  1. Log into IBM Cloud and create IBM Cloud services
  2. Create a Node-RED application to load IoT data into a DB2 table by using the provided .json configuration file
  3. Read IoT data from the provided sample csv file. The Node-RED flow can be changed to read from IoT devices directly
  4. Import the sample data into a DB2 table using the Node-RED flow
  5. Configure the parameters in the .json dsx configuration file that will be used in Data Science Experience, and update the credentials used to read the configuration file
  6. In the R notebook flow, update the credentials to read the relevant sensor data subset from the DB2 table. Data from the cloud database is read into an R Spark dataframe in the Watson Studio notebook. Extract the 2 series of datasets to be compared. The R notebook uses open R libraries and custom-built function components to compute the statistics. Generate visual comparison charts to gain insight into changes in the behavior of the sensor values. The statistical metrics are then compared and the changes analyzed using the custom functions written in R
  7. In Data Science Experience, R runs on the Spark engine to ensure scalability and performance
  8. Object storage is used to store the configuration file from which Watson Studio reads the parameters. The results can also be stored in Object storage if needed

Developer can reuse all components that support the above steps like

Included components
Featured technologies

Watch the Video

Steps

Follow these steps to set up and run this developer pattern. The steps are described in detail below.

  1. Sign up for the Watson Studio
  2. Create IBM Cloud services
  3. Create Node-RED App and inject IoT data
  4. Create the notebook
  5. Add the data and configuration file
  6. Run the notebook
  7. Download the results

1. Sign up for the Watson Studio

Sign up for IBM's Watson Studio. When you sign up for Watson Studio, two services, Apache Spark and Object Storage, are created in your IBM Cloud account.

2. Create IBM Cloud services

Create the IBM Cloud services by following the links below.

3. Create Node-RED App and inject IoT data

Create the Node-RED Starter application by following the link. Choose an appropriate name for the Node-RED application in the App name: field, then click Create.

Node-RED Starter


Import Node-RED flow by importing the configuration .json

The flow .json for Node-RED can be found in the configuration directory.

Adjustments to the node properties in Node-RED Flow
  1. Object Storage node (getFileData_in_buffer): Provide your Object Storage service credentials, which are available in the IBM Cloud service instance. Configure the node in buffer mode to read the file from your Object Storage service. Ensure the sample data is loaded into Object Storage as explained in the Create IBM Cloud services section above.

  2. Watson-IoT node (TemperatureSensor): Configure this with a registered device on Watson IoT Platform. To configure Watson IoT node in node-red, refer to : https://developer.ibm.com/recipes/tutorials/simulating-a-device-and-publishing-034messages034-to-ibm-iot-platform-from-a-nodered-034watson-iot034-platform-node/

  3. IBM IoT node: Configure the IBM IoT node to receive events from Watson IoT Platform using the API keys generated in the Create IBM Cloud services section. To set up the IBM IoT node in Node-RED, refer to step 5 in https://developer.ibm.com/recipes/tutorials/getting-started-with-watson-iot-platform-using-node-red/

  4. dashDB node (CHANGEPOINTIOT): Use the credentials of the DB2 Warehouse on Cloud service, which are available in the IBM Cloud service instance. Provide the database table name CHANGEPOINTIOT, in which the sensor data will be populated.

Deploy the Node-RED flow by clicking on the Deploy button.

The Node-RED flow is designed as follows:

  1. The csv file with sample sensor data is uploaded in object storage.

  2. Prepare a csv string from the sample data file and pass this string as input to the csv node.

  3. The csv node acts as a device simulator and triggers a temperature sensor event for each row of data.

  4. The events sent by temperature sensor will be received by IBM IoT Platform.

  5. This data will be prepared and then stored in the database.

  6. Data from DB can be used in R Jupyter notebook for analytics.
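The device-simulation part of the flow above (one sensor event per csv row) can be approximated outside Node-RED. The following is a minimal Python sketch, not part of the repository, which uses Node-RED for this step. The column names `TIMESTAMP`, `SENSORID`, and `SENSORVALUE`, the sample rows, and the event layout are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical sample rows; the real data file and its column names may differ.
SAMPLE_CSV = """TIMESTAMP,SENSORID,SENSORVALUE
2016-01-01 00:00:01,3B1,21.5
2016-01-01 00:00:02,3B1,21.7
"""


def csv_to_events(csv_text):
    """Yield one JSON event string per csv row, mimicking a csv node
    that emits one message for each row of sensor data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        event = {
            "sensorid": row["SENSORID"],
            "timestamp": row["TIMESTAMP"],
            "value": float(row["SENSORVALUE"]),
        }
        yield json.dumps(event)


events = list(csv_to_events(SAMPLE_CSV))
for event in events:
    print(event)
```

In the actual flow, each event would be published to Watson IoT Platform and then written to the CHANGEPOINTIOT table rather than printed.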

Inject the data in Node-RED Flow

In the Node-RED flow, click on the input of the inject node. This triggers execution of the flow; on successful execution, the data is stored in the DB2 table CHANGEPOINTIOT.

4. Create the R Spark Jupyter notebook

Use the menu on the left to select My Projects and then Default Project. Click on Add notebooks (upper right) to create a notebook.

5. Add the configuration and data access details
Fix-up configuration parameter .json file name and values

Once the files have been uploaded into Object Storage you need to update the variables that refer to the .json configuration files in the R - Jupyter Notebook.

In the notebook, update the Watson Studio configuration .json file name in section 2.1.1.

Edit the Watson Studio configuration .json file
Update the paramvalue ONLY to suit your requirements and save the .json file
Retain the rest of the format and composition of the .json file


The descriptions of the parameters that can be configured are as below.

  1. coltimestamp: Name of the column which holds the Time stamp of data recorded by Sensor

  2. colsensorid: Name of the column which holds the Sensor identification

  3. colsensorvalue: Name of the column that stores the values measured by sensor

  4. sensorid: Sensor ID for which the analysis needs to be applied

  5. datatimeformat: Time format of the data in the data frame

  6. intimezone: Time zone for the Time stamps

  7. rangetimeformat: Time format which is used for specifying the time ranges

  8. Pfrom: Start Time for first series Time range

  9. Pto: End Time for first series Time range

  10. Cfrom: Start Time for second series Time range

  11. Cto: End Time for second series Time range

  12. thresholdpercent: Threshold percentage of change; a change point is reported when the overall computed deviation exceeds this value

In section 2.1.2 of the Watson Studio notebook, insert (replace) your own Object Storage file credentials to read the .json configuration file.

Also replace the function name in the block that reads the .json configuration file in section 2.1.3.
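To illustrate how the parameters described above fit together, here is a minimal Python sketch that selects the two series to be compared. It is not the notebook's R code. The parameter names come from the list above, but every value, the sample rows, and the helper `extract_series` are hypothetical, and time zone handling (`intimezone`) is omitted for brevity.

```python
from datetime import datetime

# Hypothetical values keyed by the parameter names described above.
config = {
    "coltimestamp": "TIMESTAMP",
    "colsensorid": "SENSORID",
    "colsensorvalue": "SENSORVALUE",
    "sensorid": "3B1",
    "datatimeformat": "%Y-%m-%d %H:%M:%S",
    "rangetimeformat": "%Y%m%d %H:%M:%S",
    "Pfrom": "20160101 00:00:00",
    "Pto": "20160101 00:00:02",
    "Cfrom": "20160101 00:00:03",
    "Cto": "20160101 00:00:05",
    "thresholdpercent": 20,
}

# Hypothetical sensor rows, as they might be read from the database table.
rows = [
    {"TIMESTAMP": "2016-01-01 00:00:00", "SENSORID": "3B1", "SENSORVALUE": "21.0"},
    {"TIMESTAMP": "2016-01-01 00:00:01", "SENSORID": "3B1", "SENSORVALUE": "21.5"},
    {"TIMESTAMP": "2016-01-01 00:00:02", "SENSORID": "3B1", "SENSORVALUE": "21.2"},
    {"TIMESTAMP": "2016-01-01 00:00:03", "SENSORID": "3B1", "SENSORVALUE": "30.1"},
    {"TIMESTAMP": "2016-01-01 00:00:03", "SENSORID": "9Z9", "SENSORVALUE": "99.0"},
    {"TIMESTAMP": "2016-01-01 00:00:04", "SENSORID": "3B1", "SENSORVALUE": "29.8"},
    {"TIMESTAMP": "2016-01-01 00:00:05", "SENSORID": "3B1", "SENSORVALUE": "30.4"},
]


def extract_series(rows, cfg, start_key, end_key):
    """Return the values of the configured sensor whose time stamps
    fall inside the inclusive range cfg[start_key]..cfg[end_key]."""
    start = datetime.strptime(cfg[start_key], cfg["rangetimeformat"])
    end = datetime.strptime(cfg[end_key], cfg["rangetimeformat"])
    values = []
    for row in rows:
        if row[cfg["colsensorid"]] != cfg["sensorid"]:
            continue  # skip rows that belong to other sensors
        stamp = datetime.strptime(row[cfg["coltimestamp"]], cfg["datatimeformat"])
        if start <= stamp <= end:
            values.append(float(row[cfg["colsensorvalue"]]))
    return values


previous = extract_series(rows, config, "Pfrom", "Pto")
current = extract_series(rows, config, "Cfrom", "Cto")
print(previous, current)
```

Keeping the column names in the configuration rather than in the code is what lets the same logic run against a table with a different layout.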


Add the data and configuration to the notebook

Use Find and Add Data (look for the 10/01 icon) and its Connections tab. You should see the database connection you created earlier. From there, click Insert to Code under the Data connection list to add the ibm DBR code with connection credentials to the flow.


Note: If you don't have your own data and configuration files, you can reuse our example in the “Read IoT Sensor data from database” section. The sample data file is data/sensordata2016_1s3dys.csv.

6. Run the notebook

When a notebook is executed, what is actually happening is that each code cell in the notebook is executed, in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

There are several ways to execute the code cells in your notebook:

7. View the results

The notebook outputs the results inline, from where they can be copied to the clipboard.

The graphs give a visual indication of how the sensor values behave during the 2 time periods.

Statistics such as averages, standard deviations, and quartiles are computed for each of the 2 time periods, and a deviation is computed for each statistic. An overall deviation is then computed and compared against the threshold set earlier in the Watson Studio configuration file.

If the overall computed deviation exceeds the configured threshold, the custom R functions report that a change point has been detected.
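The comparison described above can be sketched in a few lines. The following Python example is illustrative only and is not the custom R code from the notebook. The README does not state how the per-statistic deviations are combined, so the sketch assumes the overall deviation is the simple average of the absolute percentage deviations; the function names and sample values are hypothetical.

```python
import statistics


def summarize(values):
    """Compute the summary statistics named in the text for one time period."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def percent_deviation(previous, current):
    """Absolute change of one statistic, as a percentage of its previous value."""
    if previous == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) / abs(previous) * 100.0


def detect_change(prev_values, curr_values, threshold_percent):
    """Compare two series and flag a change point when the overall
    deviation exceeds the threshold (assumed aggregation: plain average)."""
    prev_stats = summarize(prev_values)
    curr_stats = summarize(curr_values)
    deviations = {
        name: percent_deviation(prev_stats[name], curr_stats[name])
        for name in prev_stats
    }
    overall = statistics.mean(list(deviations.values()))
    return {
        "deviations": deviations,
        "overall": overall,
        "change_detected": overall > threshold_percent,
    }


# Hypothetical readings for the previous and current time ranges.
previous = [20.0, 21.0, 20.5, 19.5, 20.2, 20.8]
current = [2 * value for value in previous]

result = detect_change(previous, current, threshold_percent=20)
print(result["overall"], result["change_detected"])
```

Other aggregation rules, such as weighting the mean more heavily than the quartiles, would fit the same structure; only the line that computes `overall` would change.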

Troubleshooting

See DEBUGGING.md

Useful links

License

Apache 2.0


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.