IBM/procurement-analysis-with-wks

Name: procurement-analysis-with-wks

Owner: International Business Machines

Description: ***WORK IN PROGRESS***

Created: 2018-01-31 05:35:42.0

Updated: 2018-05-24 13:30:27.0

Pushed: 2018-05-24 13:30:26.0

Size: 5607

Language: JavaScript

README

Creating a smarter procurement system with Watson Knowledge Studio and Watson Discovery

In this code pattern we will be creating a complete end-to-end solution for a procurement use case. Currently, customers either analyze various market reports on their own or hire experts to make procurement decisions. These experts analyze reports captured from data sources, a process that can be time-consuming and prone to human error, potentially causing a chain of issues that impact production.

By using our intelligent procurement system, based on Watson Discovery, a customer can receive expert analysis more quickly and accurately. The customer must first train the model with various use cases (via reports) to receive accurate results. The target end user of this system is a person working in a procurement role at a company.

As a developer going through this code pattern, you will learn how to:

As an end user, you will be able to:

Watson Discovery with and without Watson Knowledge Studio

To understand the significance of Watson Knowledge Studio (WKS) in this example, we will compare the output extracted from Watson Discovery when used with and without WKS.

Watson Discovery output without WKS:

...
"text": "Asahi Kasei Corp",
"relevance": 0.227493,
"type": "Company"
...

"text": "Kawasaki",
"relevance": 0.274707,
"type": "Company"
...

Watson Discovery output with WKS:

...
"id": "-E114",
"text": "Asahi Kasei Corp",
"type": "Supplier"
...

"id": "-E119",
"text": "Kawasaki",
"type": "Facility"
...

Looking at the output of Discovery without WKS, we can see that both Asahi Kasei and Kawasaki are identified simply as companies. This is expected: without WKS, Discovery only performs basic Natural Language Understanding (NLU) processing and cannot understand language specific to the procurement domain. With WKS, however, Asahi Kasei is identified as a supplier, whereas Kawasaki is identified as a facility.

Process Flow

The steps followed to create the solution are as follows. For commands, please refer to the Deploy and run the application on IBM Cloud section below.

Watson Knowledge Studio (WKS)
  1. We build a type system specific to the business domain/use case.
  2. We follow the human annotation process to identify entities and relationships.
  3. We create a machine learning model and train it until we are satisfied with its performance.
  4. If required, the corpus documents can be exported from the Documents tab and imported into a new WKS project.
Discovery Service
  1. We create the Discovery service from our IBM Cloud account. The service must be created under US South, as ONLY services under US South are visible when deploying a WKS model into Discovery.
  2. We create a collection with a customized configuration that points to the WKS model ID.
IBM Graph
  1. We create a graph for this use case by creating a schema and initial data to bootstrap the graph.
Client Application
  1. We create a client application that calls the Discovery service.
  2. The output (JSON data) of the Discovery service is parsed, and nodes and edges for the graph are created dynamically (see the sketch below).
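To make the client-application flow concrete, here is a minimal sketch of the query-and-parse step. It assumes the watson-developer-cloud Node SDK; the version date, the query string, and the addVertex/addEdge helpers are illustrative stand-ins, not the code pattern's actual implementation.

// Sketch: query Discovery, then turn enriched entities/relations into
// graph nodes and edges. addVertex/addEdge are hypothetical in-memory
// stand-ins for the real JanusGraph calls.
const DiscoveryV1 = require('watson-developer-cloud/discovery/v1');

const discovery = new DiscoveryV1({
  username: process.env.DISCOVERY_USERNAME,
  password: process.env.DISCOVERY_PASSWORD,
  version_date: '2017-11-07'
});

const nodes = [];
const edges = [];
function addVertex(label, name) { nodes.push({ label, name }); }
function addEdge(label, from, to) { edges.push({ label, from, to }); }

discovery.query({
  environment_id: process.env.DISCOVERY_ENVIRONMENT_ID,
  collection_id: process.env.DISCOVERY_COLLECTION_ID,
  query: 'enriched_text.entities.type::"Supplier"'
}, (err, response) => {
  if (err) {
    return console.error(err);
  }
  response.results.forEach((doc) => {
    // Each custom-model entity (e.g. Supplier, Facility) becomes a node.
    doc.enriched_text.entities.forEach((entity) => {
      addVertex(entity.type, entity.text);
    });
    // Each extracted relation becomes an edge between its two arguments.
    (doc.enriched_text.relations || []).forEach((relation) => {
      if (relation.arguments && relation.arguments.length === 2) {
        const [from, to] = relation.arguments;
        addEdge(relation.type, from.text, to.text);
      }
    });
  });
});
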
Technical Architecture

Included Components

Steps

  1. Clone the repo
  2. Create IBM Cloud services
  3. Create a Watson Knowledge Studio workspace
  4. Upload Type System
  5. Import Corpus Documents
  6. Create an Annotation Set
  7. Create a Task for Human Annotation
  8. Create the model
  9. Deploy the machine learning model to Discovery
  10. Create and Configure a Watson Discovery Collection
  11. Configure credentials
  12. Run the application
  13. Deploy and run the application on IBM Cloud
1. Clone the repo
git clone https://github.com/IBM/procurement-analysis-with-wks
2. Create IBM Cloud services

Create the following services:

3. Create a Watson Knowledge Studio workspace

Launch the WKS tool and create a new workspace.

4. Upload Type System

A type system allows us to define things that are specific to our procurement documents. The type system controls how content can be annotated by defining the types of entities that can be labeled and how relationships among different entities can be labeled.

To upload our pre-defined type system, from the Access & Tools -> Entity Types panel, press the Upload button to import the Type System file data/wks-resources/types-36a431a0-f6a0-11e7-8256-672fd3d48302.json found in the local repository.

This will upload a set of Entity Types and Relation Types.
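For orientation, the exported type system is a JSON file that pairs a list of entity types with a list of relation types. The real export carries additional metadata, so treat the following as an illustrative sketch of the idea rather than the file's exact schema (the supplies relation is a made-up example):

{
  "entityTypes": [
    { "label": "Supplier" },
    { "label": "Facility" }
  ],
  "relationTypes": [
    { "label": "supplies" }
  ]
}
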

5. Import Corpus Documents

Corpus documents are required to train our machine-learning annotator component. For this Code Pattern, the corpus documents will contain example procurement documents.

From the Access & Tools -> Documents panel, press the Upload Document Sets button to import a Document Set file. Use the corpus documents file data/wks-resources/corpus-36a431a0-f6a0-11e7-8256-672fd3d48302.zip found in the local repository.

NOTE: Uploading the corpus documents provided in this Code Pattern is not required, but it is recommended because it simplifies the annotation process (all provided documents come pre-annotated). An alternative approach is to upload standard text files and perform the annotations manually.

NOTE: Select the option to “upload corpus documents and include ground truth (upload the original workspace's type system first)“.

6. Create an Annotation Set

Once the corpus documents are loaded, we can start the human annotation process. This begins by dividing the corpus into multiple document sets and assigning the document sets to human annotators (for this Code Pattern, we will just be using one document set and one annotator).

From the Access & Tools -> Documents panel, press the Create Annotation Sets button. Select a valid Annotator user, and provide a unique name for Set name.

7. Create a Task for Human Annotation

Add a task for human annotation by creating a task and assigning it annotation sets.

From the Access & Tools -> Documents panel, select the Task tab and press the Add Task button.

Enter a unique Task name and press the Create button.

A panel will then be displayed of the available annotation sets that can be assigned to this task. Select the Annotation Set you created in the previous step, and press the Create Task button.

7.1 Start the Human Annotation task

Click on the task card to view the task details panel.

Click the Annotate button to start the Human Annotation task.

If you select any of the documents in the list, the Document Annotation panel will be displayed. Since we previously imported the corpus documents, the entity and relationship annotations are already complete (as shown in the following examples). You can experiment by annotating additional mentions (occurrences of words or phrases that can be labeled as an entity), or modify an existing annotation by labeling a mention with a different entity.

7.2 Submit Annotation Set

From the Task details panel, press the Submit All Documents button.

All documents should change status to Completed.

Press the blue “File” icon to toggle back to the Task panel, which will show the completion percentage for each task.

From the Access & Tools -> Documents panel, select the Task tab and select the task to view the details panel.

Select your Annotation Set Name and then press the Accept button. This step is required to ensure that the annotation set is considered ground truth.

NOTE: The objective of the annotation project is to obtain ground truth, the collection of vetted data that is used to adapt WKS to a particular domain.

Status should now be set to COMPLETED.

8. Create the model

Go to the Model Management -> Performance panel, and press the Train and evaluate button.

From the Document Set name list, select the Annotation Set Name you created previously and press the Train & Evaluate button.

This process may take several minutes to complete. Progress will be shown in the upper right corner of the panel.

Note: In practice, you would create separate annotation sets (each containing thousands of documents) for training and evaluation.

Once complete, you will see the results of the train and evaluate process.

9. Deploy the machine learning model to Discovery

Now we can deploy our new model to the already created Discovery service. Navigate to the Version menu on the left and press Take Snapshot.

The snapshot version will now be available for deployment to Discovery.

To start the process, click the Deploy button associated with your snapshot version.

Select the option to deploy to Discovery.

Then enter your IBM Cloud account information to locate your Discovery service to deploy to.

Once deployed, a Model ID will be created. Keep note of this value as it will be required later in this Code Pattern.

NOTE: You can also view this Model ID by pressing the WDS button listed with your snapshot version.

10. Create and Configure a Watson Discovery Collection

Launch the Watson Discovery tool. Create a new data collection and give the data collection a unique name.

From the new collection data panel, under Configuration click the Switch button to switch to a new configuration file, then click the Create a new configuration option.

Enter a unique name and press Create.

From the Configuration Panel, press the Add enrichments option. Ensure that the following extraction options are added: Keyword, Entity, and Relation.

Also, assign your Model ID to both the Entity Extraction and Relation Extraction enrichments.

Note: These Model ID assignments are required to ensure your documents are properly enriched.
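Behind the tooling, this corresponds to a Discovery configuration whose natural_language_understanding enrichment references your custom model. A trimmed sketch of the relevant section (the model values are placeholders for your Model ID):

"enrichments": [
  {
    "source_field": "text",
    "destination_field": "enriched_text",
    "enrichment": "natural_language_understanding",
    "options": {
      "features": {
        "keywords": {},
        "entities": { "model": "<your_model_id>" },
        "relations": { "model": "<your_model_id>" }
      }
    }
  }
]
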

Close the Add Enrichments panel by pressing Done.

Save the configuration by pressing Apply & Save, and then Close.

Once the configuration is created, you can proceed with loading discovery files.

From the new collection data panel, under Add data to this collection, use Drag and drop your documents here or browse from computer to seed the collection with the procurement document files extracted from data/disco-docs/.
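If you would rather script the upload than drag and drop, the Watson Node SDK can also add documents to the collection. A minimal sketch, assuming the watson-developer-cloud package; the filename and version date are assumptions:

// Sketch: push one document from data/disco-docs/ into the collection.
const fs = require('fs');
const DiscoveryV1 = require('watson-developer-cloud/discovery/v1');

const discovery = new DiscoveryV1({
  username: process.env.DISCOVERY_USERNAME,
  password: process.env.DISCOVERY_PASSWORD,
  version_date: '2017-11-07'
});

discovery.addDocument({
  environment_id: process.env.DISCOVERY_ENVIRONMENT_ID,
  collection_id: process.env.DISCOVERY_COLLECTION_ID,
  file: fs.createReadStream('data/disco-docs/example-document.json') // hypothetical filename
}, (err, data) => {
  if (err) {
    return console.error(err);
  }
  console.log('Document accepted:', data.document_id);
});
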

11. Configure credentials
cp env.sample .env

Edit the .env file with the necessary settings.

env.sample:
# Replace the credentials here with your own.
# Rename this file to .env before starting the app.

# JanusGraph DB
GRAPH_DB_USERNAME=admin
GRAPH_DB_PASSWORD=<add_janusgraph_password>
GRAPH_DB_API_URL=<add_janusgraph_api_url>

# Watson Discovery
DISCOVERY_USERNAME=<add_discovery_username>
DISCOVERY_PASSWORD=<add_discovery_password>
DISCOVERY_ENVIRONMENT_ID=<add_discovery_environment_id>
DISCOVERY_CONFIGURATION_ID=<add_discovery_configuration_id>
DISCOVERY_COLLECTION_ID=<add_discovery_collection_id>

The settings can be found by navigating to the specific service instance from within the IBM Cloud dashboard.

For the JanusGraph entries, navigate to the Service Credentials panel for your JanusGraph service instance. The values can be found in the gremlin_console_yaml section of the generated credentials. For example:

"gremlin_console_yaml": [
  "hosts: [portal-ssl204-25.bmix-dal-yp-299e7bd4.test1-ibm-com.composedb.com]\nport: 41590\nusername: admin\npassword: MASHDUVREXMCSZLR\nconnectionPool: { enableSsl: true }\nserializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}",

In this case, you would set your values to:

GRAPH_DB_API_URL=https://portal-ssl204-25.bmix-dal-yp-299e7bd4.test1-ibm-com.composedb.com:41590
GRAPH_DB_PASSWORD=MASHDUVREXMCSZLR
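
The mapping is mechanical, so you can also extract the values with a few lines of JavaScript. A minimal sketch, not part of the code pattern, using the example credentials above:

// Pull host, port, and password out of the gremlin_console_yaml string.
const gremlinConsoleYaml =
  'hosts: [portal-ssl204-25.bmix-dal-yp-299e7bd4.test1-ibm-com.composedb.com]\n' +
  'port: 41590\nusername: admin\npassword: MASHDUVREXMCSZLR\n' +
  'connectionPool: { enableSsl: true }';

const host = gremlinConsoleYaml.match(/hosts: \[([^\]]+)\]/)[1];
const port = gremlinConsoleYaml.match(/port: (\d+)/)[1];
const password = gremlinConsoleYaml.match(/password: (\S+)/)[1];

console.log('GRAPH_DB_API_URL=https://' + host + ':' + port);
console.log('GRAPH_DB_PASSWORD=' + password);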
12. Run the application
  1. Install the Node.js runtime and npm.
  2. Start the app by running npm install, followed by npm start (the app reads your .env settings at startup; see the note after this list).
  3. Access the UI by pointing your browser at the host and port values returned by the npm start command. For example, http://localhost:6003.
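At startup, the app needs the Step 11 settings available in process.env. If you are adapting the code, the usual pattern with the dotenv package is a one-line load at the top of the entry file (a sketch; the code pattern's entry file may wire this differently):

// Load .env into process.env before anything reads the credentials.
require('dotenv').config();

console.log('Discovery environment:', process.env.DISCOVERY_ENVIRONMENT_ID);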
13. Deploy and run the application on IBM Cloud

To deploy to the IBM Cloud, make sure you have the IBM Cloud CLI tool installed. Then run the following commands to log in using your IBM Cloud credentials.

cd procurement-analysis-with-wks
cf login

When pushing your app to the IBM Cloud, values are read in from the manifest.yml file. Edit this file if you need to change any of the default settings, such as application name or the amount of memory to allocate.


applications:
- name: procurement-analysis-with-wks
  memory: 256M
  instances: 1
  path: .
  buildpack: sdk-for-nodejs
  random-route: false

Additionally, your environment variables must be set in your .env file as described previously in Step 11. Configure credentials.

To deploy your application, run the following command.

cf push

NOTE: The URL route assigned to your application will be displayed as a result of this command. Note this value, as it will be required to access your app.

To view the application, go to the IBM Cloud route assigned to your app. Typically, this will take the form https://<app name>.mybluemix.net.

To view logs, or get overview information about your app, use the IBM Cloud dashboard.

Sample UI layout

Troubleshooting

Links

Learn more

License

Apache 2.0

