IBM/SystemML_Usage

Name: SystemML_Usage

Owner: International Business Machines

Description: Demonstrate how to perform a Machine Learning exercise using Apache SystemML

Created: 2017-08-22 22:38:31.0

Updated: 2018-03-22 16:14:31.0

Pushed: 2018-03-22 16:14:33.0

Homepage: https://developer.ibm.com/code/patterns/perform-a-machine-learning-exercise/

Size: 1570

Language: Jupyter Notebook

README

Using Apache SystemML for Machine Learning in a Watson Studio Notebook

Data Science Experience is now Watson Studio. Although some images in this code pattern may show the service as Data Science Experience, the steps and processes will still work.

In this Code Pattern, we will use Apache SystemML running on IBM Watson Studio to perform a Machine Learning exercise. Watson Studio is an interactive, collaborative, cloud-based environment where data scientists, developers, and others interested in data science can use tools such as RStudio, Jupyter Notebooks, and Spark to collaborate, share, and gather insight from their data. Apache SystemML is a flexible machine learning platform that is optimized to scale with large data sets.

When you have completed this Code Pattern, you will understand how to:

The intended audience for this Code Pattern is both application developers and other stakeholders who wish to use Data Science quickly and effectively to solve machine learning problems with Apache SystemML. Although Apache SystemML provides various out-of-the-box algorithms to experiment with, this Code Pattern uses a Linear Regression example to demonstrate the ease and power of Apache SystemML. Additionally, users can develop their own algorithms using Apache SystemML's Declarative Machine Learning (DML) language, which has R-like or Python-like syntax, or customize any algorithm provided in the package. For more information about additional functionality support, documentation, and the roadmap, please visit Apache SystemML.
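For readers new to Linear Regression, the following is a minimal plain-Python sketch of the kind of model this Code Pattern fits. It is not SystemML code and not the notebook's implementation: the function name `fit_line` and the sample data are assumptions made up for illustration, and it handles only a single feature using the closed-form least-squares solution.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sample data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

On a workstation this computation is trivial; the point of running it through SystemML is that the same algorithm description can be applied to data too large for one machine.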

Flow
  1. Load the provided notebook onto the IBM Watson Studio platform.
  2. The notebook interacts with an Apache Spark instance.
  3. A sample big data dataset is loaded into the Jupyter Notebook.
  4. To perform machine learning, Apache SystemML is used atop Apache Spark.
Included Components
Featured technologies
State of the art

Typically, a data scientist writes an algorithm against a subset of the dataset that fits in the disk and memory of a workstation (laptop). Once satisfied with the results on the workstation, the data scientist asks a systems engineer to implement the same algorithm in a distributed environment against a much larger dataset. It may take weeks, if not months, of back-and-forth between the data scientist and the systems engineer before an equivalent algorithm is implemented in the distributed environment. Because the translation is done by hand, bugs can be introduced into the reimplementation. When the final algorithm is ready, it cannot be determined whether it is equivalent to the one that ran on the workstation. It is also hard to tell whether any issues found are due to the distributed implementation or to the original algorithm itself.

This is where the "State of the Art" from SystemML comes in. With SystemML, a data scientist has to write an algorithm only once. SystemML's built-in optimizer gives any algorithm a dynamic runtime plan based on data characteristics and the runtime environment, such as a single machine or a multi-node cluster. The data scientist saves a lot of time and avoids the errors that can be introduced when an algorithm written for a single machine is transformed to run in a distributed environment.
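The idea of choosing a runtime plan from data characteristics and the runtime environment can be illustrated with a toy sketch. This is not SystemML's optimizer, which is far more sophisticated; the function names, the dense 8-bytes-per-value estimate, and the memory budget below are all assumptions made for illustration.

```python
def estimate_dense_bytes(rows, cols, bytes_per_value=8):
    """Rough in-memory size of a dense matrix of double-precision values."""
    return rows * cols * bytes_per_value


def choose_plan(rows, cols, memory_budget_bytes):
    """Pick a hypothetical execution plan from matrix size and available memory."""
    needed = estimate_dense_bytes(rows, cols)
    if needed <= memory_budget_bytes:
        return "single_node"
    return "distributed"


one_gigabyte = 1_000_000_000  # assumed memory budget for the example

# A small matrix fits in memory, so the toy planner keeps it on one machine.
small_plan = choose_plan(1_000, 10, one_gigabyte)

# A very large matrix exceeds the budget, so the toy planner goes distributed.
large_plan = choose_plan(100_000_000, 100, one_gigabyte)

print(small_plan, large_plan)
```

The algorithm itself never appears in `choose_plan`: the decision depends only on the data and the environment, which is why the same script can run unchanged in both settings.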

Watch the Video

Steps

Follow these steps to set up and run this Code Pattern. These steps are described in detail below.

  1. Sign up for Watson Studio
  2. Create the notebook
  3. Run the notebook
  4. Save and share
1. Sign up for Watson Studio

Sign up for IBM's Watson Studio. When you create a project in Watson Studio, a free-tier Object Storage service is created in your IBM Cloud account. Take note of your service names, as you will need to select them in the following steps.

Note: When creating your Object Storage service, select the Free storage type in order to avoid having to pay an upgrade fee.

To create these services:

Note: When creating your Object Storage service, select the Swift storage type in order to avoid having to pay an upgrade fee.


2. Create the notebook

Create the Notebook:

3. Run the notebook

When a notebook is executed, each code cell in the notebook is executed, in order, from top to bottom.
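The following toy sketch mimics that top-to-bottom execution in plain Python. It is an illustration only, not how Jupyter is implemented: the three cell sources are made up, and each cell is simply run with `exec` against one shared namespace, which is why later cells can use names defined by earlier ones.

```python
# Hypothetical notebook: each string stands in for the source of one code cell.
cells = [
    "x = 2",
    "y = x * 10",
    "result = y + 1",
]

namespace = {}  # shared state, like the notebook's kernel session
labels = []     # the margin tag each cell would show after it has run

for execution_count, source in enumerate(cells, start=1):
    exec(source, namespace)  # run the cell in the shared namespace
    labels.append(f"In [{execution_count}]:")

print(labels)
print(namespace["result"])
```

Running the second cell before the first would raise a `NameError` because `x` would not yet exist, which is the practical reason cell order matters.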

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

There are several ways to execute the code cells in your notebook:

4. Save and share
How to save your work:

Under the File menu, there are several ways to save your notebook:

How to share your work:

You can share your notebook by selecting the Share button located in the top right section of your notebook panel. The end result of this action will be a URL that displays a "read-only" version of your notebook. You have several options to specify exactly what you want shared from your notebook:

Links

Learn more

License

Apache 2.0


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.