IBM/predict-flight-delay-using-r4ml

Name: predict-flight-delay-using-r4ml

Owner: International Business Machines

Description: null

Created: 2018-05-14 15:28:01.0

Updated: 2018-05-14 22:06:48.0

Pushed: 2018-02-06 02:10:23.0

Homepage: null

Size: 571

Language: Jupyter Notebook

README

Predicting flight delay and building an ML pipeline using R4ML

In this developer journey we will use R4ML, a scalable R package, running on IBM Data Science Experience (DSX) to perform various machine learning exercises. For readers unfamiliar with it, DSX is an interactive, collaborative, cloud-based environment where data scientists, developers, and others interested in data science can use tools such as RStudio, Jupyter Notebooks, and Spark to collaborate, share, and gather insight from their data.

When the reader has completed this journey, they will understand how to:

The intended audience of this code pattern is data scientists who wish to apply scalable machine learning algorithms using R. R4ML provides various out-of-the-box algorithms to experiment with. This code pattern provides an SVM (Support Vector Machine) example to demonstrate the ease and power of R4ML in implementing scalable classification. For more information about additional functionality, documentation, and the roadmap, please visit R4ML.

Source of data
Flow

  1. Load the provided notebook onto the IBM Data Science Experience platform.
  2. The notebook interacts with an Apache Spark instance.
  3. A sample big data dataset is loaded into the Jupyter Notebook.
  4. To perform machine learning, R4ML is used atop Apache Spark.

What problem does it solve for developers?

  1. Large Scale Model Training for classification using a Support Vector Machine
  2. Large Scale Model Tuning using Cross validation
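
The two items above can be illustrated with a small, single-machine sketch. It is written in plain Python rather than R and does not use R4ML's API; the function names, the toy data, and the candidate regularization values are all assumptions made for illustration. It trains a linear SVM by subgradient descent on the hinge loss, then uses k-fold cross-validation to choose the regularization strength.

```python
import random

def train_linear_svm(rows, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit (w, b) by per-sample subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):  # y must be +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Hinge loss is active: step along y * x and shrink w.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer contributes to the subgradient.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(rows, labels, lam, k=5):
    """Mean held-out accuracy over k folds for one value of lam."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train_x = [rows[i] for i in range(len(rows)) if i not in held]
        train_y = [labels[i] for i in range(len(rows)) if i not in held]
        model = train_linear_svm(train_x, train_y, lam=lam)
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)

# Toy, linearly separable data (hypothetical, not the flight dataset).
positives = [(2, 3), (3, 2), (4, 4), (3, 3), (2, 2)]
rows = positives + [(-a, -b) for a, b in positives]
labels = [1] * 5 + [-1] * 5

# Model tuning: keep the candidate lam with the best cross-validated accuracy.
candidates = [0.001, 0.01, 0.1]
scores = {lam: cross_validate(rows, labels, lam) for lam in candidates}
best_lam = max(scores, key=scores.get)
print("cross-validated accuracy per lam:", scores)
print("selected lam:", best_lam)
```

A scalable implementation distributes this work across a cluster; the sketch only shows what training and tuning compute.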

Included Components:

Featured Technologies:
Analysis Section:
Scalable R4ML Key Features: Predict whether the flight will be delayed or not?
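
Asking whether a flight will be delayed turns the task into binary classification, so each flight needs a two-valued label. The Python sketch below shows one way to derive such a label. The 15-minute threshold and the `carrier` and `arr_delay` field names are assumptions for illustration; the README does not state how the notebook defines a delay.

```python
# Assumption: a flight counts as delayed when it arrives at least this many
# minutes late. The README does not state the threshold the notebook uses.
DELAY_THRESHOLD_MINUTES = 15

def delay_label(arrival_delay_minutes, threshold=DELAY_THRESHOLD_MINUTES):
    """Return +1 for a delayed flight and -1 otherwise."""
    return 1 if arrival_delay_minutes >= threshold else -1

# Hypothetical records; negative values mean the flight arrived early.
flights = [
    {"carrier": "AA", "arr_delay": 42},
    {"carrier": "UA", "arr_delay": -5},
    {"carrier": "DL", "arr_delay": 14},
    {"carrier": "WN", "arr_delay": 15},
]

labels = [delay_label(f["arr_delay"]) for f in flights]
print(labels)
```

The +1/-1 encoding is the one conventionally used for SVM training; a 0/1 encoding would serve other classifiers equally well.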

Steps:

Follow these steps to set up and run this developer journey. The steps are described in detail below.

  1. Sign up for the Data Science Experience
  2. Create the notebook
  3. Run the notebook
  4. Save and share

1. Sign up for the Data Science Experience

Sign up for IBM's Data Science Experience. When you sign up, two services are created in your Bluemix account: DSX-Spark and DSX-ObjectStore. If these services do not exist, or if you are already using them for another application, you will need to create new instances.

To create these services:

Note: When creating your Object Storage service, select the Swift storage type in order to avoid having to pay an upgrade fee.

Take note of your service names as you will need to select them in the following steps.

2. Create the notebook

First you must create a new Project:

Create Notebook 1:

3. Run the notebook

When a notebook is executed, each code cell in the notebook runs in order, from top to bottom.

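Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be blank (the cell has never been executed), a number (the relative order in which the cell was executed), or * (the cell is currently executing).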

There are several ways to execute the code cells in your notebook:

4. Save and share

How to save your work:

Under the File menu, there are several ways to save your notebook:

How to share your work:

You can share your notebook by selecting the "Share" button in the top right section of your notebook panel. This produces a URL that displays a "read-only" version of your notebook. You have several options to specify exactly what you want shared from your notebook:

License

Apache 2.0


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.