awslabs/amazon-elasticsearch-lambda-samples

Name: amazon-elasticsearch-lambda-samples

Owner: Amazon Web Services - Labs

Owner: AWS Samples

Description: Data ingestion for Amazon Elasticsearch Service from S3 and Amazon Kinesis, using AWS Lambda: Sample code

Created: 2015-09-04 17:05:06.0

Updated: 2018-01-10 02:04:27.0

Pushed: 2016-11-30 20:52:30.0

Homepage: null

Size: 11

Language: JavaScript

README

Streaming Data to Amazon Elasticsearch Service

Using AWS Lambda: Sample Node.js Code
Package amazon-elasticsearch-lambda-samples

Copyright 2015- Amazon.com, Inc. or its affiliates. All Rights Reserved.

Introduction

It is often useful to stream data, as it is generated, for indexing in an Amazon Elasticsearch Service domain. This makes fresh data available for search or analytics. Doing so requires:

  1. Knowing when new data is available.
  2. Code to pick up the data, parse it into JSON documents, and add them to an Amazon Elasticsearch Service (henceforth ES) domain.
  3. Scalable and fully managed infrastructure to host this code.

Lambda is an AWS service that takes care of these requirements. Put simply, it is an “event handling” service in the cloud. Lambda lets us implement the event handler (in Node.js or Java), which it hosts and invokes in response to an event.

The handler can be triggered by a “push” or a “pull” approach. Certain event sources (such as S3) push an event notification to Lambda. Others (such as Kinesis) require Lambda to poll for events and pull them when available.
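To make the push/pull distinction concrete, here is a minimal sketch of a handler that inspects the `eventSource` field that S3 and Kinesis set on each record. The function name `deliveryModel` and the handler body are illustrative assumptions, not part of the sample code.

```javascript
'use strict';

// Returns 'push', 'pull', or 'unknown' for a Lambda event, based on the
// eventSource field of the first record.
function deliveryModel(event) {
  const records = (event && event.Records) || [];
  if (records.length === 0) return 'unknown';
  const source = records[0].eventSource;
  if (source === 'aws:s3') return 'push';       // S3 notifies Lambda directly
  if (source === 'aws:kinesis') return 'pull';  // Lambda polls the stream
  return 'unknown';
}

// Hypothetical handler: logs how the event arrived, then reports it back.
function handler(event, context, callback) {
  const records = (event && event.Records) || [];
  const model = deliveryModel(event);
  console.log('Received ' + records.length + ' record(s) via ' + model);
  callback(null, model);
}

exports.handler = handler;
```

Either way, the handler itself looks the same; the difference lies in how Lambda is configured to receive events from the source.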

For more details on AWS Lambda, please see the documentation.

This package contains sample Lambda code (in Node.js) to stream data to ES from two common AWS data sources: S3 and Kinesis. The S3 sample takes Apache log files, parses them into JSON documents and adds them to ES. The Kinesis sample reads JSON records from the stream and adds them to ES.
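As an illustration of the log-parsing step, the sketch below turns one line in Apache Common Log Format into a JSON document. The regular expression and the field names are assumptions made for this example and are not necessarily how the sample code parses logs.

```javascript
'use strict';

// Common Log Format: host ident authuser [timestamp] "request" status bytes
const CLF_PATTERN = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)/;

// Parses one log line into a plain object, or returns null if it does not match.
function parseLogLine(line) {
  const m = CLF_PATTERN.exec(line);
  if (!m) return null;
  return {
    host: m[1],
    ident: m[2],
    authuser: m[3],
    timestamp: m[4],
    request: m[5],
    status: Number(m[6]),
    bytes: m[7] === '-' ? 0 : Number(m[7])  // '-' means no body was sent
  };
}

const sample =
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
console.log(JSON.stringify(parseLogLine(sample)));
```

Lines that do not match the pattern return `null`, so a caller can skip or log them rather than index a malformed document.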

Note that the sample code has been kept simple for reasons of clarity. It does not handle ES document batching, eventual consistency issues for S3 updates, and so on.
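For readers who want to add batching themselves, the following sketch shows one way to build a request body for the Elasticsearch `_bulk` API, which expects newline-delimited JSON with an action line before each document. The function name `buildBulkBody` is hypothetical, and the use of `_type` assumes an Elasticsearch version that still accepts document types. The sample code does not do this.

```javascript
'use strict';

// Builds a newline-delimited body for the _bulk API: one action line
// followed by one source line per document, with a trailing newline.
function buildBulkBody(index, doctype, docs) {
  if (docs.length === 0) return '';
  const lines = [];
  docs.forEach(function (doc) {
    lines.push(JSON.stringify({ index: { _index: index, _type: doctype } }));
    lines.push(JSON.stringify(doc));
  });
  return lines.join('\n') + '\n';
}

console.log(buildBulkBody('logs', 'apache', [{ status: 200 }, { status: 404 }]));
```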

Setup Overview

While some detailed instructions are covered later in this file and elsewhere (in the Lambda documentation), this section aims to show the larger picture that the individual steps work to accomplish. We assume that the data source (an S3 bucket or a Kinesis stream, in this case) and an ES domain are already set up.

  1. Deployment Package: The “Deployment Package” is the event handler code files and their dependencies, packaged as a zip file. The first step in creating a new Lambda function is to prepare and upload this zip file.

  2. Lambda Configuration:

     - Handler: The name of the main code file in the deployment package, with the file extension replaced with a .handler suffix.

     - Memory: The memory limit, which determines the EC2 instance type used to run the function. For now, the default should suffice.

     - Timeout: The default timeout value (3 seconds) is quite low for our use case. 10 seconds might work better, but please adjust based on your testing.

  3. Authorization: Because several AWS services need to call each other, appropriate authorization is required. This takes the form of an IAM role to which the necessary authorization policies are attached. The Lambda function assumes this role when it runs.
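The configuration values above can be sketched in code as follows. This is only an illustration: the file name `s3_lambda_es.js`, the helper `handlerSetting`, and the shape of the `lambdaConfig` object are assumptions for this example and do not correspond to a specific AWS API. The trust policy shown is the standard one that lets the Lambda service assume an IAM role; the permissions policies for S3, Kinesis and ES would be attached to the same role separately.

```javascript
'use strict';
const path = require('path');

// Derives the Handler setting from the main code file name,
// e.g. 's3_lambda_es.js' -> 's3_lambda_es.handler'.
// This assumes the file exports its entry point as exports.handler.
function handlerSetting(fileName) {
  return path.basename(fileName, path.extname(fileName)) + '.handler';
}

// Trust policy that allows the Lambda service to assume the execution role.
const trustPolicy = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Principal: { Service: 'lambda.amazonaws.com' },
    Action: 'sts:AssumeRole'
  }]
};

// Hypothetical summary of the settings discussed above.
const lambdaConfig = {
  handler: handlerSetting('s3_lambda_es.js'),  // hypothetical file name
  timeoutSeconds: 10,                           // raised from the 3-second default
  trustPolicy: trustPolicy
};

console.log(JSON.stringify(lambdaConfig, null, 2));
```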

Note:

Deployment Package Creation

  1. On your development machine, download and install Node.js.

  2. In any location, create a directory structure similar to the following:

    eslambda (place sample code here)
    |
    +-- node_modules (dependencies will go here)

  3. Modify the sample code with the correct ES endpoint, region, index and document type.

  4. Install each dependency imported by the sample code (with the require() call), as follows:

    npm install <dependency>

    Verify that these are installed within the node_modules subdirectory.

  5. Create a zip file to package the code and the node_modules subdirectory:

    zip -r eslambda.zip *

The zip file thus created is the Lambda Deployment Package.
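The settings edited in step 3 might look something like the sketch below. All values are placeholders, and the object name `esDomain` and the helper `indexRequestOptions` are assumptions for this example. The helper only builds the options for an HTTPS request that adds one document to the index; it does not send the request, and it omits the AWS Signature Version 4 signing that an ES domain with an IAM-based access policy would require.

```javascript
'use strict';

// Placeholder values; replace them with your own domain settings.
const esDomain = {
  endpoint: 'search-example-domain.us-east-1.es.amazonaws.com',
  region: 'us-east-1',
  index: 'logs',
  doctype: 'apache'
};

// Builds options for an HTTPS POST that adds one document to
// /<index>/<doctype>, letting ES assign the document ID.
function indexRequestOptions(domain, body) {
  return {
    method: 'POST',
    host: domain.endpoint,
    path: '/' + encodeURIComponent(domain.index) +
          '/' + encodeURIComponent(domain.doctype),
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  };
}

const doc = JSON.stringify({ status: 200, request: 'GET / HTTP/1.1' });
console.log(indexRequestOptions(esDomain, doc));
```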

S3-Lambda-ES

Set up the Lambda function and the S3 bucket as described in the Lambda-S3 Walkthrough. Please keep in mind the following notes and configuration overrides:
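As a rough illustration of the S3 side of this flow, the sketch below extracts the bucket name and object key from an S3 event notification, which is the information a handler needs to fetch the log file. The function name `s3ObjectsFromEvent` is hypothetical. Object keys arrive URL-encoded, with spaces represented as `+`, so they are decoded before use.

```javascript
'use strict';

// Returns the bucket and decoded key for every record in an S3 event.
function s3ObjectsFromEvent(event) {
  return (event.Records || []).map(function (record) {
    return {
      bucket: record.s3.bucket.name,
      // Keys are URL-encoded in the notification; '+' stands for a space.
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
    };
  });
}

// Example event containing only the fields used above.
const exampleEvent = {
  Records: [{
    eventSource: 'aws:s3',
    s3: {
      bucket: { name: 'example-log-bucket' },
      object: { key: 'logs/access+log%3A1.txt' }
    }
  }]
};

console.log(s3ObjectsFromEvent(exampleEvent));
```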

Kinesis-Lambda-ES

Set up the Lambda function and the Kinesis stream as described in the Lambda-Kinesis Walkthrough. Please keep in mind the following notes and configuration overrides:
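As a rough illustration of the Kinesis side of this flow, the sketch below decodes the records in a Kinesis event. Lambda delivers each record's payload base64-encoded in `kinesis.data`, so it is decoded and then parsed as JSON. The function name `documentsFromKinesisEvent` is hypothetical, and the sketch assumes every record holds valid JSON.

```javascript
'use strict';

// Decodes each Kinesis record from base64 and parses it as a JSON document.
function documentsFromKinesisEvent(event) {
  return (event.Records || []).map(function (record) {
    const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
    return JSON.parse(json);
  });
}

// Example event containing only the fields used above.
const exampleEvent = {
  Records: [{
    eventSource: 'aws:kinesis',
    kinesis: {
      data: Buffer.from(JSON.stringify({ user: 'alice', action: 'login' }))
        .toString('base64')
    }
  }]
};

console.log(documentsFromKinesisEvent(exampleEvent));
```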

