awslabs/ecs-refarch-batch-processing

Name: ecs-refarch-batch-processing

Owner: Amazon Web Services - Labs

Owner: AWS Samples

Description: A reference architecture for handling batch processing workloads using Amazon ECS.

Created: 2017-02-10 18:43:15.0

Updated: 2017-12-13 22:20:40.0

Pushed: 2017-02-22 08:38:06.0

Size: 816

Language: Python

README

Amazon ECS Reference Architecture: Batch Processing

This reference architecture shows how to handle batch processing using Amazon ECS. You may also want to consider AWS Batch, a service that dynamically provisions the optimal quantity and type of compute resources based on the volume and specific resource requirements of the batch jobs submitted. The batch processing reference architecture diagram below illustrates the architecture.

The AWS CloudFormation template included in this example creates an input and an output Amazon S3 bucket, an Amazon SQS queue, an Amazon CloudWatch alarm, an ECS cluster, and an ECS task definition. Objects uploaded to the input S3 bucket trigger an event that sends object details to the SQS queue. The ECS task deploys a Docker container that reads from that queue, parses the message containing the object name, and then downloads the object. Once the object has been transformed, the container uploads the result to the S3 output bucket. This example uses images in JPG format to showcase the batch processing architecture. Upload images with a .jpg suffix to the input S3 bucket to trigger the event. NOTE: Use the lowercase .jpg suffix.
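The message-parsing step described above can be sketched in Python. This is a minimal illustration, not the code shipped in the repository's Docker image: the function name `extract_s3_objects` is hypothetical, and it assumes the queue receives standard S3 event notification JSON, in which each record carries the bucket name and a URL-encoded object key.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(message_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs found in an S3 event notification body.

    Hypothetical helper for illustration; the worker in the repository
    may parse the message differently.
    """
    event = json.loads(message_body)
    objects = []
    # S3 sends a one-off "s3:TestEvent" message without a "Records" list
    # when a notification is first configured, so default to an empty list.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

A worker would call this on the body of each SQS message it receives, then download, transform, and upload each returned object before deleting the message from the queue.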

By using the SQS queue as the location for all object details, we can take advantage of its scalability and reliability: the queue automatically scales with the volume of incoming messages, and message retention can be configured. The ECS cluster can then scale services up or down based on the number of messages in the queue.

The CloudFormation template creates an IAM role that the ECS task assumes in order to get access to the S3 buckets and SQS queue. Note that the permissions of the IAM role don't specify the S3 bucket ARN for the incoming bucket. This is to avoid a circular dependency issue in the CloudFormation template. In a real-world scenario, you should always assign only the minimum privileges needed to an IAM role.
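As an illustration of what a more tightly scoped policy could look like, the sketch below builds an IAM policy document that names both buckets and the queue explicitly. The function name `build_task_policy` and the specific action list are assumptions made for this example; the template's actual policy and the exact actions the worker needs may differ.

```python
import json


def build_task_policy(input_bucket: str, output_bucket: str, queue_arn: str) -> dict:
    """Build a least-privilege policy document for the batch worker.

    Hypothetical helper: the actions listed are an assumption about what
    a read-transform-write worker needs, not taken from the template.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read source objects from the input bucket only.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {   # Write results to the output bucket only.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
            {   # Consume messages from the one batch queue.
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": queue_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder names and account ID, for illustration only.
    policy = build_task_policy(
        "my-input-bucket",
        "my-output-bucket",
        "arn:aws:sqs:us-east-1:123456789012:SQSBatchQueue",
    )
    print(json.dumps(policy, indent=2))
```

Outside of a single CloudFormation template, the bucket names are known before the policy is written, so the circular dependency mentioned above does not arise.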

Running the example

Follow these steps to run the template.

Step 1: Clone the GitHub repository and build the Docker image

To run the entire example, first clone the source repository, using the following command:

$ git clone https://github.com/awslabs/ecs-refarch-batch-processing.git

Next, build the Docker image and push it to a Docker registry (such as Docker Hub). Change to the docker directory:

$ cd ecs-refarch-batch-processing/docker

Make sure to log in with your Docker Hub account credentials:

$ docker login

Build the Docker image:

$ docker build -t <repo>/<image> .

Push the image:

$ docker push <repo>/<image>

Step 2: Create a CloudFormation stack

Choose Launch Stack to launch the template in the us-east-1 region in your account:

Launch ECS batch processing with CloudFormation

The CloudFormation template requires the following parameters:

Step 3: Create the S3 event trigger for the SQS queue

In the S3 console for your AWS account, select the S3 input bucket that the CloudFormation template created, then go to Properties -> Events.

Configure an event notification to the SQS queue called SQSBatchQueue for the ObjectCreated (All) event and in the Suffix field enter “jpg”.

You can learn more about configuring S3 event notifications in the Amazon S3 documentation.
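The same notification can also be set up programmatically. The following is a sketch using boto3, assuming boto3 is installed and credentials are configured; the function names are hypothetical, and it assumes the queue's access policy already allows S3 to send messages to it. Note that `put_bucket_notification_configuration` replaces the bucket's entire existing notification configuration.

```python
def build_queue_notification(queue_arn: str, suffix: str = "jpg") -> dict:
    """Build an S3 notification configuration for all ObjectCreated events.

    Hypothetical helper; the suffix default mirrors the console step above.
    """
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


def apply_notification(bucket: str, queue_arn: str) -> None:
    """Apply the configuration to the input bucket (overwrites any existing one)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_queue_notification(queue_arn),
    )
```

Calling `apply_notification` with the input bucket name and the ARN of the SQSBatchQueue queue corresponds to the console steps in this section.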

Step 4: Create the ECS Service

Go to the ECS Console in your AWS Account and create an ECS Service choosing the ECS Cluster and Task definition created by the CloudFormation template. Give the service a name and set the number of desired tasks to deploy as part of the service. For this example, you can configure the basic service parameters.
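For readers who prefer scripting over the console, the service can be created with boto3 along these lines. This is a sketch under assumptions: the helper names are hypothetical, and the cluster and task definition names must be taken from the outputs of your own stack.

```python
def build_service_request(cluster: str, task_definition: str,
                          service_name: str, desired_count: int = 1) -> dict:
    """Build the keyword arguments for the ECS CreateService call."""
    if desired_count < 0:
        raise ValueError("desired_count must be zero or greater")
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
    }


def create_service(cluster: str, task_definition: str,
                   service_name: str, desired_count: int = 1) -> dict:
    """Create the service; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    ecs = boto3.client("ecs")
    return ecs.create_service(
        **build_service_request(cluster, task_definition, service_name, desired_count)
    )
```

Only the basic parameters are set here, matching the console instructions above.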

Step 5: Update the ECS Service to configure Auto Scaling

In this step you will configure auto scaling for the service you created in step 4. CloudWatch allows you to trigger alarms when a threshold is met for a metric. The CloudFormation template creates a CloudWatch Alarm for the SQS queue on the ApproximateNumberOfMessagesVisible metric so that when the number of messages exceeds a specified limit over a specified time period, the ECS Service will launch an additional task on the ECS Cluster. Use this existing alarm when configuring the scaling for the service.

Select the service created in Step 4 and click Update, then “Configure Service Auto Scaling”. Choose “Configure Service Auto Scaling to adjust your service's desired count” and fill in the minimum, desired and maximum number of tasks. Click on “Add a scaling policy” and use the existing alarm (created by the CloudFormation template).
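The console workflow corresponds roughly to two Application Auto Scaling API calls: registering the service as a scalable target and attaching a step scaling policy. The sketch below shows one way to do this with boto3. The helper names, the policy name, the one-task step, and the 60-second cooldown are illustrative assumptions, not values from the template. The sketch also does not attach the policy to the CloudWatch alarm; the returned policy ARN would still need to be added to the alarm's actions.

```python
def service_resource_id(cluster_name: str, service_name: str) -> str:
    """Application Auto Scaling identifies an ECS service by this ID format."""
    return f"service/{cluster_name}/{service_name}"


def build_scalable_target(cluster_name: str, service_name: str,
                          min_tasks: int, max_tasks: int) -> dict:
    """Arguments for RegisterScalableTarget (minimum and maximum task count)."""
    if not 0 <= min_tasks <= max_tasks:
        raise ValueError("expected 0 <= min_tasks <= max_tasks")
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": service_resource_id(cluster_name, service_name),
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def build_scale_out_policy(cluster_name: str, service_name: str,
                           policy_name: str = "scale-out-on-queue-depth") -> dict:
    """Arguments for PutScalingPolicy: add one task when the alarm fires.

    The policy name, step size, and cooldown are assumptions for this example.
    """
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "ecs",
        "ResourceId": service_resource_id(cluster_name, service_name),
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 1}
            ],
            "Cooldown": 60,
        },
    }


def configure_auto_scaling(cluster_name: str, service_name: str,
                           min_tasks: int, max_tasks: int) -> str:
    """Register the target and policy; returns the new policy's ARN."""
    import boto3  # imported lazily so the builders above work without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        **build_scalable_target(cluster_name, service_name, min_tasks, max_tasks)
    )
    response = client.put_scaling_policy(
        **build_scale_out_policy(cluster_name, service_name)
    )
    return response["PolicyARN"]
```

Using the console as described in this step remains the simpler path, because it wires the existing alarm to the policy for you.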

The CloudWatch alarm created by the template should now look similar to this.

Your service configuration should look similar to this.

Testing the example

Once you have completed the above steps, you can test the example as follows:

  1. Upload one or more .jpg files into your S3 input bucket (lowercase .jpg suffix).
  2. Explore the output files in the S3 output bucket.
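The two test steps above can be scripted as well. The following sketch assumes boto3 is installed and credentials are configured; the function names are hypothetical, and the bucket names must come from your own stack.

```python
import os
from typing import List


def has_lowercase_jpg_suffix(filename: str) -> bool:
    """The event filter is case-sensitive, so only a lowercase .jpg matches."""
    return filename.endswith(".jpg")


def upload_input(local_path: str, input_bucket: str) -> str:
    """Upload one image to the input bucket and return the object key used."""
    import boto3  # imported lazily so the check above works without boto3

    if not has_lowercase_jpg_suffix(local_path):
        raise ValueError("file name must end in lowercase .jpg")
    key = os.path.basename(local_path)
    boto3.client("s3").upload_file(local_path, input_bucket, key)
    return key


def list_outputs(output_bucket: str) -> List[str]:
    """List object keys currently in the output bucket (first page only)."""
    import boto3

    response = boto3.client("s3").list_objects_v2(Bucket=output_bucket)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

Processing is asynchronous, so results may take a short while to appear in the output bucket after an upload.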

Cleaning up the example resources

To remove all resources created by this example, do the following:

  1. Delete the created output and input S3 buckets.
  2. Delete the CloudFormation stack.
  3. Delete the ECS cluster.
  4. Delete the EC2 Role.
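The first two cleanup steps can be sketched with boto3 as follows. S3 buckets must be empty before they can be deleted, so the sketch empties each bucket first. The function names are hypothetical, the sketch assumes the buckets are not versioned, and it does not cover deleting the ECS cluster or the EC2 role.

```python
from typing import Iterable


def empty_bucket(s3_client, bucket: str) -> int:
    """Delete every object in a non-versioned bucket; return how many were deleted.

    Each pass lists up to 1000 keys (the ListObjectsV2 page size) and deletes
    them in one DeleteObjects call, repeating until the bucket is empty.
    """
    deleted = 0
    while True:
        page = s3_client.list_objects_v2(Bucket=bucket)
        contents = page.get("Contents", [])
        if not contents:
            return deleted
        s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
        )
        deleted += len(contents)


def clean_up(buckets: Iterable[str], stack_name: str) -> None:
    """Empty and delete the buckets, then request deletion of the stack."""
    import boto3  # imported lazily so empty_bucket can be used with any client

    s3 = boto3.client("s3")
    for bucket in buckets:
        empty_bucket(s3, bucket)
        s3.delete_bucket(Bucket=bucket)
    boto3.client("cloudformation").delete_stack(StackName=stack_name)
```

Stack deletion is asynchronous, so `delete_stack` returns before the resources are actually removed.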

CloudFormation template resources

The following sections explain all of the resources created by the CloudFormation template provided with this example.

License

This reference architecture sample is licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.