Name: russia-explainer-serverless-chrome
Owner: NPR visuals team
Description: Run headless Chrome/Chromium on AWS Lambda (maybe Azure, & GCP later)
Forked from: eads/serverless-chrome
Created: 2017-05-10 17:41:52.0
Updated: 2017-09-07 17:00:44.0
Pushed: 2017-05-23 15:20:10.0
Size: 175686
Language: JavaScript
Serverless Chrome contains everything you need to get started running headless Chrome on AWS Lambda (possibly Azure and GCP Functions soon).
The aim of this project is to provide the scaffolding for using headless Chrome during a serverless function invocation. Serverless Chrome takes care of building and bundling the Chrome binaries and making sure Chrome is running when your serverless function executes. The project also provides a few “example” handlers for common patterns (e.g. taking a screenshot of a page, printing to PDF, some scraping, etc.).
Why? Because it's neat. It also opens up interesting possibilities for using the Chrome Debugger Protocol in serverless architectures.
Installation can be achieved with the following commands:
    git clone https://github.com/adieuadieu/serverless-chrome
    cd serverless-chrome
    yarn install
(It is possible to exchange yarn for npm if yarn is too hipster for you. No problem.)
Or, if you have serverless installed globally:
    serverless install -u https://github.com/adieuadieu/serverless-chrome
You must configure your AWS credentials either by defining the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or by using an AWS profile. You can read more about this in the Serverless Credentials Guide.
In short, either:
    export AWS_PROFILE=<your-profile-name>
or
    export AWS_ACCESS_KEY_ID=<your-key-here>
    export AWS_SECRET_ACCESS_KEY=<your-secret-key-here>
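Before deploying, it can help to confirm that one of the two options above is actually visible to Node. The following is a minimal sketch, not part of serverless-chrome; `describeCredentials` is a hypothetical helper name, and it only inspects environment variables, so credentials stored in a default AWS profile file will not show up here.

```javascript
// Hypothetical helper (not part of serverless-chrome): report which of the
// two credential options described above is present in an environment object.
function describeCredentials (env) {
  return {
    profile: Boolean(env.AWS_PROFILE),
    keys: Boolean(env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY)
  }
}

const found = describeCredentials(process.env)
if (!found.profile && !found.keys) {
  console.warn('No AWS credentials found in environment variables')
}
```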
Test with yarn test, or just yarn ava to skip the linter.
To deploy, run:
    yarn deploy
This package bundles a lambda-execution-environment-ready headless Chrome binary which allows you to deploy from any OS. The current build is:
You can override the default configuration in the /config.js file generated at the root of the project after a yarn install. See the defaults in src/config.js for a full list of configuration options.
Currently there are only two, very basic “proof of concept” type functions:
When you deploy the serverless function with the captureScreenshot handler, it creates a Lambda function which will take a screenshot of a URL it's provided. You can provide this URL to the Lambda function via the AWS API Gateway. After a successful deploy, an API endpoint will be provided. Use this URL to call the Lambda function with a url in the query string. E.g. https://XXXXXXX.execute-api.us-west-2.amazonaws.com/dev/chrome?url=https://google.com/
We're using API Gateway as our method to execute the function, but of course it's possible to use any other available trigger to kick things off, be it an event from S3, SNS, DynamoDB, etc. TODO: explain how –^
/config.js
    import captureScreenshot from './src/handlers/captureScreenshot'
    export default {
      handler: captureScreenshot
    }
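When the page you want to capture has its own query string, it is safer to percent-encode it before appending it as the url parameter, so that characters such as & and ? are not read as part of the endpoint's own query string. A minimal sketch, where `buildInvokeUrl` is a hypothetical helper and the endpoint is whatever your deploy printed:

```javascript
// Hypothetical helper: append a percent-encoded target URL to the deployed
// endpoint as the `url` query string parameter.
function buildInvokeUrl (endpoint, targetUrl) {
  return `${endpoint}?url=${encodeURIComponent(targetUrl)}`
}

// Placeholder endpoint; substitute the one from your own deployment.
const endpoint = 'https://XXXXXXX.execute-api.us-west-2.amazonaws.com/dev/chrome'
console.log(buildInvokeUrl(endpoint, 'https://google.com/?q=a&b=c'))
```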
The printToPdf handler will create a PDF from a URL it's provided. You can provide this URL to the Lambda function via the AWS API Gateway. After a successful deploy, an API endpoint will be provided. Use this URL to call the Lambda function with a url in the query string. E.g. https://XXXXXXX.execute-api.us-west-2.amazonaws.com/dev/chrome?url=https://google.com/
Note: Headless Chrome currently doesn't expose any configuration options (paper size, orientation, margins, etc) for printing to PDF. You can follow Chromium's progress on this here and here. You can get some sense of the upcoming configuration options from the modifications to the Chrome Debugging Protocol here.
We're using API Gateway as our method to execute the function, but of course it's possible to use any other available trigger to kick things off, be it an event from S3, SNS, DynamoDB, etc. TODO: explain how –^
/config.js
    import printToPdf from './src/handlers/printToPdf'
    export default {
      handler: printToPdf
    }
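Both handlers above read the URL from the API Gateway query string. As a rough illustration of what supporting another trigger could involve, the sketch below pulls the URL out of either an API Gateway proxy event or an SNS event. `urlFromEvent` is a hypothetical helper, not something serverless-chrome ships, and it assumes SNS publishers put the bare URL in the message body. The trigger itself would still need to be declared in the service's Serverless configuration.

```javascript
// Hypothetical helper: find the target URL in an invocation event.
// Assumption: for SNS, the message body is the URL and nothing else.
function urlFromEvent (event) {
  // API Gateway proxy events carry the query string here (null when absent).
  if (event.queryStringParameters && event.queryStringParameters.url) {
    return event.queryStringParameters.url
  }
  // SNS events deliver the message under Records[n].Sns.Message.
  const [record] = event.Records || []
  if (record && record.Sns) {
    return record.Sns.Message
  }
  throw new Error('No URL found in invocation event')
}
```

A custom handler could then call `urlFromEvent(invocationEventData)` instead of destructuring `queryStringParameters` directly.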
You can provide your own handler via the /config.js file created when you initialize the project with yarn install. The config accepts a handler property. Pass it a function which returns a Promise when complete. For example:
/config.js
    export default {
      handler: async function(invocationEventData, executionContext) {
        const { queryStringParameters: { url } } = invocationEventData
        const stuff = await doSomethingWith(url)
        return stuff
      }
    }
The first parameter, invocationEventData, is the event data with which the Lambda function is invoked; it is the first parameter provided by Lambda. The second, executionContext, is the second parameter provided to the Lambda function and contains useful runtime information.
serverless-chrome calls the Lambda handler's callback() for you when your handler function completes. The result of your handler is passed to callback with callback(null, yourHandlerResult). If your handler throws an error, callback is called with callback(yourHandlerError).
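The callback behaviour described above can be pictured with a small adapter. This is only a sketch of the general pattern, not the actual serverless-chrome source; `wrapHandler` is a hypothetical name.

```javascript
// Hypothetical adapter: turn a Promise-returning handler into a
// callback-style Lambda handler, mirroring the behaviour described above.
function wrapHandler (handler) {
  return function lambdaHandler (event, context, callback) {
    Promise.resolve()
      // Calling the handler inside then() also captures synchronous throws.
      .then(() => handler(event, context))
      .then(
        result => callback(null, result),
        error => callback(error)
      )
  }
}

// Usage with a stand-in handler:
const wrapped = wrapHandler(async event => ({ statusCode: 200, body: event.greeting }))
wrapped({ greeting: 'hello' }, {}, (error, result) => {
  console.log(error, result)
})
```

Passing both callbacks to a single `then()` keeps an exception thrown inside `callback` from triggering a second call to `callback`.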
For example, to create a handler which returns the version info of the Chrome Debugger Protocol, you could modify /config.js to:
    import Cdp from 'chrome-remote-interface'
    export default {
      async handler (event) {
        const versionInfo = await Cdp.Version()
        return {
          statusCode: 200,
          body: JSON.stringify({
            versionInfo,
          }),
          headers: {
            'Content-Type': 'application/json',
          },
        }
      },
    }
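The object returned above (statusCode, a string body, and headers) is the response shape used throughout these examples. If you write several handlers, a tiny helper can cut the repetition. `jsonResponse` below is a hypothetical name, offered as a sketch rather than as part of the project:

```javascript
// Hypothetical helper: build the statusCode/body/headers response object
// used by the example handlers, serialising the payload as JSON.
function jsonResponse (statusCode, payload) {
  return {
    statusCode,
    body: JSON.stringify(payload),
    headers: {
      'Content-Type': 'application/json'
    }
  }
}
```

With it, the handler above could end with `return jsonResponse(200, { versionInfo })`.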
To capture all of the Network Request events made when loading a URL, you could modify /config.js to something like:
    import Cdp from 'chrome-remote-interface'
    import { sleep } from './src/utils'
    const LOAD_TIMEOUT = 1000 * 30
    export default {
      async handler (event) {
        const requestsMade = []
        let loaded = false
        const loading = async (startTime = Date.now()) => {
          if (!loaded && Date.now() - startTime < LOAD_TIMEOUT) {
            await sleep(100)
            await loading(startTime)
          }
        }
        const [tab] = await Cdp.List()
        const client = await Cdp({ host: '127.0.0.1', target: tab })
        const { Network, Page } = client
        Network.requestWillBeSent(params => requestsMade.push(params))
        Page.loadEventFired(() => {
          loaded = true
        })
        // https://chromedevtools.github.io/debugger-protocol-viewer/tot/Network/#method-enable
        await Network.enable()
        // https://chromedevtools.github.io/debugger-protocol-viewer/tot/Page/#method-enable
        await Page.enable()
        // https://chromedevtools.github.io/debugger-protocol-viewer/tot/Page/#method-navigate
        await Page.navigate({ url: 'https://www.chromium.org/' })
        // wait until page is done loading, or timeout
        await loading()
        // It's important that we close the websocket connection,
        // or our Lambda function will not exit properly
        await client.close()
        return {
          statusCode: 200,
          body: JSON.stringify({
            requestsMade,
          }),
          headers: {
            'Content-Type': 'application/json',
          },
        }
      },
    }
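The raw requestsMade array can be large, since each entry is the full set of parameters from a Network.requestWillBeSent event. As one possible way to shrink the response, the sketch below counts requests per hostname using the `request.url` field of each event. `countRequestsByHost` is a hypothetical helper, not part of the project.

```javascript
// Hypothetical helper: tally captured requests by hostname.
// Each entry is expected to look like a Network.requestWillBeSent event,
// i.e. to carry the requested address in `request.url`.
function countRequestsByHost (requestsMade) {
  const counts = {}
  for (const { request } of requestsMade) {
    // URLs without a host (for example data: URLs) are grouped together.
    const host = new URL(request.url).hostname || '(no host)'
    counts[host] = (counts[host] || 0) + 1
  }
  return counts
}
```

The handler could then return `countRequestsByHost(requestsMade)` in the response body instead of the full array.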
See src/handlers for more examples.
TODO: talk about CDP and chrome-remote-interface
/dev/shm
/tmp size on Lambda
1.0
Future
I keep getting a timeout error when deploying and it's really annoying.
Indeed, that is annoying. I've had the same problem, which is why it's now here in this troubleshooting section. This may be an issue in the underlying AWS SDK when using a slower Internet connection. Try changing the AWS_CLIENT_TIMEOUT environment variable to a higher value. For example, in your command prompt enter the following and try deploying again:
    export AWS_CLIENT_TIMEOUT=3000000
Aaaaaarggghhhhhh!!!
Uuurrrggghhhhhh! Have you tried filing an Issue?
You might also be interested in: