servo/servo-warc-tests

Name: servo-warc-tests

Owner: Servo

Description: Test Servo on Web Archive snapshots of real web sites

Created: 2018-01-16 20:34:21.0

Updated: 2018-04-27 17:24:09.0

Pushed: 2018-04-05 16:57:35.0

Homepage: null

Size: 38067

Language: Shell

README

Test Servo on Web Archive snapshots of real web sites

This directory contains web archives, together with scripts which use them for performance testing of Servo.

Google Data Studio report

Web archives

WARC web archives are a de facto standard for archiving web content. They are the storage format for the Internet Archive Wayback Machine and are supported by the Library of Congress.

Web archives can be created and viewed in Servo using the pywb tools, which can be installed with:

virtualenv -p python3 venv
source venv/bin/activate
pip install git+https://github.com/ikreymer/pywb.git

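To confirm that the pywb commands used below (wayback and wb-manager) are on your PATH after activating the virtualenv, a small check can help. This is a minimal sketch; require_cmds is a hypothetical helper, not part of this repository or of pywb.

```shell
# Hypothetical helper (not part of this repository): check that every
# named command is on PATH, reporting each missing one on stderr.
# Returns 0 if all commands are found, 1 otherwise.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_cmds wayback wb-manager || echo "activate the venv first" reports which pywb command, if any, is missing.
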
proxychains

Using the pywb tools in HTTP proxy mode with Servo requires the proxychains command.

Debian-based systems:
apt-get install proxychains

Then run commands with proxychains.

macOS:
brew install proxychains-ng

Then run commands with proxychains4 (the proxychains-ng command name) in place of proxychains.
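
Since the command name differs between platforms, a script can pick whichever one is installed. A minimal sketch with a hypothetical helper, proxychains_cmd, that prefers proxychains4 (proxychains-ng) and falls back to proxychains:

```shell
# Hypothetical helper: print the name of the installed proxychains command.
# proxychains-ng (e.g. on macOS) provides proxychains4; the Debian package
# provides proxychains. Returns 1 if neither is on PATH.
proxychains_cmd() {
  for cmd in proxychains4 proxychains; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  echo "proxychains not found; install it first" >&2
  return 1
}
```

A script could then set PROXYCHAINS=$(proxychains_cmd) and use "$PROXYCHAINS" wherever the commands below say proxychains.
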

Playing an existing archive

In this example we'll play the WBEZ archive.

In one window, run the wayback server on the WBEZ archive:

wayback --proxy WBEZ --port 8321

The port number (8321 here) should match the one in proxychains.conf.
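
The port check can be scripted. A minimal sketch, assuming proxychains.conf lists the pywb proxy as an http entry under [ProxyList], in the usual proxychains layout "type host port"; proxychains_port and check_port are hypothetical helpers:

```shell
# Hypothetical helper: print the port of the first "http" entry in the
# [ProxyList] section of a proxychains config (default: ./proxychains.conf).
# Commented-out entries start with "#" and are therefore skipped.
proxychains_port() {
  awk '
    /^\[ProxyList\]/ { in_list = 1; next }
    in_list && $1 == "http" { print $3; exit }
  ' "${1:-proxychains.conf}"
}

# Hypothetical helper: fail unless wayback port $1 matches config file $2.
check_port() {
  conf_port=$(proxychains_port "$2")
  if [ "$1" != "$conf_port" ]; then
    echo "port mismatch: wayback uses $1, $2 says ${conf_port:-nothing}" >&2
    return 1
  fi
}
```

For example, run check_port 8321 proxychains.conf before starting the wayback server.
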

Then run Servo through this HTTP proxy, so that navigating to a recorded web site shows the recorded version:

proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.wbez.org/

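The two windows can also be driven from a single script. A minimal sketch, assuming it is run from this directory (so proxy-certs/ and proxychains.conf are found), that SERVO_DIRECTORY is set, and that a fixed sleep is long enough for wayback to start; play_archive, PORT, PROXYCHAINS and WAYBACK_STARTUP_SECS are hypothetical names, not part of this repository:

```shell
# Hypothetical helper: replay pywb collection $1 in Servo, opening URL $2.
# PORT (default 8321) must match proxychains.conf; set PROXYCHAINS to
# proxychains4 on macOS.
play_archive() {
  collection=$1
  url=$2
  port=${PORT:-8321}
  : "${SERVO_DIRECTORY:?set SERVO_DIRECTORY to your Servo checkout}"

  # Start the wayback server in the background and give it time to start.
  wayback --proxy "$collection" --port "$port" &
  wayback_pid=$!
  sleep "${WAYBACK_STARTUP_SECS:-2}"

  # Browse through the proxy; this returns when Servo exits.
  "${PROXYCHAINS:-proxychains}" "$SERVO_DIRECTORY/mach" run -r \
    --certificate-path proxy-certs/pywb-ca.pem "$url"
  status=$?

  # Stop the wayback server (SIGTERM) and reap it.
  kill "$wayback_pid" 2>/dev/null || true
  wait "$wayback_pid" 2>/dev/null || true
  return "$status"
}
```

For example, play_archive WBEZ https://www.wbez.org/ replays the WBEZ archive and stops the wayback server once Servo exits.
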
Adding a new archive

In this example we'll add a web archive for the example web site example.com.

First create a collection for the Example files:

wb-manager init Example

Now start recording the web archive:

wayback --proxy Example --live --proxy-record --autoindex --port 8321

In another window, run Servo through this HTTP proxy and navigate to the web site:

proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.example.com/

Once the site has finished loading, exit Servo and the wayback server.
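
Before testing playback, it can be worth checking that the recording actually produced WARC files. A minimal sketch, assuming pywb's default collection layout, where recorded WARCs land in collections/<name>/archive/ under the directory where wb-manager was run; count_warcs is a hypothetical helper:

```shell
# Hypothetical helper: print how many WARC files (.warc or .warc.gz) the
# pywb collection $1 holds, assuming the collections/<name>/archive/ layout
# relative to the current directory.
count_warcs() {
  count=0
  for f in "collections/$1/archive/"*.warc "collections/$1/archive/"*.warc.gz; do
    # An unmatched glob stays literal, so only count files that exist.
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

For example, [ "$(count_warcs Example)" -gt 0 ] || echo "nothing was recorded" flags an empty recording.
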

To test your archive, follow the instructions for playing an archive. In one window:

wayback --proxy Example --port 8321

and in another:

proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.example.com/

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.