Name: servo-warc-tests
Owner: Servo
Description: Test Servo on Web Archive snapshots of real web sites
Created: 2018-01-16 20:34:21.0
Updated: 2018-04-27 17:24:09.0
Pushed: 2018-04-05 16:57:35.0
Homepage: null
Size: 38067
Language: Shell
This directory contains web archives, together with scripts which use them for performance testing of Servo.
WARC web archives are a de facto standard for archiving web content. They are the storage format for the Internet Archive Wayback Machine and supported by the Library of Congress.
Web archives can be created and viewed in Servo, using the pywb tools, which can be installed using:
virtualenv -p python3 venv
source venv/bin/activate
pip install git+https://github.com/ikreymer/pywb.git
Using the pywb tools in HTTP proxy mode with Servo requires the proxychains command. With apt-get, install it and run it as proxychains:
apt-get install proxychains
With Homebrew, install proxychains-ng and run it as proxychains4:
brew install proxychains-ng
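Before going further, it can help to confirm that the tools ended up on your PATH. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper name, and the list of commands to check is an assumption based on the commands used in this README.

```shell
# Hypothetical helper: print the name of each given command
# that cannot be found on PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools wayback wb-manager proxychains` lists whichever of the three still needs installing (substitute proxychains4 if you installed proxychains-ng).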
In this example we'll play the WBEZ archive.
In one window, run the wayback server on the WBEZ archive:
wayback --proxy WBEZ --port 8321
The port number (8321 here) should match the one in proxychains.conf.
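A mismatch between the two ports is easy to miss, so a quick check may be useful. The sketch below assumes the usual proxychains layout, in which a `[ProxyList]` section contains lines of the form `http 127.0.0.1 8321` (type, host, port). The helper name `proxy_port_matches` is hypothetical.

```shell
# Hypothetical helper: succeed if the proxychains config file CONF
# lists an http proxy on PORT inside its [ProxyList] section.
proxy_port_matches() {
    conf=$1
    port=$2
    awk -v port="$port" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        /^\[ProxyList\]/ { in_list = 1; next }    # entries follow this header
        in_list && $1 == "http" && $3 == port { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$conf"
}
```

Running `proxy_port_matches proxychains.conf 8321 && echo "port matches"` would then confirm that the configuration agrees with the port passed to wayback.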
Then run Servo through this HTTP proxy, so that navigating to a recorded web site takes you to the recorded version:
proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.wbez.org/
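If you would rather not juggle two windows, the two steps can be combined in one script. This is only a sketch under stated assumptions: `play_archive` is a hypothetical function name, the two-second pause is a crude stand-in for waiting until the wayback server is ready, and the wayback and proxychains invocations are the ones shown above.

```shell
# Hypothetical wrapper: serve COLLECTION with wayback in the background,
# open URL in Servo through proxychains, then stop the server.
play_archive() {
    collection=$1
    url=$2
    port=${3:-8321}   # should match the port in proxychains.conf

    wayback --proxy "$collection" --port "$port" &
    wayback_pid=$!
    sleep 2           # assumption: the server is ready after two seconds

    status=0
    proxychains "${SERVO_DIRECTORY}/mach" run -r \
        --certificate-path proxy-certs/pywb-ca.pem "$url" || status=$?

    # Stop the background server once Servo has exited.
    kill "$wayback_pid" 2>/dev/null || true
    wait "$wayback_pid" 2>/dev/null || true
    return "$status"
}
```

For the WBEZ example this would be called as `play_archive WBEZ https://www.wbez.org/`.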
In this example we'll add a web archive for the example web site example.com.
First create a collection for the Example files:
wb-manager init Example
Now start recording the web archive:
wayback --proxy Example --live --proxy-record --autoindex --port 8321
In another window, run Servo with this HTTP proxy, and navigate to the web site:
proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.example.com/
Once the site has finished loading, exit Servo and the wayback server.
To test your archive, follow the instructions for playing an archive. In one window:
wayback --proxy Example --port 8321
and in another:
proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.example.com/