sul-dlss/was_robot_suite

Name: was_robot_suite

Owner: Stanford University Digital Library

Description: Robots for Web Archiving Service accessioning and dissemination

Created: 2016-06-24 01:56:30

Updated: 2018-03-19 18:57:09

Pushed: 2018-05-22 23:21:19

Homepage: null

Size: 125832 KB

Language: Ruby


README


WAS_Robot_Suite

Robot code for accessioning and preservation of Web Archiving Service Seed and Crawl objects.

General Robot Documentation

Check the Wiki in the robot-master repo.

To run the robots, use the lyber-core infrastructure: bundle exec controller boot starts all robots defined in config/environments/robots_ENV.yml, as sketched below.
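For example (a minimal sketch; only the controller boot command itself comes from this README, and the comments describe its stated effect):

# from a deployed was_robot_suite, with the target environment selected
# reads config/environments/robots_<environment>.yml and starts each robot listed there
bundle exec controller boot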

Deployment

The WAS robots depend on some Java projects.

These Java projects use jenkinsqa to create deployment artifacts, which are then deployed with Capistrano via config/deploy.rb (see lines 40-54).

The deployed was_robot_suite houses these Java artifacts in its jar directory.
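As a rough illustration of what such a deployment step could look like (a hypothetical Capistrano 3 sketch; the task name, artifact URL, and jar filename are placeholders, not the actual contents of config/deploy.rb):

# config/deploy.rb (illustrative sketch only -- see the real lines 40-54 in the repo)
namespace :deploy do
  desc 'Fetch prebuilt Java artifacts into the jar directory'
  task :fetch_jars do
    on roles(:app) do
      within release_path do
        execute :mkdir, '-p', 'jar'
        # placeholder URL and filename standing in for a Jenkins-built artifact
        execute :curl, '-fsSL', '-o', 'jar/was-example.jar',
                'https://example.org/artifacts/was-example.jar'
      end
    end
  end
end

after 'deploy:updated', 'deploy:fetch_jars'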

Various other dependencies can be teased out of config/environments/example.rb and shared_configs (was-robotsxxx branches).

Documentation

See the Consul pages in the Web Archiving portal, especially the Web Archiving Development Documentation.

wasCrawlPreassembly

Preassembly workflow for web archiving crawl objects (which include (W)ARC files); it extracts and creates the metadata streams. It consists of 6 robots.
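For orientation, here is a skeletal sketch of what one such robot step typically looks like under lyber-core; the module nesting, step name, and constructor arguments follow the common DLSS robot pattern and are assumptions, not code from this repo:

module Robots
  module DorRepo
    module WasCrawlPreassembly
      # hypothetical step; each of the 6 robots is a class shaped like this
      class ExtractMetadata
        include LyberCore::Robot

        def initialize
          super('dor', 'wasCrawlPreassemblyWF', 'extract-metadata')
        end

        # invoked once per object; druid identifies the crawl object being processed
        def perform(druid)
          # extract (W)ARC-derived metadata and write the metadata streams here
        end
      end
    end
  end
end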

wasCrawlDissemination

Dissemination workflow for web archiving crawl objects. It is kicked off automagically by the last step of common accessioning (end-accession), which reads the dissemination workflow appropriate for this object type from the APO. It consists of 3 robots.

wasSeedPreassembly

Preassembly workflow for web archiving seed objects. It starts with the output of the registration process (via the was-registrar service): a source XML file containing the metadata for the seed object. The source XML file is expected to be in the appropriate XML format and is then converted using XSLT, as sketched below.

It consists of 5 robots.
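To illustrate the XSLT conversion, here is a minimal Ruby sketch using Nokogiri; the file paths and stylesheet name are hypothetical and do not come from this repo:

require 'nokogiri'

# hypothetical paths; the real robot reads the registrar's source XML for the seed object
source_xml = Nokogiri::XML(File.read('metadata/seed_source.xml'))
stylesheet = Nokogiri::XSLT(File.read('xslt/seed_to_desc_metadata.xsl'))

# apply the stylesheet to produce the descriptive metadata document
desc_metadata = stylesheet.transform(source_xml)
File.write('metadata/descMetadata.xml', desc_metadata.to_xml)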

wasSeedDissemination

This workflow provides the connection between the SDR and the actual web archiving components. It consists of 1 robot.

wasDissemination

Workflow to route web archiving objects to wasSeedDisseminationWF or wasCrawlDisseminationWF based on content type. Note that wasDisseminationWF itself is fired off by the accessionWF using the custom dissemination workflow tag in the APO. For example, if the APO has the following, it will fire off wasDisseminationWF:

<administrativeMetadata>
  <dissemination>
    <workflow id="wasDisseminationWF"/>
  </dissemination>
</administrativeMetadata>

It consists of 1 robot.
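To illustrate the routing decision, a hypothetical Ruby sketch follows; the content-type value and the workflow-client call are assumptions, not this robot's actual implementation:

# hypothetical routing helper: choose the dissemination workflow by content type
def dissemination_workflow_for(content_type)
  if content_type == 'webarchive-seed'
    'wasSeedDisseminationWF'
  else
    'wasCrawlDisseminationWF'
  end
end

# the robot would then ask the workflow service to start that workflow for the druid, e.g.
# workflow_client.create_workflow_by_name(druid, dissemination_workflow_for(content_type))
# (the client method name here is an assumption about the workflow service API)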

