Name: purl-fetcher
Owner: Stanford University Digital Library
Description: Web services that query PURL to return info needed for indexing or other purposes
Created: 2016-02-17 00:57:41.0
Updated: 2018-05-22 20:29:42.0
Pushed: 2018-05-22 20:29:42.0
Size: 44890
Language: Ruby
A web service app that queries PURL to return info needed for updating a database that can be queried via REST APIs.
    git clone https://github.com/sul-dlss/purl-fetcher.git
    cd purl-fetcher
    bundle install
    rake db:migrate
    rake db:migrate RAILS_ENV=test
    rails server
There are three log files:
indexing.log - items that are being saved (added or deleted)
[environment].log - Rails logger
access.log and error.log from Apache - traffic to the HTTP APIs

    bundle exec rake
/purls
GET /purls
Purl Index route
The /purls endpoint provides information about public PURL documents.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`object_type` | query | Limit requests to a specific `object_type` | No | string | null
`membership` | query | Limit requests by membership type, for instance items with no membership (collection) | No | string; accepted values: `none`, `collection` | null
`status` | query | Limit requests by status of object (`deleted`, `public`) | No | string | null
`target` | query | Limit requests by release tag targets (`SearchWorks`, `Revs`); case sensitive | No | string | null
`page` | query | Request a specific page of results | No | integer | 1
`per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
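As a rough illustration of how the query parameters in the table combine, the following sketch builds a `GET /purls` URL. The helper name `purls_uri` and the host `purl-fetcher.example.edu` are hypothetical and not part of this project. The `version` header is left out because its exact header format is not shown here.

```ruby
require 'uri'

# Hypothetical helper (not part of purl-fetcher): builds a GET /purls URL
# from the documented query parameters.
def purls_uri(base_url, **params)
  allowed = %i[object_type membership status target page per_page]
  unknown = params.keys - allowed
  raise ArgumentError, "unknown parameter(s): #{unknown.join(', ')}" unless unknown.empty?

  # The table documents per_page as an integer between 1 and 10000.
  per_page = params[:per_page]
  if per_page && !(1..10_000).cover?(per_page)
    raise ArgumentError, 'per_page must be between 1 and 10000'
  end

  uri = URI.join(base_url, '/purls')
  uri.query = URI.encode_www_form(params) unless params.empty?
  uri
end

# The host below is a placeholder, not a real deployment.
puts purls_uri('https://purl-fetcher.example.edu', target: 'SearchWorks', status: 'public', per_page: 500)
```

Because `target` is case sensitive, the value is passed through unchanged rather than normalized.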
    {
      "purls": [
        {
          "druid": "druid:ee1111ff2222",
          "published_at": "2013-01-01T00:00:00.000Z",
          "deleted_at": "2016-01-03T00:00:00.000Z",
          "object_type": "set",
          "catkey": "",
          "title": "Some test object number 4",
          "collections": [
            "druid:oo000oo0002"
          ],
          "true_targets": [
            "SearchWorksPreview"
          ]
        },
        {
          "druid": "druid:ff1111gg2222",
          "published_at": "2013-01-01T00:00:00.000Z",
          "deleted_at": "2014-01-01T00:00:00.000Z",
          "object_type": "collection",
          "catkey": "",
          "title": "Some test object number 5",
          "collections": [],
          "true_targets": [
            "SearchWorksPreview"
          ]
        }
      ],
      "pages": {
        "current_page": 1,
        "next_page": null,
        "prev_page": null,
        "total_pages": 1,
        "per_page": 100,
        "offset_value": 0,
        "first_page?": true,
        "last_page?": true
      }
    }
/purls/:druid
GET /purls/:druid
Purl Document Show
The /purls/:druid endpoint provides information about a specific PURL document.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`druid` | url | Druid of a specific PURL | Yes | string, e.g. `druid:cc1111dd2222` | null
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
    {
      "druid": "druid:cc1111dd2222",
      "published_at": "2016-01-01T00:00:00.000Z",
      "deleted_at": "2016-01-02T00:00:00.000Z",
      "object_type": "item",
      "catkey": "567",
      "title": "Some test object number 2",
      "collections": [
        "druid:oo000oo0002"
      ],
      "true_targets": [
        "SearchWorksPreview"
      ],
      "false_targets": [
        "SearchWorks"
      ]
    }
PATCH /purls/:druid
Purl Document Update
The PATCH /purls/:druid endpoint updates a PURL document from its public XML.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`druid` | url | Druid of a specific PURL | Yes | string, e.g. `druid:cc1111dd2222` | null
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
/docs/changes
GET /docs/changes
Purl Document Changes
The /docs/changes endpoint provides information about public PURL documents that have changed, including their release tag information and collection associations.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`first_modified` | query | Limit response by a beginning datetime | No | datetime in iso8601 | earliest possible date
`last_modified` | query | Limit response by an ending datetime | No | datetime in iso8601 | current time
`page` | query | Request a specific page of results | No | integer | 1
`per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
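The `first_modified` and `last_modified` parameters take ISO 8601 datetimes. A minimal sketch of producing such a query string with Ruby's standard library follows. The helper name `modified_window_query` is hypothetical, and converting to UTC before formatting is an assumption made for the example, not a documented requirement.

```ruby
require 'time'
require 'uri'

# Hypothetical helper: turns optional Time bounds into a query string for
# /docs/changes. Omitted bounds fall back to the server-side defaults
# (earliest possible date, current time).
def modified_window_query(first_modified: nil, last_modified: nil)
  params = {}
  # getutc returns a new Time in UTC and leaves the caller's object untouched.
  params[:first_modified] = first_modified.getutc.iso8601 if first_modified
  params[:last_modified] = last_modified.getutc.iso8601 if last_modified
  URI.encode_www_form(params)
end

since = Time.utc(2016, 1, 1)
puts "/docs/changes?#{modified_window_query(first_modified: since)}"
```

`URI.encode_www_form` percent-encodes the colons in the timestamp, so the value is safe to append to a URL as-is.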
    {
      "changes": [
        {
          "druid": "druid:dd111ee2222",
          "latest_change": "2014-01-01T00:00:00Z",
          "true_targets": [
            "SearchWorksPreview"
          ],
          "collections": [
            "druid:oo000oo0001"
          ]
        },
        {
          "druid": "druid:bb111cc2222",
          "latest_change": "2015-01-01T00:00:00Z",
          "true_targets": [
            "SearchWorks",
            "Revs",
            "SearchWorksPreview"
          ],
          "collections": [
            "druid:oo000oo0001",
            "druid:oo000oo0002"
          ]
        },
        {
          "druid": "druid:aa111bb2222",
          "latest_change": "2016-06-06T00:00:00Z",
          "true_targets": [
            "SearchWorksPreview"
          ]
        }
      ],
      "pages": {
        "current_page": 1,
        "next_page": null,
        "prev_page": null,
        "total_pages": 1,
        "per_page": 100,
        "offset_value": 0,
        "first_page?": true,
        "last_page?": true
      }
    }
/docs/deletes
GET /docs/deletes
Purl Document Deletes
The /docs/deletes endpoint provides information about public PURL documents that have been deleted.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`first_modified` | query | Limit response by a beginning datetime | No | datetime in iso8601 | earliest possible date
`last_modified` | query | Limit response by an ending datetime | No | datetime in iso8601 | current time
`page` | query | Request a specific page of results | No | integer | 1
`per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
    {
      "deletes": [
        {
          "druid": "druid:ee111ff2222",
          "latest_change": "2014-01-01T00:00:00Z"
        },
        {
          "druid": "druid:ff111gg2222",
          "latest_change": "2014-01-01T00:00:00Z"
        },
        {
          "druid": "druid:cc111dd2222",
          "latest_change": "2016-01-02T00:00:00Z"
        }
      ],
      "pages": {
        "current_page": 1,
        "next_page": null,
        "prev_page": null,
        "total_pages": 1,
        "per_page": 100,
        "offset_value": 0,
        "first_page?": true,
        "last_page?": true
      }
    }
/collections
GET /collections
Collections in PURL
The /collections endpoint provides a list of collections (with druids, catkeys, and release targets).
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`page` | query | Request a specific page of results | No | integer | 1
`per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
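The paginated endpoints return a `pages` block alongside the results. The sketch below shows one way a client might walk every page of `/collections`. It assumes that `next_page` holds the number of the following page and is `null` on the last page; the example responses only show the `null` case, so treat that as an assumption. The name `all_collections` and the `fetch_page` callable are hypothetical, and the HTTP call is replaced by a stub so the example runs without a server.

```ruby
# Hypothetical pager: `fetch_page` is any callable that takes a page number
# and returns the parsed JSON response (a Hash) for that page.
def all_collections(fetch_page)
  collections = []
  page = 1
  while page
    response = fetch_page.call(page)
    collections.concat(response.fetch('collections', []))
    # Assumption: next_page is the next page number, or nil on the last page.
    page = response.dig('pages', 'next_page')
  end
  collections
end

# Stub standing in for the HTTP request: two pages of made-up data.
fake_pages = {
  1 => { 'collections' => [{ 'druid' => 'druid:ff111gg2222' }],
         'pages' => { 'current_page' => 1, 'next_page' => 2 } },
  2 => { 'collections' => [{ 'druid' => 'druid:oo000oo0001' }],
         'pages' => { 'current_page' => 2, 'next_page' => nil } }
}
fetcher = ->(page) { fake_pages.fetch(page) }

puts all_collections(fetcher).map { |c| c['druid'] }.inspect
```

Passing the fetcher in as a callable keeps the paging logic separate from the transport, so the same loop can be reused with a real HTTP client.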
    {
      "collections": [
        {
          "druid": "druid:ff111gg2222",
          "catkey": "",
          "true_targets": [
            "SearchWorksPreview"
          ]
        }
      ],
      "pages": {
        "current_page": 1,
        "next_page": null,
        "prev_page": null,
        "total_pages": 1,
        "per_page": 100,
        "offset_value": 0,
        "first_page?": true,
        "last_page?": true
      }
    }
/collections/:druid
GET /collections/:druid
Provides information about a single collection
The /collections/:druid endpoint provides information about a single collection.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`druid` | url | Druid of a specific collection | Yes | string, e.g. `druid:cc1111dd2222` | null
`page` | query | Request a specific page of results | No | integer | 1
`per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
    {
      "druid": "druid:ff111gg2222",
      "published_at": "2013-01-01T00:00:00.000Z",
      "deleted_at": "2014-01-01T00:00:00.000Z",
      "object_type": "collection",
      "catkey": "",
      "title": "Some test object number 5 (a collection)",
      "collections": [],
      "true_targets": [
        "SearchWorksPreview"
      ]
    }
/collections/:druid/purls
GET /collections/:druid/purls
Collection Purls route
The /collections/:druid/purls endpoint provides a listing of PURLs for a specific collection.
Name | Located In | Description | Required | Schema | Default
---- | ---------- | ----------- | -------- | ------ | -------
`druid` | url | Druid of a specific collection | Yes | string, e.g. `druid:cc1111dd2222` | null
`page` | query | Request a specific page of results | No | integer | 1
`per_page` | query | Limit the number of results per page | No | integer (1 - 10000) | 100
`version` | header | Version of the API request, e.g. `version=1` | No | integer | 1
    {
      "purls": [
        {
          "druid": "druid:ee111ff2222",
          "published_at": "2013-01-01T00:00:00.000Z",
          "deleted_at": "2016-01-03T00:00:00.000Z",
          "object_type": "set",
          "catkey": "",
          "title": "Some test object number 4",
          "collections": [
            "druid:ff111gg2222"
          ],
          "true_targets": [
            "SearchWorksPreview"
          ]
        },
        {
          "druid": "druid:cc111dd2222",
          "published_at": "2016-01-01T00:00:00.000Z",
          "deleted_at": "2016-01-02T00:00:00.000Z",
          "object_type": "item",
          "catkey": "567",
          "title": "Some test object number 2",
          "collections": [
            "druid:ff111gg2222"
          ],
          "true_targets": [
            "SearchWorksPreview"
          ],
          "false_targets": [
            "SearchWorks"
          ]
        }
      ],
      "pages": {
        "current_page": 1,
        "next_page": null,
        "prev_page": null,
        "total_pages": 1,
        "per_page": 100,
        "offset_value": 0,
        "first_page?": true,
        "last_page?": true
      }
    }
The API's internals use an ActiveRecord data model to manage various information about published PURLs. This model consists of `Purl`, `Collection`, and `ReleaseTag` active records. See `app/models/` and `db/schema.rb` for details.
This approach gives administrators a couple of ways to explore the data outside of the API.
With Rails' `runner`, you can query the database using ActiveRecord. For example, running the Ruby in `script/reports/summary.rb` using:

    RAILS_ENV=environment bundle exec rails runner script/reports/summary.rb

produces output like this:

    Summary report as of 2016-08-24 09:52:49 -0700 on purl-fetcher-dev.stanford.edu
    PURLs: 193960
    Deleted PURLs: 1
    Published PURLs: 193959
    Published PURLs in last week: 0
    Released to SearchWorks: 5

With Rails' `dbconsole`, you can query the database using SQL. For example, running the SQL in `script/reports/summary.sql` using:

    RAILS_ENV=environment bundle exec rails dbconsole -p < script/reports/summary.sql

produces output like this:

    PURLs 193960
    Deleted PURLs 1
    Published PURLs 193959
    Published this year 9
    Released to SearchWorks 5