histograph/uri-normalizer

Name: uri-normalizer

Owner: Histograph

Description: Histograph URI Normalizer

Created: 2015-06-24 12:39:47.0

Updated: 2016-01-13 00:33:32.0

Pushed: 2017-01-10 11:10:26.0

Homepage: null

Size: 17

Language: JavaScript


README

Histograph URI Normalizer

Used by Histograph to produce a single, consistent set of identifiers for Graphmalizer.

Graphmalizer is stupid (by design): when two identifiers are lexicographically equal (equal as character strings), they are considered to refer to the same thing.

Histograph is more flexible when it comes to specifying identifiers:

This project matches any Histograph identifier string and normalizes it into a URI.

Uniform Resource Identifiers

We have:

They are documented in

Lexical Equivalence in URNs

Lexical equivalence means equality as character strings: just by comparing two identifiers, you can decide whether they refer to the same thing.

RFC 2141: For various purposes such as caching, it's often desirable to determine if two URNs are the same without resolving them.

Example: the following URNs are lexically equivalent.

URN:foo:a123,456
urn:foo:a123,456
urn:FOO:a123,456
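A minimal sketch of the RFC 2141 lexical-equivalence check in plain JavaScript (Node.js): the `urn:` prefix, the NID and the hex digits of %-escapes are compared case-insensitively, and everything else exactly. The function names are hypothetical and not part of this package, and the NID syntax is simplified.

```javascript
// Sketch of RFC 2141 lexical equivalence: the "urn:" prefix, the NID and
// %-escapes are case-insensitive; the rest of the NSS is compared exactly.
function normalizeUrnLexically(urn) {
  // Simplified NID syntax: a letter or digit, then up to 31 letters, digits or hyphens.
  const match = /^urn:([a-z0-9][a-z0-9-]{0,31}):(.+)$/i.exec(urn);
  if (!match) {
    throw new Error('Not a URN: ' + urn);
  }
  const nid = match[1].toLowerCase();
  // Lower-case the hex digits of %-escapes so %2C and %2c compare equal.
  const nss = match[2].replace(/%[0-9a-f]{2}/gi, (esc) => esc.toLowerCase());
  return 'urn:' + nid + ':' + nss;
}

function lexicallyEquivalent(a, b) {
  return normalizeUrnLexically(a) === normalizeUrnLexically(b);
}

console.log(lexicallyEquivalent('URN:foo:a123,456', 'urn:FOO:a123,456')); // true
console.log(lexicallyEquivalent('urn:foo:a123,456', 'urn:foo:A123,456')); // false
```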
Functional Equivalence

Again, quoting RFC 2141.

RFC 2141: Functional equivalence is determined by practice within a given namespace and managed by resolvers for that namespeace. Thus, it is beyond the scope of this document. Namespace registration must include guidance on how to determine functional equivalence for that namespace, i.e. when two URNs are the identical within a namespace.

Fictional example:

urn:hgconcept:bag/123,tgn/234,geonames/345
urn:hgconcept:geonames/345,bag/456

These might refer to the same concept.
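To make the distinction concrete, here is a sketch of one possible functional-equivalence rule for the fictional `urn:hgconcept` namespace. The rule itself (two concepts are equivalent when they share at least one member identifier) is an assumption made up for illustration; as RFC 2141 says, each namespace defines its own.

```javascript
// Hypothetical rule for the fictional urn:hgconcept namespace: two concept
// URNs are treated as functionally equivalent when their comma-separated
// member lists share at least one identifier.
const CONCEPT_PREFIX = 'urn:hgconcept:';

function conceptMembers(urn) {
  if (!urn.startsWith(CONCEPT_PREFIX)) {
    throw new Error('Not an hgconcept URN: ' + urn);
  }
  return new Set(urn.slice(CONCEPT_PREFIX.length).split(','));
}

function functionallyEquivalent(a, b) {
  const membersOfA = conceptMembers(a);
  for (const id of conceptMembers(b)) {
    if (membersOfA.has(id)) {
      return true;
    }
  }
  return false;
}

const conceptA = 'urn:hgconcept:bag/123,tgn/234,geonames/345';
const conceptB = 'urn:hgconcept:geonames/345,bag/456';
console.log(conceptA === conceptB);                      // false: not lexically equal
console.log(functionallyEquivalent(conceptA, conceptB)); // true: both contain geonames/345
```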

Summary

Known URLs are converted into canonical-form URNs with NID hg:

http://vocab.getty.edu/tgn/7006952 ~> urn:hg:tgn:7006952
http://sws.geonames.org/2758064/   ~> urn:hg:geonames:2758064
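The URL-to-URN direction can be sketched as a prefix match against known base URLs. This is a hypothetical illustration, not the package's code: `BASE_URLS` and `urlToHgUrn` are assumed names, and the package's real defaults are in namespaces.js.

```javascript
// Hypothetical base-URL table; the package's real defaults live in namespaces.js.
const BASE_URLS = {
  tgn: 'http://vocab.getty.edu/tgn/',
  geonames: 'http://sws.geonames.org/',
};

// Returns the canonical urn:hg URN for a known URL, or null if no namespace matches.
function urlToHgUrn(url) {
  for (const [nid, baseUrl] of Object.entries(BASE_URLS)) {
    if (url.startsWith(baseUrl)) {
      // Keep the path after the base URL, minus any trailing slash.
      const id = url.slice(baseUrl.length).replace(/\/$/, '');
      return 'urn:hg:' + nid + ':' + id;
    }
  }
  return null;
}

console.log(urlToHgUrn('http://vocab.getty.edu/tgn/7006952')); // urn:hg:tgn:7006952
console.log(urlToHgUrn('http://sws.geonames.org/2758064/'));   // urn:hg:geonames:2758064
```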

URNs are left untouched (if a canonical form is known, they are converted to it):

urn:hg:geonames:2758064            ~> urn:hg:geonames:2758064
urn:ietf:rfc:2141                  ~> urn:ietf:rfc:2141

HGIDs are expanded to URNs with NID hgid. A local HGID is placed in the current dataset (here foo); a global HGID names its own dataset:

12345-nl                           ~> urn:hgid:foo:12345-nl
bar/45678901                       ~> urn:hgid:bar:45678901
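HGID expansion amounts to picking the dataset: a global HGID names it before the slash, while a local HGID takes the dataset passed in. A sketch, where `expandHgid` is a hypothetical name and throwing an error when no dataset is given is an assumed policy:

```javascript
// Sketch of HGID expansion: a global HGID ("dataset/id") carries its own
// dataset; a local HGID ("id") is placed in the dataset passed in.
function expandHgid(hgid, dataset) {
  const slash = hgid.indexOf('/');
  if (slash !== -1) {
    return 'urn:hgid:' + hgid.slice(0, slash) + ':' + hgid.slice(slash + 1);
  }
  // Assumed policy: a local HGID cannot be expanded without a dataset.
  if (!dataset) {
    throw new Error('A dataset identifier is needed to expand local HGID ' + hgid);
  }
  return 'urn:hgid:' + dataset + ':' + hgid;
}

console.log(expandHgid('12345-nl', 'foo'));     // urn:hgid:foo:12345-nl
console.log(expandHgid('bar/45678901', 'foo')); // urn:hgid:bar:45678901
```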

Reverse (resolving)

urn:hg:geonames:2758064            ~> http://sws.geonames.org/2758064/

Etc.
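Resolving is the reverse lookup. A sketch assuming one resolver function per namespace (the `RESOLVERS` map and `hgUrnToUrl` are hypothetical); the GeoNames rule appends a trailing slash, as in the example above, and the TGN rule is assumed to mirror the TGN URL from the summary.

```javascript
// Hypothetical per-namespace resolvers, keyed by the NSS prefix after urn:hg:.
const RESOLVERS = new Map([
  ['geonames', (nss) => 'http://sws.geonames.org/' + nss + '/'],
  ['tgn', (nss) => 'http://vocab.getty.edu/tgn/' + nss],
]);

// Resolves urn:hg:<namespace>:<id> to a URL, or returns null if it cannot.
function hgUrnToUrl(urn) {
  const match = /^urn:hg:([^:]+):(.+)$/.exec(urn);
  const resolve = match && RESOLVERS.get(match[1]);
  return resolve ? resolve(match[2]) : null;
}

console.log(hgUrnToUrl('urn:hg:geonames:2758064')); // http://sws.geonames.org/2758064/
console.log(hgUrnToUrl('urn:ietf:rfc:2141'));       // null
```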

namespaces.js contains a set of default namespaces.

See also:

Identifier strings are matched against the following regular expressions:

// Matching strings look like a URI, based on RFC 2141
var SCHEME = /^[a-zA-Z][a-zA-Z0-9+-\.]*:$/;

// Match `foo/123` HGIDs
var HGID = /^[a-zA-Z0-9\.+-_]+\/[a-zA-Z0-9\.+-_]+$/;

// Everything else
var ID = /^[a-zA-Z0-9\.+-_]+$/;

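These patterns can be combined into a classifier, sketched below with the hypothetical helper `classify`. One deliberate change: inside a JavaScript character class, `+-\.` is the range from `+` to `.` (which also admits `,`), and `\.+-_` is the range from `+` to `_` (which admits `/`, `:` and more), so the sketch moves the hyphen to the end of each class to make it a literal `-`.

```javascript
// Same rules as above, but with "-" placed last in each character class
// so it is a literal hyphen rather than a range operator.
const SCHEME = /^[a-zA-Z][a-zA-Z0-9+.-]*:$/;
const HGID = /^[a-zA-Z0-9._+-]+\/[a-zA-Z0-9._+-]+$/;
const ID = /^[a-zA-Z0-9._+-]+$/;

// Hypothetical helper: returns 'uri', 'hgid', 'id' or null.
function classify(s) {
  const colon = s.indexOf(':');
  // SCHEME is tested against the scheme plus its colon, e.g. "urn:" or "http:".
  if (colon > 0 && SCHEME.test(s.slice(0, colon + 1))) {
    return 'uri';
  }
  if (HGID.test(s)) {
    return 'hgid';
  }
  if (ID.test(s)) {
    return 'id';
  }
  return null;
}

console.log(classify('http://sws.geonames.org/2758064/')); // uri
console.log(classify('foo/123'));                          // hgid
console.log(classify('12345-nl'));                         // id
```

With the original HGID pattern, `a/b/c` would match, because `/` falls inside the `+`-to-`_` range; with the adjusted pattern, `classify('a/b/c')` returns null.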
Usage

First:

npm install histograph/uri-normalizer

Just do the right thing:

var n = require('histograph-uri-normalizer').normalize;

console.log(n('http://sws.geonames.org/2758064/'));
// > urn:hg:geonames:2758064

// Don't need to, but might as well pass the dataset identifier
console.log(n('foo/123', 'bar'));
// > urn:hgid:foo:123

// Need to pass the dataset identifier
console.log(n('123', 'bar'));
// > urn:hgid:bar:123

Or use the more specific methods:

var normalizer = require('histograph-uri-normalizer');

var urn = normalizer.URLtoURN('http://sws.geonames.org/2758064/');
console.log(urn); // prints 'urn:hg:geonames:2758064'

API
normalizer.normalize(s, nid)

Detects whether s is a URI, a local HGID or a global HGID, and normalizes it accordingly. nid is the dataset identifier used to expand a local HGID.

If s is a URI, all available namespaces are tried to convert it.

normalizer.URLtoURN(url, [nid])

Tries to convert url to a URN using all available namespaces. If nid is specified, only that namespace is used.

normalizer.URNtoURL(urn)

Resolves urn to URL.

normalizer.addNamespace(nid, namespace)

Adds a new namespace nid to the available namespaces. A namespace must define a string baseUrl and two functions, URLtoURN(url) and URNtoURL(nid, nss).

Example:

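var newNamespace = {
  baseUrl: 'http://sws.geonames.org/',

  // Use the first run of digits in the URL as the GeoNames ID
  URLtoURN: function(url) {
    var match = /.*?(\d+).*/.exec(url);
    return 'urn:geonames:' + match[1];
  },

  // Rebuild the URL from the namespace-specific string (nss)
  URNtoURL: function(nid, nss) {
    return this.baseUrl + nss + '/';
  }
};

normalizer.addNamespace('geonames', newNamespace);
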
normalizer.removeNamespace(nid)

Removes namespace nid from the list of available namespaces.
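For illustration only, here is a minimal registry following the namespace shape described under addNamespace (a baseUrl string plus URLtoURN(url) and URNtoURL(nid, nss)). It is not the package's implementation: the standalone `addNamespace`, `removeNamespace` and `urlToUrn` functions are hypothetical, selecting a namespace by matching its baseUrl is an assumption, and returning null on failure is an assumed convention.

```javascript
// Hypothetical registry: nid -> { baseUrl, URLtoURN(url), URNtoURL(nid, nss) }
const namespaces = new Map();

function addNamespace(nid, namespace) {
  namespaces.set(nid, namespace);
}

function removeNamespace(nid) {
  namespaces.delete(nid);
}

// Tries every registered namespace, or only `nid` when it is given.
function urlToUrn(url, nid) {
  const candidates = nid ? [namespaces.get(nid)] : [...namespaces.values()];
  for (const ns of candidates) {
    if (ns && url.startsWith(ns.baseUrl)) {
      return ns.URLtoURN(url);
    }
  }
  return null;
}

addNamespace('geonames', {
  baseUrl: 'http://sws.geonames.org/',
  URLtoURN(url) {
    const match = /(\d+)/.exec(url);
    return match ? 'urn:hg:geonames:' + match[1] : null;
  },
  URNtoURL(nid, nss) {
    return this.baseUrl + nss + '/';
  },
});

console.log(urlToUrn('http://sws.geonames.org/2758064/'));        // urn:hg:geonames:2758064
console.log(urlToUrn('http://sws.geonames.org/2758064/', 'tgn')); // null
removeNamespace('geonames');
console.log(urlToUrn('http://sws.geonames.org/2758064/'));        // null
```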

Copyright © 2015 Waag Society.

