DataONEorg/metasanity

Name: metasanity

Owner: DataONE

Description: A bare bones metadata validation tool

Created: 2017-06-22 23:55:57.0

Updated: 2017-06-26 19:35:07.0

Pushed: 2017-08-23 19:15:37.0

Homepage: null

Size: 68

Language: Java

README

Metasanity

No-frills schema-aware metadata validator.

Attempts to validate content against local copies of the metadata schemas used by the DataONE Coordinating Nodes (CNs), emulating the validation the CNs perform during the internal create operation.
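Schema-aware validation of this kind can be done in Java with the standard javax.xml.validation API. The following is a minimal sketch, not metasanity's own code: the class name SchemaCheckSketch and the tiny inline schema are hypothetical, and a real run would load the CN schema files from disk rather than from a string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical sketch of fail-fast W3C XML Schema validation; not metasanity's code. */
public class SchemaCheckSketch {

    /** Returns true if xml is valid against the XML Schema given in xsd. */
    public static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error is thrown as a SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers both an unusable schema and an invalid document.
            System.out.println("Not valid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='title' type='xs:string'/></xs:schema>";
        System.out.println(isValid(xsd, "<title>Lake survey</title>"));       // true
        System.out.println(isValid(xsd, "<abstract>Lake survey</abstract>")); // false, after printing the error
    }
}
```

This version stops at the first error; collecting every issue, as the tool's reports do, needs a custom error handler.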

Build
  1. Clone, fork, or download a copy of this repo
  2. cd into the metasanity folder
  3. Run mvn package

If all goes as expected, the result should be target/metasanity-X.Y-SNAPSHOT.jar

Use

First, populate the local schema folder with a copy of the schemas from a Coordinating Node (requires shell access to a CN):

rsync -avz -e "ssh" cn.dataone.org:/var/lib/tomcat7/webapps/metacat/schema .

Run metasanity from the command line, for example:

java -jar target/metasanity-1.0-SNAPSHOT.jar samples/iso_01.xml

Metasanity expects an XML catalog file, schemas.xml, in the working directory. Use -c to specify a different catalog.

The tool's output looks something like this:
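The catalog selection rule can be illustrated with a small argument parser. This is a hypothetical sketch (CliOptionsSketch is not a metasanity class) that assumes a single document path and handles only the -c option described here; the real tool may accept other options or behave differently on bad input.

```java
/** Hypothetical sketch of the "-c catalog" rule; not metasanity's actual argument handling. */
public class CliOptionsSketch {
    /** Catalog name used when -c is absent; a relative name like this is opened from the working directory. */
    public static final String DEFAULT_CATALOG = "schemas.xml";

    public final String catalog;
    public final String document;

    private CliOptionsSketch(String catalog, String document) {
        this.catalog = catalog;
        this.document = document;
    }

    /** Parses "[-c catalog.xml] document.xml"; assumes exactly one document path. */
    public static CliOptionsSketch parse(String[] args) {
        String catalog = DEFAULT_CATALOG;
        String document = null;
        for (int i = 0; i < args.length; i++) {
            if ("-c".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-c requires a catalog file name");
                }
                catalog = args[++i];
            } else if (document == null) {
                document = args[i];
            } else {
                throw new IllegalArgumentException("unexpected argument: " + args[i]);
            }
        }
        if (document == null) {
            throw new IllegalArgumentException("no document to validate");
        }
        return new CliOptionsSketch(catalog, document);
    }

    public static void main(String[] args) {
        CliOptionsSketch defaults = parse(new String[] {"samples/iso_01.xml"});
        CliOptionsSketch custom = parse(new String[] {"-c", "isotc211-catalog.xml", "samples/iso_01.xml"});
        System.out.println(defaults.catalog); // schemas.xml
        System.out.println(custom.catalog);   // isotc211-catalog.xml
    }
}
```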

java -jar target/metasanity-1.0-SNAPSHOT.jar samples/iso_01.xml
Parsing: samples/iso_01.xml
Document is valid.

or:

java -jar target/metasanity-1.0-SNAPSHOT.jar samples/iso_02_cn-invalid.xml
Parsing: samples/iso_02_cn-invalid.xml
Error:
  Public ID: null
  System ID: file:///Users/vieglais/Documents/Projects/DataONE_PhaseII/Projects/NetBeans/metasanity/samples/iso_02_cn-invalid.xml
  Line number: 632
  Column number: 21
  Message: cvc-complex-type.2.4.a: Invalid content was found starting with element 'gmd:taxonomy'. One of '{"http://www.isotc211.org/2005/gmd":aggregationInfo, "http://www.isotc211.org/2005/gmd":spatialRepresentationType, "http://www.isotc211.org/2005/gmd":spatialResolution, "http://www.isotc211.org/2005/gmd":language}' is expected.

Document is not valid. Please review issues noted above.
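Each reported issue carries a public ID, system ID, line number, column number and message. One way to produce a report like this with the standard API is an org.xml.sax.ErrorHandler that records every error instead of stopping at the first one. The sketch below is hypothetical: CollectingErrorHandler, check and summary are illustrative names, not metasanity's classes, and the summary wording simply mirrors the messages shown in this README.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that records every validation error instead of stopping at the first. */
public class CollectingErrorHandler implements ErrorHandler {
    public final List<String> issues = new ArrayList<>();

    private static String describe(String level, SAXParseException e) {
        return level + ": " + e.getMessage()
                + "\n  Public ID: " + e.getPublicId()
                + "\n  System ID: " + e.getSystemId()
                + "\n  Line number: " + e.getLineNumber()
                + "\n  Column number: " + e.getColumnNumber();
    }

    @Override
    public void warning(SAXParseException e) {
        System.out.println(describe("Warning", e)); // printed but not counted as an issue
    }

    @Override
    public void error(SAXParseException e) {
        String report = describe("Error", e);
        issues.add(report);
        System.out.println(report); // returning normally lets validation continue
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println(describe("Fatal", e));
        throw e; // e.g. malformed XML: the parser cannot continue
    }

    /** Summary line in the style of the tool's output. */
    public static String summary(int issueCount) {
        return issueCount == 0
                ? "Document is valid."
                : "Document is not valid with " + issueCount + " issues. Please review issues noted above.";
    }

    /** Validates xml against xsd and returns all recorded issues. */
    public static List<String> check(String xsd, String xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.issues;
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='count' type='xs:int'/></xs:schema>";
        System.out.println(summary(check(xsd, "<count>3</count>").size()));
        System.out.println(summary(check(xsd, "<count>many</count>").size()));
    }
}
```

Because error() returns instead of throwing, the validator keeps going and reports all problems in one pass, which is what makes an issue count possible.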

XML Catalog

Note that metasanity uses an XML Catalog, and so differs from the implementation on the DataONE CNs.
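An XML Catalog lets a validator redirect a schema location, such as a remote URL in an xs:import, to a local file. As a sketch, assuming Java 9 or later, the JDK's javax.xml.catalog API can plug a catalog into a SchemaFactory. Metasanity itself may use a different catalog library; the class name, namespaces, file names and example.org URLs below are made up for the demonstration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical sketch: an XML Catalog maps a remote schemaLocation to a local copy (Java 9+). */
public class CatalogSchemaSketch {
    public static final String GEO_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='http://example.org/geo'>"
            + "<xs:element name='place' type='xs:string'/></xs:schema>";
    // OASIS catalog: the system entry's relative uri is resolved against the catalog file's location.
    public static final String CATALOG =
            "<catalog xmlns='urn:oasis:names:tc:entity:xmlns:xml:catalog'>"
            + "<system systemId='http://example.org/schemas/geo.xsd' uri='geo.xsd'/></catalog>";
    // The main schema imports geo.xsd by URL; the catalog redirects that URL to the local file.
    public static final String MAIN_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:geo='http://example.org/geo'"
            + " targetNamespace='http://example.org/record'>"
            + "<xs:import namespace='http://example.org/geo' schemaLocation='http://example.org/schemas/geo.xsd'/>"
            + "<xs:element name='record'><xs:complexType><xs:sequence>"
            + "<xs:element ref='geo:place'/></xs:sequence></xs:complexType></xs:element></xs:schema>";
    public static final String VALID_XML =
            "<r:record xmlns:r='http://example.org/record' xmlns:g='http://example.org/geo'>"
            + "<g:place>Oslo</g:place></r:record>";
    public static final String INVALID_XML = "<r:record xmlns:r='http://example.org/record'/>";

    /** Writes the local schema copy and the catalog into dir; returns the catalog path. */
    public static Path writeDemoCatalog(Path dir) throws IOException {
        Files.write(dir.resolve("geo.xsd"), GEO_XSD.getBytes(StandardCharsets.UTF_8));
        Path catalog = dir.resolve("catalog.xml");
        Files.write(catalog, CATALOG.getBytes(StandardCharsets.UTF_8));
        return catalog;
    }

    /** Compiles mainXsd, resolving xs:import/xs:include locations through the catalog. */
    public static Schema compile(Path catalogFile, String mainXsd) throws SAXException {
        CatalogResolver resolver = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalogFile.toUri());
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // CatalogResolver is also an LSResourceResolver, which SchemaFactory consults for imports.
        factory.setResourceResolver(resolver);
        return factory.newSchema(new StreamSource(new StringReader(mainXsd)));
    }

    /** Fail-fast check: the first validation error makes this return false. */
    public static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("catalog-demo");
        Schema schema = compile(writeDemoCatalog(dir), MAIN_XSD);
        System.out.println(isValid(schema, VALID_XML));
        System.out.println(isValid(schema, INVALID_XML));
    }
}
```

Swapping in a different catalog file changes which local schema copies are used, which is how the same document can be valid under one catalog and invalid under another.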

Three examples of XML Catalog files are provided:

An ISOTC211 example from NOAA (samples/iso_01.xml) is valid for the gmd-noaa schema variant:

java -jar target/metasanity-1.0-SNAPSHOT.jar -c isotc211-noaa-catalog.xml samples/iso_01.xml
23, 2017 3:03:54 PM org.dataone.metasanity.MetaSanity main
INFO: Using catalog: isotc211-noaa-catalog.xml
23, 2017 3:03:54 PM org.dataone.metasanity.MetaSanity main
INFO: Parsing: samples/iso_01.xml
23, 2017 3:03:55 PM org.dataone.metasanity.MetaSanity main
INFO: Document is valid.

The same document is invalid for the plain ISOTC211 variant:

java -jar target/metasanity-1.0-SNAPSHOT.jar -c isotc211-catalog.xml samples/iso_01.xml
23, 2017 3:10:15 PM org.dataone.metasanity.MetaSanity main
INFO: Using catalog: isotc211-catalog.xml
23, 2017 3:10:15 PM org.dataone.metasanity.MetaSanity main
INFO: Parsing: samples/iso_01.xml
23, 2017 3:10:22 PM org.dataone.metasanity.MetaSanity$ValidationErrorHandler error
SEVERE: cvc-complex-type.2.4.a: Invalid content was found starting with element 'gmx:Anchor'. One of '{"http://www.isotc211.org/2005/gco":CharacterString}' is expected.
  Public ID: null
  System ID: file:///Users/vieglais/Documents/Projects/DataONE_PhaseII/Projects/NetBeans/metasanity/samples/iso_01.xml
  Line number: 136
  Column number: 167
...
23, 2017 3:10:22 PM org.dataone.metasanity.MetaSanity$ValidationErrorHandler error
SEVERE: cvc-complex-type.2.4.a: Invalid content was found starting with element 'gmx:Anchor'. One of '{"http://www.isotc211.org/2005/gco":CharacterString}' is expected.
  Public ID: null
  System ID: file:///Users/vieglais/Documents/Projects/DataONE_PhaseII/Projects/NetBeans/metasanity/samples/iso_01.xml
  Line number: 884
  Column number: 125
23, 2017 3:10:22 PM org.dataone.metasanity.MetaSanity main
WARNING:
Document is not valid with 70 issues. Please review issues noted above.

Reference

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.