sul-dlss/discovery-indexer

Name: discovery-indexer

Owner: Stanford University Digital Library

Description: This library manages the core operations for discovery indexing, such as reading PURL XML, mapping it to a Solr document, and writing to a Solr core.

Created: 2015-03-09 18:25:25.0

Updated: 2016-02-04 19:38:24.0

Pushed: 2017-03-02 23:41:05.0

Homepage: null

Size: 183

Language: Ruby

README

The discovery_indexer gem provides the core features required to perform Solr indexing from PURL for Stanford University Library digital library websites and Searchworks.

Reading XML files

The reader component reads both the full public XML and the MODS XML from PURL pages.
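
As a rough illustration of that step, here is a minimal sketch of a reader that fetches both documents for a druid. The class name `PurlReaderSketch`, the `fetcher` parameter, and the URL patterns (`<druid>.xml` and `<druid>.mods` under `https://purl.stanford.edu`) are assumptions made for this example, not the gem's actual reader API.

```ruby
require 'net/http'
require 'uri'

# Hypothetical reader sketch; not the gem's actual reader class.
class PurlReaderSketch
  # Assumed base URL for PURL pages.
  PURL_BASE = 'https://purl.stanford.edu'.freeze

  # fetcher: any callable that takes a URL string and returns the body.
  # It defaults to a plain HTTP GET and can be replaced by a stub in tests.
  def initialize(druid, fetcher: ->(url) { Net::HTTP.get(URI(url)) })
    @druid = druid
    @fetcher = fetcher
  end

  # Assumed location of the full public XML.
  def public_xml_url
    "#{PURL_BASE}/#{@druid}.xml"
  end

  # Assumed location of the MODS XML.
  def mods_xml_url
    "#{PURL_BASE}/#{@druid}.mods"
  end

  # Returns both documents as raw strings, keyed by kind.
  def read
    {
      public_xml: @fetcher.call(public_xml_url),
      mods_xml: @fetcher.call(mods_xml_url)
    }
  end
end
```

Injecting the fetcher keeps the sketch testable without network access; for example, `PurlReaderSketch.new('ab123cd4567', fetcher: ->(url) { "<stub/>" }).read` returns two stub strings.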

Mapping

The GeneralMapper interface and its implementation, IndexMapper, map the input XML (public and MODS) from the reader to a Solr doc hash. There are two ways to build a specialized indexer for a specific Solr index (such as Searchworks or Revs):

Inherit from GeneralMapper

class SpecializedMapper < GeneralMapper
  # You are provided with the MODS and PURL XML.
  def initialize(druid, modsxml, purlxml, collection_names = {})
    super(druid, modsxml, purlxml, collection_names)
  end

  def map
    solr_doc = {}
    # ... generate the full Solr doc hash here
    solr_doc
  end
end

Inherit from IndexMapper

In this case, you start from a solr_doc hash that already contains the common Solr fields. You can decorate the hash to add or remove fields as necessary.

class SpecializedMapper < IndexMapper
  def initialize(druid, modsxml, purlxml, collection_names = {})
    super(druid, modsxml, purlxml, collection_names)
  end

  def map
    # A bare `super` calls IndexMapper#map and returns the common Solr fields.
    solr_doc = super
    # Add to or remove from the solr_doc hash as needed for your app.
    solr_doc
  end
end
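
To make the decoration approach concrete, the following sketch runs on its own. The `GeneralMapper` and `IndexMapper` classes here are simplified stand-ins so the example works without the gem installed, and the field names (`:id`, `:collection`, `:display_type`) are hypothetical, not the fields the real IndexMapper produces.

```ruby
# Stand-in base classes so the sketch runs without the gem installed.
# The real GeneralMapper and IndexMapper come from discovery-indexer.
class GeneralMapper
  def initialize(druid, modsxml, purlxml, collection_names = {})
    @druid = druid
    @modsxml = modsxml
    @purlxml = purlxml
    @collection_names = collection_names
  end
end

class IndexMapper < GeneralMapper
  # Hypothetical common fields; the real mapper derives its fields from the XML.
  def map
    { id: @druid, collection: @collection_names.values }
  end
end

class SpecializedMapper < IndexMapper
  def map
    solr_doc = super                      # common fields from IndexMapper#map
    solr_doc[:display_type] = 'image'     # hypothetical app-specific field
    # Drop a field the target index does not want when it is empty.
    solr_doc.delete(:collection) if solr_doc[:collection].empty?
    solr_doc
  end
end

mapper = SpecializedMapper.new('ab123cd4567', '<mods/>', '<publicObject/>')
puts mapper.map.inspect
```

Because the subclass does not override `initialize`, the parent constructor is used as-is; an explicit `initialize` that only calls `super` with the same arguments is optional in Ruby.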

Writing to Solr

The gem takes care of writing the Solr doc to a specific Solr core URL, which is defined in a list of targets along with its configuration.
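
The idea of named targets, each with its own configuration, might look roughly like the sketch below. Everything here is illustrative: `SolrWriterSketch`, the `:url` config key, the target name, and the `RecordingClient` test double are assumptions for this example and do not reflect the gem's actual writer API. The client only needs to respond to `add` and `commit`; with the rsolr gem, a factory such as `->(url) { RSolr.connect(url: url) }` would be one way to supply it.

```ruby
# Hypothetical writer sketch; target names and config keys are illustrative only.
class SolrWriterSketch
  # targets_config: { 'target_name' => { url: 'http://...' } }
  # client_factory: callable that turns a URL into an object with add and commit.
  def initialize(targets_config, client_factory)
    @targets_config = targets_config
    @client_factory = client_factory
  end

  # Writes one Solr doc hash to each named target and commits it.
  # Raises KeyError when a target name has no configuration.
  def write(solr_doc, target_names)
    target_names.each do |name|
      config = @targets_config.fetch(name)
      client = @client_factory.call(config[:url])
      client.add(solr_doc)
      client.commit
    end
  end
end

# In-memory test double standing in for a real Solr client.
class RecordingClient
  attr_reader :url, :added, :commits

  def initialize(url)
    @url = url
    @added = []
    @commits = 0
  end

  def add(doc)
    @added << doc
  end

  def commit
    @commits += 1
  end
end

clients = []
factory = lambda do |url|
  client = RecordingClient.new(url)
  clients << client
  client
end

targets = { 'searchworks' => { url: 'http://localhost:8983/solr/searchworks' } }
SolrWriterSketch.new(targets, factory).write({ id: 'ab123cd4567' }, ['searchworks'])
puts clients.first.added.inspect
```

Passing the client factory in, instead of hard-coding a Solr client, keeps the target configuration separate from the transport and lets the write path be exercised without a running Solr instance.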


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.