LLNL/rhizome

Name: rhizome

Owner: Lawrence Livermore National Laboratory

Description: null

Created: 2012-01-12 18:49:53.0

Updated: 2018-01-11 17:56:51.0

Pushed: 2016-03-07 18:33:41.0

Homepage:

Size: 27

Language: Clojure

README

rhizome

This software does the pre-processing necessary to use the iris latent topic feedback plugin for information retrieval. At a high level, the intended workflow is:

  1. User ingests corpus into Solr
  2. Run rhizome against Solr to populate a MongoDB instance with LDA topics
  3. Run iris as part of your Solr-based document search system

This system is based on the KDD paper:

David Andrzejewski and David Buttler. Latent Topic Feedback for Information Retrieval. Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011).

The iris code was developed by Kevin R. Lawrence, and the rhizome pre-processing module was written by David Andrzejewski.

Usage

This code can be called from the command line; an example use case is given in runme.sh.

The following operations are used to populate a running MongoDB instance with the information Iris will need to function:

The following command-line options (with defaults in parentheses) allow the user to specify parameters of the MongoDB instance, the Solr index, and the LDA topic model:

ohost (localhost) = MongoDB host 
oport (27017) = MongoDB port 
oname (topics) = MongoDB database name 
host (localhost) = Solr index address 
port (8983) = Solr index port 
fields (title,text) = Comma-separated list of Solr fields to model 
title (nil) = Solr field to use as document names
low (0) = Low end of stoplist count thresholds to print out for 'count'
high (100) = High end of stoplist thresholds to print out for 'count'
thresh (50) = Filter out rare words occurring < stopthresh times 
00) = Number of latent topics to use 
p (1000) = Number of MCMC samples to take 
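
To illustrate the stoplist options above, here is a minimal Python sketch of count-based rare-word filtering. It is not rhizome's Clojure implementation: the function names (`word_counts`, `filter_rare`, `vocab_size_by_threshold`) and the toy corpus are hypothetical, and it assumes the threshold applies to a word's total occurrence count across the corpus, which the README does not spell out. Reporting the surviving vocabulary size for each threshold between the low and high values is one plausible reading of what 'count' prints, not a confirmed description.

```python
from collections import Counter


def word_counts(docs):
    """Count total occurrences of each whitespace-separated token across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def filter_rare(counts, thresh):
    """Keep words occurring at least `thresh` times (drops words with count < thresh)."""
    return {word for word, count in counts.items() if count >= thresh}


def vocab_size_by_threshold(counts, low, high):
    """For each threshold in [low, high], report how many words would survive filtering."""
    return {t: len(filter_rare(counts, t)) for t in range(low, high + 1)}


# Toy corpus standing in for the text of the modeled Solr fields (assumption).
docs = ["solr indexes text", "topics summarize text", "text text topics"]
counts = word_counts(docs)
kept = filter_rare(counts, 2)
sizes = vocab_size_by_threshold(counts, 0, 3)
```

Scanning a range of thresholds before committing to one makes it easier to see how aggressively a given cutoff shrinks the vocabulary that the topic model will see.
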
License

This code is licensed under the terms of the GNU GPL; see the LICENSE.txt file for full details.

Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory. Written by David Andrzejewski david.andrzej@gmail.com

LLNL-CODE-521811 All rights reserved. This file is part of IRIS.
