Name: rhizome
Owner: Lawrence Livermore National Laboratory
Description: null
Created: 2012-01-12 18:49:53.0
Updated: 2018-01-11 17:56:51.0
Pushed: 2016-03-07 18:33:41.0
Size: 27
Language: Clojure
This software does the pre-processing necessary to use the iris latent topic feedback plugin for information retrieval. At a high level, the intended workflow is:
This system is based on the KDD paper:
David Andrzejewski and David Buttler. Latent Topic Feedback for Information Retrieval. Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011).
The iris code was developed by Kevin R. Lawrence, and the rhizome pre-processing module was written by David Andrzejewski.
This code can be called from the command line; an example use case is given in runme.sh.
The following operations are used to populate a running MongoDB instance with the information Iris will need to function:
The following command-line options (with defaults in parentheses) allow the user to specify parameters of the MongoDB instance, the Solr index, and the LDA topic model:
ohost (localhost) = MongoDB host
oport (27017) = MongoDB port
oname (topics) = MongoDB database name
host (localhost) = Solr index address
port (8983) = Solr index port
fields (title,text) = Comma-separated list of Solr fields to model
title (nil) = Solr field to use as document names
low (0) = Low end of stoplist count thresholds to print out for 'count'
high (100) = High end of stoplist thresholds to print out for 'count'
thresh (50) = Filter out rare words occurring < stopthresh times
00) = Number of latent topics to use
p (1000) = Number of MCMC samples to take
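The rare-word threshold described above (default 50) can be illustrated with a short sketch. This is written in Python for illustration only and is not the rhizome implementation, which is in Clojure. The function names are hypothetical, and two points are assumptions: that "occurring" means total occurrences across the corpus rather than document frequency, and that the 'count' operation reports the vocabulary size remaining at each threshold between low and high.

```python
from collections import Counter


def count_words(docs):
    """Count total occurrences of each token across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def filter_rare_words(docs, thresh=50):
    """Drop tokens whose corpus-wide count is below thresh.

    Assumption: the threshold applies to total occurrences,
    not to the number of documents containing the word.
    """
    counts = count_words(docs)
    return [[tok for tok in tokens if counts[tok] >= thresh] for tokens in docs]


def vocab_size_by_threshold(docs, low=0, high=100, step=10):
    """Vocabulary size that survives each threshold from low to high.

    Assumption: this mirrors what 'count' prints; the step size is
    hypothetical and does not come from the option list.
    """
    counts = count_words(docs)
    return {
        t: sum(1 for c in counts.values() if c >= t)
        for t in range(low, high + 1, step)
    }


if __name__ == "__main__":
    docs = [
        ["topic", "model", "topic"],
        ["topic", "search"],
        ["model", "search", "rare"],
    ]
    print(filter_rare_words(docs, thresh=2))
    print(vocab_size_by_threshold(docs, low=0, high=3, step=1))
```

Looking at vocabulary sizes over a range of thresholds before choosing one is a common way to balance a smaller, faster model against losing informative terms.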
This code is licensed under the terms of the GNU GPL; see the LICENSE.txt file for full details.
Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory. Written by David Andrzejewski david.andrzej@gmail.com
LLNL-CODE-521811 All rights reserved. This file is part of IRIS.