CD2H gitForager

ContentMine/ami

Name: ami

Owner: The ContentMine

Description: null

Created: 2015-03-26 12:59:42.0

Updated: 2018-03-14 16:56:04.0

Pushed: 2017-07-02 08:30:34.0

Homepage: null

Size: 301415

Language: HTML

GitHub Committers

User	Most Recent Commit	# Commits

Other Committers

User	Email	Most Recent Commit	# Commits

README

AMI

AMI provides a generic infrastructure where plugins can search, index or transform structured documents on a high-through basis. The typical input is structured, normalized, tagged XHTML, possibly containing (or linked to) SVG and PNG files. The plugins are designed to analyse text or graphics or a combination according to the discipline.

Running

There are binaries provided with releases which enable running of ami's plugins; shell and batch scripts are provided to make running these easier. Detailed tutorials of these can be found in the ContentMine software tutorials.

Run on a CTree which contains scholarly.html files of the papers you are analysing. This can be made using Norma.

Schema

-<pluginname> --project <foldername> <plugin option> <options relating to plugin>
-gene --project zika --g.gene --g.type human

Building

package

After the build, shell scripts and batch files to run plugins are in target/appassembler/bin/. Further documentation is in docs/.

Plugins

AMI has a plugin architecture where each problem or community has its own plugin. Examples are “species”, “sequence”, “regex” and soon some chemistry.

It is often straightforward to develop text-based searches, and this is accessible to most committed scientists. Graphics is always harder and requires bespoke programming. Plugins have been developed for at least:

text targets

indexing text by regular expressions (regex).
Genbank IDs
PDB ids
farm-related / agronomy terms
chemical species (OSCAR)
computational phylogenetics
terms identifying Ebola and other haemorrhagic diseases

graphical targets

phylogenetic trees
chemical structures

The plugin architecture is moderately stable and it requires very little alteration to the codebase to add a new one (hopefully soon this can be done automatically by configuration files).

file structure

The input must be a QuickscrapeNorma directory (QSNorma). This must contain scholarly.html which is used for analysis. When a plugin is run, the output is to the results directory, wiwth a subdirectory for each plugin and a sub-subdirectory for each plugin option: Example:

http_www.foo_1_2/
    fulltext.xml
    fulltext.pdf
    scholarly.html
    results/
        words/
            frequency/
                results.xml
            lengths/
                results.xml
        regex/
            consort0/
                results.xml
            publication/
                results.xml

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.