ContentMine/tilburg

Name: tilburg

Owner: The ContentMine

Description: Extraction of data from Vector-based Funnel Plots in the scholarly literature

Created: 2017-06-09 12:05:15.0

Updated: 2017-06-28 10:25:01.0

Pushed: 2017-07-10 13:01:37.0

Homepage: null

Size: 74723

Language: Shell

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

tilburg

Extraction of data from Vector-based Funnel Plots in the scholarly literature

Overview

The main narrative (including diagrams) is at http://discuss.contentmine.org/t/extracting-data-from-tilburg-funnel-plot-diagrams/386/ . This is an OpenNotebook of all the work performed in the project. It aspires to Jean-Claude Bradley's maxim of “No insider knowledge”. Our intention is that anyone should be able to repeat the analyses (although this may require having to rebuild software releases, etc. or restore versions of the data.) Unless you are experienced in this it will probably be most useful to watch until the periodic analyses.

Note that the raw-material used is copyrighted by other parties and we use it under fair use. We assert that all derived data is uncopyrightable and will be publicly committed.

Materials

The raw data materials are from the sources:

  1. SVG files (ca 8) created from PDF using PDF2SVG by CHJH before the project started. We will retrieve the original PDF and commit and probablt rerun the PDF2SVG conversion.
  2. Openly-visble PDF files (ca 13) disovered by a Google search for “Funnel plot”. These were converted to SVG and manually edited to extract the funnel plot figures
  3. Production PDFs to be submitted by CHJH/Tilburg University.

The raw data will be hand annotated with a standard template (imitially tabular, which may evolve slightly as more data are added).

Methods
code

The “AMI” stack developed by PMR has been installed in ContentMine and edited to provide specific functionality for Funnel Plots. The top-level is norma , distributed as a Jar-with-dependencies with a number of norma commands to extract, transform and analyse data. A key development tool is org.xmlcml.norma.plot.ScatterTest. The main code to change during the project will be in https://github.com/contentmine/euclid, https://github.com/contentmine/svg and https://github.com/contentmine/norma. Unless you are familiar with the (extensive) code stack it will be best to use the jar, using the instructions in http://discuss.contentmine.org/t/extracting-data-from-tilburg-funnel-plot-diagrams/386/. We cannot at this stage give detailed help on how to use the code, this comes later.

evaluation

The raw data will be analyzed manually to give a ground truth for pointwise comparison of coordinates. This may be complete or sparse according to the data. This will be hosted here.

The extracted points will be compared manually or automatically against the ground truth. In addition there are systematic qualitatitative annotations.

operations

The detailed operations are recorded in the Discuss narrative http://discuss.contentmine.org/t/extracting-data-from-tilburg-funnel-plot-diagrams/386/.

evaluation of results

TBD


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.