The acknowledgement sections of journal articles are a rich source of information about contributions to research that do not rise to the level of authorship. This application is an initial deployment of a tool supporting exploration of PubMed Central acknowledgements. We have two primary goals at this stage. The first is to serve as a source of expertise for CTSAsearch regarding individuals who are key to research success but typically are not coauthors or coinvestigators. The second is to serve as a source of ground truth on coauthor contributions, feeding into the CD2H CTS Personas project and the Science of Translational Science Platform.
Harvesters run roughly daily to download the latest releases from the Open Access component of PubMed Central. Each publication release is distributed as a compressed tar file. The harvester scans the tar file, generates a table of contents, and extracts the XML version of the paper. The XML is then scanned for an acknowledgement section, which is extracted, split into sentences, and parsed with the Stanford NLP parser. The Iowa extraction framework then extracts entities and populates the database illustrated below. This is very much still a work in progress (particularly the current set of extraction rules!), but it already provides useful data, and its output becomes more sophisticated as the extraction rule set improves.