LANL-Bioinformatics/rMoore

Name: rMoore

Owner: LANL-Bioinformatics

Description: Moore genome citation tracker

Forked from: cstubben/rMoore

Created: 2015-12-10 16:08:26.0

Updated: 2017-10-31 15:48:45.0

Pushed: 2017-11-22 21:31:06.0

Homepage:

Size: 30

Language: R

README

The rMoore package wraps the euPMC package to find publications that mention a genome project funded by the Gordon and Betty Moore Foundation. The main function, citations, requires a Moore project table with three columns containing valid Europe PMC search queries. An example table from GBMF521 is included in the package, or you can load one using read.xls from the gdata package.

data(mg)
mg <- read.xls("Moore_521.xlsx", stringsAsFactors=FALSE)

The project table should contain a bioproject title, ID, accessions, PubMed ID of the genome publication, synonyms and three search columns.

t(mg[5,])

             5
project      "Algoriphagus machipongonensis PR1"
             "PRJNA18947"
             "AAXU"
genbank      "CM001023"
refseq       "NZ_CM001023"
             ""
pubmed       "21183675"
synonyms     "Algoriphagus sp. PR1"
investigator "Nicole King"
    s        ""
cites        "cites:21183675_MED"
keywords     "((Algoriphagus machipongonensis PR1) OR (Algoriphagus sp. PR1)) genome"
accs         "AAXU0* OR CM001023 OR NZ_CM001023"

The cites, keywords and accs columns contain search queries for Europe PMC and are usually generated automatically by using Excel functions. If needed, these can be replaced by specific queries to narrow or broaden searches.

=IF(G6="", "", CONCATENATE("cites:", G6, "_MED"))
=IF(H6="", CONCATENATE(A6, " genome"), CONCATENATE("((", A6, ") OR (", H6, ")) genome"))
=IF(C6="",IF(D6="","",CONCATENATE(D6," OR ",E6)),IF(D6="",CONCATENATE(C6,"0*"), CONCATENATE(C6,"0* OR ",D6," OR ",E6)))
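For readers who keep the project table in R rather than Excel, the three formulas above can be mirrored in a small helper. This is only a sketch: build_queries and its argument names (project, pubmed, synonyms, wgs, genbank, refseq) are illustrative assumptions, not functions or column names taken from the package, and the helper expects one scalar string per argument with "" for a missing value.

```r
# Hypothetical helper (not part of rMoore): builds the cites, keywords and
# accs query strings for one project, following the Excel formulas above.
build_queries <- function(project, pubmed = "", synonyms = "",
                          wgs = "", genbank = "", refseq = "") {
  # cites: empty unless the PubMed ID of the genome paper is known
  cites <- if (pubmed == "") "" else paste0("cites:", pubmed, "_MED")

  # keywords: the project name, OR-ed with the synonym when one is given
  keywords <- if (synonyms == "") {
    paste(project, "genome")
  } else {
    paste0("((", project, ") OR (", synonyms, ")) genome")
  }

  # accs: wildcard on the WGS prefix and/or the GenBank and RefSeq accessions
  accs <- if (wgs == "") {
    if (genbank == "") "" else paste0(genbank, " OR ", refseq)
  } else if (genbank == "") {
    paste0(wgs, "0*")
  } else {
    paste0(wgs, "0* OR ", genbank, " OR ", refseq)
  }

  c(cites = cites, keywords = keywords, accs = accs)
}

# Values taken from row 5 of the example table
q <- build_queries("Algoriphagus machipongonensis PR1",
                   pubmed = "21183675",
                   synonyms = "Algoriphagus sp. PR1",
                   wgs = "AAXU", genbank = "CM001023", refseq = "NZ_CM001023")
print(q)
```

Called with only a project name and a WGS prefix, the same helper yields a plain keyword query and a single wildcard accession query, which matches the shape of the queries shown for projects without a genome paper.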

The citations function uses the Moore table as input and searches for the genome paper, citations, keywords and accession numbers. For each project, the results are combined into a single table with an additional column containing the search criteria used to find the paper, where *=genome paper, G=cites Genome paper, K=matches Keywords, and A=mentions Accession. Multiple projects are combined into a list of data.frames.

x <- citations(mg[5:6,])

1. Algoriphagus machipongonensis PR1
Searching EXT_ID:21183675
1 result
Searching cites:21183675_MED
... results
Searching ((Algoriphagus machipongonensis PR1) OR (Algoriphagus sp. PR1)) genome
... results
Searching AAXU0* OR CM001023 OR NZ_CM001023
... results
2. alpha proteobacterium BAL199
Searching alpha proteobacterium BAL199 genome
... results
Searching ABHC0*
... results

sapply(x, nrow)

Algoriphagus machipongonensis PR1      alpha proteobacterium BAL199 
                               27                                12 

table(x[[1]]$search)


  G  GK   K  KA 
  3   4  18   1 
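The combined codes tabulated above (G, GK, K, KA) can be read as one letter per search that found the paper. The sketch below shows one way such a column could be derived from four separate hit lists. It is an illustration, not the package's implementation: combine_hits is a hypothetical name, and the publication IDs in the example are made up.

```r
# Hypothetical helper (not part of rMoore): merges the IDs returned by the
# four searches and records which searches found each publication.
combine_hits <- function(genome = character(), cites = character(),
                         keywords = character(), accs = character()) {
  ids <- unique(c(genome, cites, keywords, accs))
  # One character per search, concatenated in the order *, G, K, A
  code <- paste0(
    ifelse(ids %in% genome,   "*", ""),
    ifelse(ids %in% cites,    "G", ""),
    ifelse(ids %in% keywords, "K", ""),
    ifelse(ids %in% accs,     "A", "")
  )
  data.frame(id = ids, search = code, stringsAsFactors = FALSE)
}

# Made-up IDs, for illustration only
hits <- combine_hits(genome   = "1",
                     cites    = c("2", "3"),
                     keywords = c("1", "3", "4"),
                     accs     = "4")
print(hits)
print(table(hits$search))
```

Because each publication appears once with a combined code, a single call to table on the search column summarizes how many papers each combination of searches recovered.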

The following call returns citations for all 177 projects.

x <- citations(mg)
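Since the result is a named list with one data.frame per project, it can be convenient to flatten it into a single table before summarizing. The following is a minimal base R sketch under stated assumptions: flatten_projects is a hypothetical helper, every data.frame in the list is assumed to share the same columns, and the toy input stands in for real output from citations.

```r
# Hypothetical helper (not part of rMoore): stacks a named list of
# data.frames into one data.frame with an added project column.
flatten_projects <- function(x) {
  tagged <- Map(function(df, name) {
    df$project <- rep(name, nrow(df))  # also works for zero-row tables
    df
  }, x, names(x))
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}

# Toy input with made-up IDs, for illustration only
toy <- list(
  "Project A" = data.frame(id = c("1", "2"), search = c("G", "K"),
                           stringsAsFactors = FALSE),
  "Project B" = data.frame(id = "3", search = "KA",
                           stringsAsFactors = FALSE)
)
all_hits <- flatten_projects(toy)
print(all_hits)
print(table(all_hits$project, all_hits$search))
```

The cross-tabulation at the end gives one row per project and one column per search code, which is one possible starting point for a per-project summary table.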

Additional functions will be included to output publication lists with the search criteria, write summary tables to Excel or JavaScript datatables, group publications by project, or create large xts objects for plotting dygraphs.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.