ropensci/taxize

Name: taxize

Owner: rOpenSci

Description: A taxonomic toolbelt for R (https://ropensci.github.io/taxize/)

Created: 2011-05-19 15:05:33.0

Updated: 2017-12-22 11:49:09.0

Pushed: 2018-01-05 19:54:54.0

Homepage: https://ropensci.org/tutorials/taxize.html

Size: 26880

Language: R

README

taxize

taxize allows users to search many taxonomic data sources for species names (scientific and common) and to download upstream and downstream taxonomic hierarchy information, among other things.

The taxize tutorial can be found at https://ropensci.org/tutorials/taxize.html

The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of service_whatitdoes. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., classification.
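The naming convention can be illustrated with a few lines of base R. This is only a sketch: `split_fn_name` is a hypothetical helper written for this example, not part of taxize, and it simply splits a function name at its first underscore.

```r
# Hypothetical helper (not part of taxize): split a function name of the
# form service_whatitdoes into its service prefix and its action.
split_fn_name <- function(fn) {
  parts <- strsplit(fn, "_", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # No underscore: a general function that is not tied to one API
    return(list(service = NA_character_, action = fn))
  }
  list(service = parts[1], action = paste(parts[-1], collapse = "_"))
}

split_fn_name("gnr_resolve")     # service "gnr", action "resolve"
split_fn_name("classification")  # service NA, action "classification"
```

The split is purely textual. A name such as get_uid also contains an underscore even though "get" is not a data source, so the result still has to be checked against the list of known prefixes.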

You need API keys for Encyclopedia of Life (EOL), Tropicos, IUCN, and NatureServe.

SOAP

Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. These include: Pan-European Species directories Infrastructure and Mycobank. Data sources that use SOAP web services have been moved to taxizesoap at https://github.com/ropensci/taxizesoap.

Currently implemented in taxize

| Source | Function prefix | API Docs | API key |
| --- | --- | --- | --- |
| Encyclopedia of Life | eol | link | link |
| Taxonomic Name Resolution Service | tnrs | "api.phylotastic.org/tnrs" | none |
| Integrated Taxonomic Information Service | itis | link | none |
| Global Names Resolver | gnr | link | none |
| Global Names Index | gni | link | none |
| IUCN Red List | iucn | link | link |
| Tropicos | tp | link | link |
| Theplantlist dot org | tpl | ** | none |
| Catalogue of Life | col | link | none |
| National Center for Biotechnology Information | ncbi | none | none |
| CANADENSYS Vascan name search API | vascan | link | none |
| International Plant Names Index (IPNI) | ipni | link | none |
| Barcode of Life Data Systems (BOLD) | bold | link | none |
| National Biodiversity Network (UK) | nbn | link | none |
| Index Fungorum | fg | link | none |
| EU BON | eubon | link | none |
| Index of Names (ION) | ion | link | none |
| Open Tree of Life (TOL) | tol | link | none |
| World Register of Marine Species (WoRMS) | worms | link | none |
| NatureServe | natserv | link | link |
| Wikipedia | wiki | link | none |

**: There are none! We suggest using the TPL and TPLck functions in the taxonstand package. We provide two functions to get bulk data: tpl_families and tpl_get.

***: There are none! The function scrapes the web directly.

May be in taxize in the future…

See the newdatasource tag in the issue tracker

Tutorial

For more examples see the tutorial

Installation

Stable version from CRAN

install.packages("taxize")

Development version from GitHub

Windows users install Rtools first.

install.packages("devtools")
devtools::install_github("ropensci/taxize")

library('taxize')

Get unique taxonomic identifier from NCBI

A lot of taxize revolves around taxonomic identifiers. Because, as you know, names can be a mess (misspelled, synonyms, etc.), it's better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.

uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))

Retrieve classifications

Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.

out <- classification(uids)
lapply(out, head)
#> $`315576`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
#>
#> $`492549`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
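The printed result is a named list with one table per identifier, so a given rank can be looked up with ordinary data frame subsetting. The sketch below does not call taxize: it rebuilds the six rows shown above by hand (with id stored as character, which is an assumption), and `rank_name` is a hypothetical helper.

```r
# Hand-built copy of the six rows printed above (not fetched from NCBI).
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)
out_demo <- list(`315576` = cls, `492549` = cls)

# Hypothetical helper: name of the taxon at a given rank, or NA if absent.
rank_name <- function(x, rank) {
  hit <- x$name[x$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[1]
}

kingdoms <- vapply(out_demo, rank_name, character(1), rank = "kingdom")
kingdoms
```

Because several rows can share the rank "no rank", the helper returns only the first match; for unambiguous ranks such as kingdom that is the only match.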

Immediate children

Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.

children("Salmo", db = 'ncbi')
#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       1509524  Salmo marmoratus x Salmo trutta        species
#> 2       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#> 3       1483130               Salmo zrmanjaensis        species
#> 4       1483129               Salmo visovacensis        species
#> 5       1483128                Salmo rhodanensis        species
#> 6       1483127                 Salmo pellegrini        species
#> 7       1483126                     Salmo opimus        species
#> 8       1483125                Salmo macedonicus        species
#> 9       1483124                Salmo lourosensis        species
#> 10      1483123                   Salmo labecula        species
#> 11      1483122                  Salmo farioides        species
#> 12      1483121                      Salmo chilo        species
#> 13      1483120                     Salmo cettii        species
#> 14      1483119                  Salmo cenerinus        species
#> 15      1483118                   Salmo aphelios        species
#> 16      1483117                    Salmo akairos        species
#> 17      1201173               Salmo peristericus        species
#> 18      1035833                   Salmo ischchan        species
#> 19       700588                     Salmo labrax        species
#> 20       237411              Salmo obtusirostris        species
#> 21       235141              Salmo platycephalus        species
#> 22       234793                    Salmo letnica        species
#> 23        62065                  Salmo ohridanus        species
#> 24        33518                 Salmo marmoratus        species
#> 25        33516                    Salmo fibreni        species
#> 26        33515                     Salmo carpio        species
#> 27         8032                     Salmo trutta        species
#> 28         8030                      Salmo salar        species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
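The NCBI result above mixes plain binomials with hybrids and provisional names. One way to keep only the plain binomials is a regular expression filter in base R. This is a sketch on a hand-copied subset of five rows from the output above, not a taxize feature, and the pattern is an assumption about what counts as a clean name.

```r
# Five rows copied by hand from the printed output (not fetched from NCBI).
salmo <- data.frame(
  childtaxa_id = c("1509524", "1484545", "1483130", "8032", "8030"),
  childtaxa_name = c("Salmo marmoratus x Salmo trutta",
                     "Salmo cf. cenerinus BOLD:AAB3872",
                     "Salmo zrmanjaensis",
                     "Salmo trutta",
                     "Salmo salar"),
  childtaxa_rank = "species",
  stringsAsFactors = FALSE
)

# Keep names made of exactly two words: a capitalised genus followed by a
# lower-case epithet. Hybrids ("x") and "cf." names do not match.
is_binomial <- grepl("^[[:upper:]][[:lower:]]+ [[:lower:]]+$",
                     salmo$childtaxa_name)
clean_names <- salmo$childtaxa_name[is_binomial]
clean_names
```

The pattern is deliberately strict, so names with hyphenated epithets or subspecies would also be dropped and may need a looser rule.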

Downstream children to a rank

Get all species in the genus Apis

downstream(as.tsn(154395), db = 'itis', downto = 'species', verbose = FALSE)
#> $`154395`
#>      tsn parentname parenttsn          taxonname rankid rankname
#> 1 154396       Apis    154395     Apis mellifera    220  species
#> 2 763550       Apis    154395 Apis andreniformis    220  species
#> 3 763551       Apis    154395        Apis cerana    220  species
#> 4 763552       Apis    154395       Apis dorsata    220  species
#> 5 763553       Apis    154395        Apis florea    220  species
#> 6 763554       Apis    154395 Apis koschevnikovi    220  species
#> 7 763555       Apis    154395   Apis nigrocincta    220  species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"

Upstream taxa

Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).

upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#>      tsn                        target
#> 1 183327                Pinus contorta
#> 2 183332 Pinus contorta ssp. bolanderi
#> 3 822698  Pinus contorta ssp. contorta
#> 4 183329 Pinus contorta ssp. latifolia
#> 5 183330 Pinus contorta ssp. murrayana
#> 6 529672 Pinus contorta var. bolanderi
#> 7 183328  Pinus contorta var. contorta
#> 8 529673 Pinus contorta var. latifolia
#> 9 529674 Pinus contorta var. murrayana
#>                                                        commonNames
#> 1               scrub pine,shore pine,tamarack pine,lodgepole pine
#> 2                                            Bolander's beach pine
#> 3                                                               NA
#> 4                         black pine,Rocky Mountain lodgepole pine
#> 5                              tamarack pine,Sierra lodgepole pine
#> 6                                              Bolander beach pine
#> 7                  coast pine,lodgepole pine,beach pine,shore pine
#> 8 tall lodgepole pine,lodgepole pine,Rocky Mountain lodgepole pine
#> 9      Murray's lodgepole pine,Sierra lodgepole pine,tamarack pine
#>      nameUsage
#> 1     accepted
#> 2 not accepted
#> 3 not accepted
#> 4 not accepted
#> 5 not accepted
#> 6     accepted
#> 7     accepted
#> 8     accepted
#> 9     accepted
#> Pinus contorta
#>             NA
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"

Get synonyms

synonyms("Acer drummondii", db="itis")
#>      tsn             target commonNames    nameUsage
#> 1 183671    Acer drummondii          NA not accepted
#> 2 183672 Rufacer drummondii          NA not accepted
#> $`Acer drummondii`
#> [1] NA
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"

Get taxonomic IDs from many sources

get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis
#>              "162003"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> attr(,"class")
#> [1] "tsn"
#>
#> $ncbi
#> Salvelinus fontinalis
#>                "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
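As the printout shows, each identifier carries its lookup metadata (match, multiple_matches, pattern_match, uri) as R attributes. A minimal base R sketch of reading them follows. The object is a hand-built stand-in that mirrors the printed ITIS result, so taxize is not needed to run it, and `found_ids` is a hypothetical helper.

```r
# Hand-built stand-in mirroring the printed ITIS result above; a real object
# would come from a taxize lookup rather than structure().
tsn <- structure(
  c("Salvelinus fontinalis" = "162003"),
  match = "found",
  multiple_matches = FALSE,
  pattern_match = FALSE,
  uri = "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003",
  class = "tsn"
)

# Hypothetical helper: plain character vector of the identifiers whose
# "match" attribute equals "found".
found_ids <- function(x) {
  ok <- attr(x, "match") == "found"
  as.character(unclass(x))[ok]
}

found_ids(tsn)
attr(tsn, "uri")
```

Checking the match attribute before passing identifiers on is a cheap guard against carrying failed lookups into later steps.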

You can limit the result to certain rows when getting ids with any of the get_*() functions

get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"

Furthermore, you can get back all ids, if that's your jam, with the get_*_() functions (every get_*() function has a counterpart with an additional underscore at the end of its name)

get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#>               guid             scientificName    rank taxonomicStatus
#> 1 NBNSYS0000027573        Chironomus riparius species        accepted
#> 2 NHMSYS0000864966 Damaeus (Damaeus) riparius species        accepted
#> 3 NHMSYS0021059238      Rhizoclonium riparium species        accepted
#>
#> $nbn$`Pinus contorta`
#>               guid                scientificName    rank taxonomicStatus
#> 1 NBNSYS0000004786                Pinus contorta species        accepted
#> 2 NHMSYS0000494858 Pinus contorta var. murrayana variety        accepted
#> 3 NHMSYS0000494848  Pinus contorta var. contorta variety        accepted
#>
#>
#> attr(,"class")
#> [1] "ids"

Common names from scientific names

sci2comm('Helianthus annuus', db = 'itis')
#>      tsn                              target
#> 1  36616                   Helianthus annuus
#> 2 525928      Helianthus annuus ssp. jaegeri
#> 3 525929 Helianthus annuus ssp. lenticularis
#> 4 525930      Helianthus annuus ssp. texanus
#> 5 536095 Helianthus annuus var. lenticularis
#> 6 536096  Helianthus annuus var. macrocarpus
#> 7 536097      Helianthus annuus var. texanus
#>                                                  commonNames    nameUsage
#> 1 annual sunflower,sunflower,wild sunflower,common sunflower     accepted
#> 2                                                         NA not accepted
#> 3                                                         NA not accepted
#> 4                                                         NA not accepted
#> 5                                                         NA not accepted
#> 6                                                         NA not accepted
#> 7                                                         NA not accepted
#> $`Helianthus annuus`
#> [1] NA

Scientific names from common names

comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Chiropotes satanas"          "Ursus americanus luteolus"
#> [3] "Ursus americanus"            "Ursus americanus"
#> [5] "Ursus americanus americanus" "Ursus thibetanus"
#> [7] "Ursus thibetanus"

Lowest common rank among taxa

spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#>            name        rank      id
#> 1 Boreoeutheria below-class 1437010
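The idea behind a lowest common rank can be sketched in base R: walk the lineages from the root and stop at the first position where they disagree. This is an illustration of the concept only, not taxize's implementation. The lineages are abbreviated and written by hand rather than fetched from NCBI, and `lowest_shared` is a hypothetical helper.

```r
# Abbreviated, hand-written lineages (root first); not fetched from NCBI.
lineages <- list(
  "Sus scrofa" = c("Mammalia", "Eutheria", "Boreoeutheria",
                   "Laurasiatheria", "Sus scrofa"),
  "Homo sapiens" = c("Mammalia", "Eutheria", "Boreoeutheria",
                     "Euarchontoglires", "Primates", "Homo sapiens"),
  "Nycticebus coucang" = c("Mammalia", "Eutheria", "Boreoeutheria",
                           "Euarchontoglires", "Primates",
                           "Nycticebus coucang")
)

# Hypothetical helper: deepest taxon shared by every lineage, or NA if the
# lineages already differ at the root.
lowest_shared <- function(lineages) {
  n <- min(lengths(lineages))
  shared <- NA_character_
  for (i in seq_len(n)) {
    taxa_at_i <- vapply(lineages, function(x) x[i], character(1))
    if (length(unique(taxa_at_i)) != 1) break
    shared <- taxa_at_i[[1]]
  }
  shared
}

lowest_shared(lineages)
```

The sketch assumes every lineage lists the same ranks in the same order from the root down. Real classifications can skip ranks, so a robust version would align the lineages by taxon id rather than by position.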

Coerce codes to taxonomic id classes

numeric to uid

as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"

list to uid

as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339"   "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "https://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "https://www.ncbi.nlm.nih.gov/taxonomy/9696"

Coerce taxonomic id classes to a data.frame

out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#>      ids class match multiple_matches pattern_match
#> 1 315567   uid found            FALSE         FALSE
#> 2   3339   uid found            FALSE         FALSE
#> 3   9696   uid found            FALSE         FALSE
#>                                            uri
#> 1 https://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2   https://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3   https://www.ncbi.nlm.nih.gov/taxonomy/9696

Contributing

See our CONTRIBUTING document.

Contributors

Alphabetical

Code Contributors

All Contributors! (via GitHub Issues)

Alphabetical

ahhurlbert - Alectoria - andzandz11 - antagomir - arendsee - ashenkin - ashiklom - bomeara - bw4sz - cboettig - cdeterman - ChrKoenig - chuckrp - clarson2191 - claudenozeres - cmzambranat - daattali - DanielGMead - davharris - davidvilanova - diogoprov - dlebauer - dlenz1 - dschlaep - EDiLD - emhart - fdschneider - fgabriel1891 - fmichonneau - gedankenstuecke - GISKid - glaroc - gustavobio - ibartomeus - jangorecki - jarioksa - jebyrnes - johnbaums - jonmcalder - JoStaerk - jsgosnell - kamapu - karthik - KevCaz - kgturner - kmeverson - Koalha - ljvillanueva - Markus2015 - mcsiple - MikkoVihtakari - millerjef - miriamgrace - mpnelsen - MUSEZOOLVERT - nate-d-olson - nmatzke - npch - philippi - pmarchand1 - RodgerG - rossmounce - sariya - scelmendorf - sckott - SimonGoring - snsheth - snubian - tdjames1 - tmkurobe - tpaulson1 - tpoisot - vijaybarve - wcornwell - wpetry - zachary-foster

Road map

Check out our milestones to see what we plan to get done for each version.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.