Name: dataverse-client-r
Owner: Institute for Quantitative Social Science
Description: R Client for Dataverse 4 Repositories
Created: 2015-08-03 12:57:55.0
Updated: 2018-01-05 14:18:09.0
Pushed: 2017-10-10 12:44:12.0
Homepage: https://cran.r-project.org/package=dataverse
Size: 236
Language: R
The dataverse package provides access to Dataverse 4 APIs, enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. dataverse is the next-generation iteration of the dvn package, which works with Dataverse 3 (“Dataverse Network”) applications. dataverse includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) sword package for data deposit and the UNF package for data fingerprinting.
Some features of the Dataverse 4 API are public and require no authentication. This means that in many cases you can search for and retrieve data without a Dataverse account for that specific Dataverse installation. Other features, however, require a Dataverse account for the specific server installation of the Dataverse software, and an API key linked to that account. Instructions for obtaining an account and setting up an API key are available in the Dataverse User Guide. (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, it should be stored as an environment variable called DATAVERSE_KEY. It can be set within R using:
Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
Because there are many Dataverse installations, all functions in the R client require specifying which server installation you are interacting with. A default can be set with the environment variable DATAVERSE_SERVER. This should be the Dataverse server, without the “https” prefix or the “/api” URL path, etc. For example, the Harvard Dataverse can be used by setting:
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
Note: the package attempts to compensate for malformed values.
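The server value described above can also be tidied by hand before it is stored. A minimal sketch in base R follows; `normalize_server()` and `dataverse_env_status()` are hypothetical helpers written for illustration, not functions of the dataverse package, and the package's own handling of malformed values may differ.

```r
# Hypothetical helper (not part of the dataverse package): reduce a server
# value to a bare host name by dropping the scheme and any URL path.
normalize_server <- function(x) {
  x <- trimws(x)
  x <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", x)  # drop "https://", "http://", ...
  x <- sub("/.*$", "", x)                          # drop "/api" and anything after it
  x
}

# Hypothetical helper: report which of the two environment variables are set.
dataverse_env_status <- function() {
  vars <- c("DATAVERSE_SERVER", "DATAVERSE_KEY")
  vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
}

# Store a cleaned server value and read it back.
Sys.setenv("DATAVERSE_SERVER" = normalize_server("https://dataverse.harvard.edu/api"))
Sys.getenv("DATAVERSE_SERVER")
dataverse_env_status()
```

Checking the status vector at the top of a script is one way to fail early when the key or server has not been configured.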
Currently, the package wraps the data management features of the Dataverse API. Functions for other API features - related to user management and permissions - are not currently exported in the package (but are drafted in the source code).
Dataverse supplies a robust search API to discover Dataverses, datasets, and files. The simplest searches consist of just a query string:
library("dataverse")
str(dataverse_search("Gary King"), 1)
10 of 1043 results retrieved
'data.frame': 10 obs. of 17 variables:
$ name : chr "00698McArthur-King-BoxCoverSheets.pdf" "00698McArthur-King-MemoOfAgreement.pdf" "00698McArthur-King-StudyDescription.pdf" "077_mod1_s2m.tab" ...
$ type : chr "file" "file" "file" "file" ...
$ url : chr "https://dataverse.harvard.edu/api/access/datafile/101348" "https://dataverse.harvard.edu/api/access/datafile/101349" "https://dataverse.harvard.edu/api/access/datafile/101350" "https://dataverse.harvard.edu/api/access/datafile/2910738" ...
$ file_id : chr "101348" "101349" "101350" "2910738" ...
$ description : chr "Describe contents of each box of a paper data set" "Legal agreement between data depositor and Murray Archive" "Overview: abstract, research methodology, publications, and other info." NA ...
$ published_at : chr "2009-03-05T00:00:00Z" "2009-03-05T00:00:00Z" "2009-03-05T00:00:00Z" "2016-11-09T22:06:10Z" ...
$ file_type : chr "Adobe PDF" "Adobe PDF" "Adobe PDF" "Tab-Delimited" ...
$ file_content_type: chr "application/pdf" "application/pdf" "application/pdf" "text/tab-separated-values" ...
$ size_in_bytes : int 503714 360107 16506 318276 NA NA NA NA NA NA
$ md5 : chr "" "" "" "af9a6fa00bf29009e9eb5d366ad64660" ...
$ checksum :'data.frame': 10 obs. of 2 variables:
$ dataset_citation : chr "Charles C. McArthur; Stanley H. King, 2009, \"Harvard Student Study, 1960-1964\", hdl:1902.1/00698, Harvard Dataverse, V2" "Charles C. McArthur; Stanley H. King, 2009, \"Harvard Student Study, 1960-1964\", hdl:1902.1/00698, Harvard Dataverse, V2" "Charles C. McArthur; Stanley H. King, 2009, \"Harvard Student Study, 1960-1964\", hdl:1902.1/00698, Harvard Dataverse, V2" "International Food Policy Research Institute (IFPRI); Savannah Agricultural Research Institute, 2016, \"Medium "| __truncated__ ...
$ unf : chr NA NA NA "UNF:6:4mZh78EEGxqFLF71f/Nh/A==" ...
$ global_id : chr NA NA NA NA ...
$ citationHtml : chr NA NA NA NA ...
$ citation : chr NA NA NA NA ...
$ authors :List of 10
More complicated searches might specify metadata fields:
str(dataverse_search(author = "Gary King", title = "Ecological Inference"), 1)
10 of 1349 results retrieved
'data.frame': 10 obs. of 17 variables:
$ name : chr "00531Winter-LiberalArts-Clare-Data.tab" "00698McArthur-King-BoxCoverSheets.pdf" "00698McArthur-King-MemoOfAgreement.pdf" "00698McArthur-King-StudyDescription.pdf" ...
$ type : chr "file" "file" "file" "file" ...
$ url : chr "https://dataverse.harvard.edu/api/access/datafile/101725" "https://dataverse.harvard.edu/api/access/datafile/101348" "https://dataverse.harvard.edu/api/access/datafile/101349" "https://dataverse.harvard.edu/api/access/datafile/101350" ...
$ file_id : chr "101725" "101348" "101349" "101350" ...
$ description : chr "Clare College data in tab delimited format" "Describe contents of each box of a paper data set" "Legal agreement between data depositor and Murray Archive" "Overview: abstract, research methodology, publications, and other info." ...
$ published_at : chr "2010-05-10T00:00:00Z" "2009-03-05T00:00:00Z" "2009-03-05T00:00:00Z" "2009-03-05T00:00:00Z" ...
$ file_type : chr "Tab-Delimited" "Adobe PDF" "Adobe PDF" "Adobe PDF" ...
$ file_content_type: chr "text/tab-separated-values" "application/pdf" "application/pdf" "application/pdf" ...
$ size_in_bytes : int 167843 503714 360107 16506 318276 NA 3825612 4012 9054 48213
$ md5 : chr "" "" "" "" ...
$ checksum :'data.frame': 10 obs. of 2 variables:
$ unf : chr "UNF:3:9ZWOqiilVGnLacm4Qg2EYQ==" NA NA NA ...
$ dataset_citation : chr "David G. Winter; David C. McClelland; Abigail J. Stewart, 2010, \"New Case for the Liberal Arts, 1974-1978\", h"| __truncated__ "Charles C. McArthur; Stanley H. King, 2009, \"Harvard Student Study, 1960-1964\", hdl:1902.1/00698, Harvard Dataverse, V2" "Charles C. McArthur; Stanley H. King, 2009, \"Harvard Student Study, 1960-1964\", hdl:1902.1/00698, Harvard Dataverse, V2" "Charles C. McArthur; Stanley H. King, 2009, \"Harvard Student Study, 1960-1964\", hdl:1902.1/00698, Harvard Dataverse, V2" ...
$ global_id : chr NA NA NA NA ...
$ citationHtml : chr NA NA NA NA ...
$ citation : chr NA NA NA NA ...
$ authors :List of 10
Searches can also be restricted to specific types of objects (Dataverse, dataset, or file):
str(dataverse_search(author = "Gary King", type = "dataset"), 1)
10 of 523 results retrieved
'data.frame': 10 obs. of 9 variables:
$ name : chr "10 Million International Dyadic Events" "A Comparative Study between Gurukul System and Western System of Education" "A Lexicial Index of Electoral Democracy" "A Unified Model of Cabinet Dissolution in Parliamentary Democracies" ...
$ type : chr "dataset" "dataset" "dataset" "dataset" ...
$ url : chr "http://hdl.handle.net/1902.1/FYXLAWZRIA" "http://dx.doi.org/10.7910/DVN/329UAV" "http://dx.doi.org/10.7910/DVN/29106" "http://dx.doi.org/10.3886/ICPSR01115.v1" ...
$ global_id : chr "hdl:1902.1/FYXLAWZRIA" "doi:10.7910/DVN/329UAV" "doi:10.7910/DVN/29106" "doi:10.3886/ICPSR01115.v1" ...
$ description : chr "When the Palestinians launch a mortar attack into Israel, the Israeli army does not wait until the end of the c"| __truncated__ "India, in ancient times has witnessed students which used to be like the great king Vikramaditya. He followed t"| __truncated__ "We operationalize electoral democracy as a series of necessary-and-sufficient conditions arrayed in an ordinal "| __truncated__ "The literature on cabinet duration is split between two apparently irreconcilable positions. The ATTRIBUTES THE"| __truncated__ ...
$ published_at: chr "2014-08-21T00:00:00Z" "2016-06-07T13:09:20Z" "2016-08-05T20:42:31Z" "2015-04-09T04:13:54Z" ...
$ citationHtml: chr "King, Gary; Lowe, Will, 2008, \"10 Million International Dyadic Events\", <a href=\"http://hdl.handle.net/1902."| __truncated__ "Mr. Amrish George Frederick, 2016, \"A Comparative Study between Gurukul System and Western System of Education"| __truncated__ "Skaaning, Svend-Erik; John Gerring; Henrikas Bartusevicius, 2015, \"A Lexicial Index of Electoral Democracy\", "| __truncated__ "King, Gary; Alt, James E.; Burns, Nancy; Laver, Michael, 1996, \"A Unified Model of Cabinet Dissolution in Parl"| __truncated__ ...
$ citation : chr "King, Gary; Lowe, Will, 2008, \"10 Million International Dyadic Events\", hdl:1902.1/FYXLAWZRIA, Harvard Datave"| __truncated__ "Mr. Amrish George Frederick, 2016, \"A Comparative Study between Gurukul System and Western System of Education"| __truncated__ "Skaaning, Svend-Erik; John Gerring; Henrikas Bartusevicius, 2015, \"A Lexicial Index of Electoral Democracy\", "| __truncated__ "King, Gary; Alt, James E.; Burns, Nancy; Laver, Michael, 1996, \"A Unified Model of Cabinet Dissolution in Parl"| __truncated__ ...
$ authors :List of 10
The results are paginated using the per_page argument. To retrieve subsequent pages, specify start.
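One way to walk through paginated results is to compute the offset of each page and request the pages in turn. The sketch below is hedged: `page_starts()` and `fetch_all()` are hypothetical helpers, `start` is assumed to be a zero-based offset, and a fake fetcher stands in for dataverse_search() so that the example runs offline.

```r
# Hypothetical helper: zero-based offsets for each page of results.
page_starts <- function(total, per_page = 10) {
  stopifnot(total >= 0, per_page > 0)
  if (total == 0) return(integer(0))
  seq.int(0, total - 1, by = per_page)
}

# Collect every page by calling `fetch_page(start, per_page)` once per offset.
# `fetch_page` is a stand-in for a call such as
#   function(start, per_page)
#     dataverse_search("Gary King", start = start, per_page = per_page)
fetch_all <- function(fetch_page, total, per_page = 10) {
  pages <- lapply(page_starts(total, per_page), fetch_page, per_page = per_page)
  do.call(rbind, pages)
}

# Offline demonstration with a fake fetcher over 25 made-up "results".
fake <- data.frame(id = 1:25)
fake_fetch <- function(start, per_page) {
  rows <- seq_len(nrow(fake))
  fake[rows > start & rows <= start + per_page, , drop = FALSE]
}

all_rows <- fetch_all(fake_fetch, total = nrow(fake), per_page = 10)
nrow(all_rows)
```

Real search results contain nested columns (the checksum data frame and the authors list shown in the output above), so combining pages with rbind() may need adapting; keeping the pages in a list is a safer starting point.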
The easiest way to access data from Dataverse is to use a persistent identifier (typically a DOI). You can retrieve the contents of a Dataverse dataset:
get_dataset("doi:10.7910/DVN/ARKOTI")
Dataset (75170):
Version: 1.0, RELEASED
Release Date: 2015-07-07T02:57:02Z
License: CC0
7 Files:
   label                        version id      contentType
1  alpl2013.tab                 2       2692294 text/tab-separated-values
2  BPchap7.tab                  2       2692295 text/tab-separated-values
3  chapter01.R                  2       2692202 text/plain; charset=US-ASCII
4  chapter02.R                  2       2692206 text/plain; charset=US-ASCII
5  chapter03.R                  2       2692210 text/plain; charset=US-ASCII
6  chapter04.R                  2       2692204 text/plain; charset=US-ASCII
7  chapter05.R                  2       2692205 text/plain; charset=US-ASCII
8  chapter06.R                  2       2692212 text/plain; charset=US-ASCII
9  chapter07.R                  2       2692209 text/plain; charset=US-ASCII
10 chapter08.R                  2       2692208 text/plain; charset=US-ASCII
11 chapter09.R                  2       2692211 text/plain; charset=US-ASCII
12 chapter10.R                  1       2692203 text/plain; charset=US-ASCII
13 chapter11.R                  1       2692207 text/plain; charset=US-ASCII
14 comprehensiveJapanEnergy.tab 2       2692296 text/tab-separated-values
15 constructionData.tab         2       2692293 text/tab-separated-values
16 drugCoverage.csv             1       2692233 text/plain; charset=US-ASCII
17 hanmerKalkanANES.tab         2       2692290 text/tab-separated-values
18 hmnrghts.tab                 2       2692298 text/tab-separated-values
19 hmnrghts.txt                 1       2692238 text/plain
20 levant.tab                   2       2692289 text/tab-separated-values
21 LL.csv                       1       2692228 text/plain; charset=US-ASCII
22 moneyDem.tab                 2       2692292 text/tab-separated-values
23 owsiakJOP2013.tab            2       2692297 text/tab-separated-values
24 PESenergy.csv                1       2692230 text/plain; charset=US-ASCII
25 pts1994.csv                  1       2692229 text/plain; charset=US-ASCII
26 pts1995.csv                  1       2692231 text/plain; charset=US-ASCII
27 sen113kh.ord                 1       2692239 text/plain; charset=US-ASCII
28 SinghEJPR.tab                2       2692299 text/tab-separated-values
29 SinghJTP.tab                 2       2692288 text/tab-separated-values
30 stdSingh.tab                 2       2692291 text/tab-separated-values
31 UN.csv                       1       2692232 text/plain; charset=US-ASCII
32 war1800.tab                  2       2692300 text/tab-separated-values
Knowing a file name, you can also access that file (e.g., a Stata dataset) directly in R:
f <- get_file("constructionData.tab", "doi:10.7910/DVN/ARKOTI")
# load it into memory
tmp <- tempfile(fileext = ".dta")
writeBin(as.vector(f), tmp)
dat <- foreign::read.dta(tmp)
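When the downloaded bytes are plain tab-delimited text rather than a binary format such as Stata, the temporary file can be skipped. The sketch below assumes (as the writeBin() call above suggests) that a download arrives as a raw vector; it uses a hand-made raw vector in place of a real download, so the column names and values are illustrative only.

```r
# Stand-in for the bytes of a downloaded file (assumption: a raw vector).
f <- charToRaw("year\tvalue\n2001\t1.5\n2002\t2.5\n")

# Decode the bytes as text and parse them as tab-delimited data in memory.
dat <- read.delim(text = rawToChar(f), stringsAsFactors = FALSE)
str(dat)
```

Whether the bytes are text depends on the format in which the file is returned, so this shortcut does not apply to the Stata example above.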
If you don't know the file name in advance, you can parse the available files returned by get_dataset() and retrieve the file using its Dataverse “id” number.
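Looking up an id from the file listing can be sketched as follows. This is an illustration under assumptions: the data frame below is a hand-built mock with the same columns as the printed listing (three rows copied from it), `find_file_ids()` is a hypothetical helper, and the exact structure of the object returned by get_dataset() should be checked before relying on it.

```r
# Mock of the file table shown in the dataset listing (three rows copied from it).
files <- data.frame(
  label = c("alpl2013.tab", "constructionData.tab", "drugCoverage.csv"),
  version = c(2L, 2L, 1L),
  id = c(2692294L, 2692293L, 2692233L),
  contentType = c("text/tab-separated-values", "text/tab-separated-values",
                  "text/plain; charset=US-ASCII"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: ids of files whose label matches a regular expression,
# returned as a vector named by file label.
find_file_ids <- function(files, pattern) {
  hits <- grepl(pattern, files$label, ignore.case = TRUE)
  stats::setNames(files$id[hits], files$label[hits])
}

ids <- find_file_ids(files, "^construction")
ids

# With a live connection, the id could then be passed on, for example:
# f <- get_file(ids[[1]])
```

Matching on a pattern rather than an exact name is a design choice that tolerates small differences in file extensions between the deposited and the ingested versions of a file.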
Dataverse provides two essentially unrelated workflows for managing (adding, documenting, and publishing) datasets. The first is built on SWORD v2.0. This means that to create a new dataset listing, you first have to initialize a dataset entry with some metadata, then add one or more files to the dataset, and then publish it. This looks something like the following:
# retrieve your service document
d <- service_document()
# create a list of metadata
metadat <- list(title = "My Study",
                creator = "Doe, John",
                description = "An example study")
# create the dataset
ds <- initiate_sword_dataset("mydataverse", body = metadat)
# add files to dataset
tmp <- tempfile()
write.csv(iris, file = tmp)
f <- add_file(ds, file = tmp)
# publish new dataset
publish_sword_dataset(ds)
# dataset will now be published
list_datasets("mydataverse")
The second workflow is called the “native” API and is similar but uses slightly different functions:
# create the dataset
ds <- create_dataset("mydataverse")
# add files
tmp <- tempfile()
write.csv(iris, file = tmp)
f <- add_dataset_file(file = tmp, dataset = ds)
# publish dataset
publish_dataset(ds)
# dataset will now be published
get_dataverse("mydataverse")
Through the native API it is possible to update a dataset by modifying its metadata with update_dataset() or its file contents using update_dataset_file(), and then republish a new version using publish_dataset().
You can (eventually) find a stable release on CRAN, or install the latest development version from GitHub:
if (!require("ghit")) {
    install.packages("ghit")
}
ghit::install_github("iqss/dataverse-client-r")
library("dataverse")
Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's OAIHarvester and Scott Chamberlain's oai, which offer metadata download from any web repository that is compliant with the Open Archives Initiative standards. Additionally, rdryad uses OAIHarvester to interface with Dryad. The rfigshare package works in a similar spirit to dataverse with https://figshare.com/.