Name: aws.s3
Owner: the cloudyr project
Description: Amazon Simple Storage Service (S3) API Client
Created: 2014-12-18 18:23:33.0
Updated: 2018-01-17 15:42:20.0
Pushed: 2018-01-15 16:39:54.0
Homepage: https://cloud.r-project.org/package=aws.s3
Size: 352
Language: R
aws.s3 is a simple client package for the Amazon Web Services (AWS) Simple Storage Service (S3) REST API. While other packages currently connect R to S3, they do so incompletely (mapping only some of the API endpoints to R) and most implementations rely on the AWS command-line tools, which users may not have installed on their system.
To use the package, you will need an AWS account and to enter your credentials into R. Your keypair can be generated on the IAM Management Console under the heading Access Keys. Note that you only have access to your secret key once: after it is generated, you need to save it in a secure location. New keypairs can be generated at any time if yours has been lost, stolen, or forgotten. The aws.iam package provides tools for working with IAM, including creating roles, users, groups, and credentials programmatically; it is not required in order to use IAM credentials.
By default, all cloudyr packages for AWS services allow the use of credentials specified in a number of ways, beginning with:
User-supplied values passed directly to functions.
Environment variables, which can be set on the command line prior to starting R or via an Renviron.site or .Renviron file, both of which are used to set environment variables in R during startup (see ?Startup). Alternatively, they can be set within R:
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
           "AWS_DEFAULT_REGION" = "us-east-1",
           "AWS_SESSION_TOKEN" = "mytoken")
If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata.
Profiles saved in a /.aws/credentials “dot file” in the current working directory. The “default” profile is assumed if none is specified.
A centralized ~/.aws/credentials file, containing credentials for multiple accounts. The “default” profile is assumed if none is specified.
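The precedence described above can be illustrated with a small base-R sketch. The function `resolve_credentials()` is hypothetical (it is not part of aws.s3 or aws.signature) and covers only the first two sources in the list: values passed directly, then environment variables. The key values are placeholders.

```r
# Hypothetical helper, not part of aws.s3: illustrates user-supplied values
# taking precedence over environment variables.
resolve_credentials <- function(key = NULL, secret = NULL) {
  # use the supplied value if present, otherwise read the environment variable
  pick <- function(value, envvar) {
    if (!is.null(value) && nzchar(value)) value else Sys.getenv(envvar, unset = "")
  }
  creds <- list(
    key    = pick(key, "AWS_ACCESS_KEY_ID"),
    secret = pick(secret, "AWS_SECRET_ACCESS_KEY")
  )
  if (!nzchar(creds$key) || !nzchar(creds$secret)) {
    stop("No AWS credentials found in arguments or environment variables")
  }
  creds
}

# Placeholder values, set for the current R session only
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey")

from_env  <- resolve_credentials()                      # read from the environment
from_args <- resolve_credentials(key = "otherkey",
                                 secret = "othersecret") # explicit values win
```

The real lookup also consults EC2 role credentials and credentials files, which this sketch leaves out.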
Profiles stored locally or in a centralized location (e.g., ~/.aws/credentials) can also be invoked via:
# use your 'default' account credentials
aws.signature::use_credentials()
# use an alternative credentials profile
aws.signature::use_credentials(profile = "bob")
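For readers unfamiliar with the credentials file, the sketch below writes a small example file in the usual INI-style layout (a bracketed profile name followed by `aws_access_key_id` and `aws_secret_access_key` entries) and reads it back with base R. The parser `read_profiles()` is hypothetical and exists only for illustration; in practice `use_credentials()` handles this for you. All key values are placeholders.

```r
# Write an example credentials file to a temporary location.
# The values are placeholders, not real credentials.
cred_file <- tempfile(fileext = ".credentials")
writeLines(c(
  "[default]",
  "aws_access_key_id = mykey",
  "aws_secret_access_key = mysecretkey",
  "",
  "[bob]",
  "aws_access_key_id = bobkey",
  "aws_secret_access_key = bobsecretkey"
), cred_file)

# Hypothetical parser (not part of aws.signature): returns a named list of
# profiles, each a named character vector of settings.
read_profiles <- function(file) {
  lines <- trimws(readLines(file))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  profiles <- list()
  current <- NULL
  for (line in lines) {
    if (grepl("^\\[.*\\]$", line)) {
      # a "[name]" line starts a new profile
      current <- gsub("^\\[|\\]$", "", line)
      profiles[[current]] <- character()
    } else if (!is.null(current)) {
      # a "name = value" line belongs to the current profile
      parts <- strsplit(line, "=", fixed = TRUE)[[1]]
      name  <- trimws(parts[1])
      value <- trimws(paste(parts[-1], collapse = "="))
      profiles[[current]][name] <- value
    }
  }
  profiles
}

profiles <- read_profiles(cred_file)
names(profiles)                           # profile names found in the file
profiles[["bob"]][["aws_access_key_id"]]  # key for the "bob" profile
```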
Temporary session tokens are stored in the environment variable AWS_SESSION_TOKEN (and will be stored there by the use_credentials() function). The aws.iam package provides an R interface to IAM roles and the generation of temporary session tokens via the security token service (STS).
The package can be used to examine publicly accessible S3 buckets and publicly accessible S3 objects without registering an AWS account. If credentials have been generated in the AWS console and made available in R, you can find your available buckets using:
library("aws.s3")
bucketlist()
If your credentials are incorrect, this function will return an error. Otherwise, it will return a list of information about the buckets you have access to.
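Because a failed request surfaces as an R error, it can be convenient to wrap the call in `tryCatch()`. The wrapper below is a hypothetical sketch, not part of aws.s3: its `lister` argument defaults to `aws.s3::bucketlist`, but any function can be passed in, so the error handling can be tried without contacting AWS. The error message in the simulated failure is made up for the example.

```r
# Hypothetical wrapper: returns NULL (with a message) instead of stopping
# when the listing function signals an error.
list_buckets_safely <- function(lister = aws.s3::bucketlist) {
  tryCatch(
    lister(),
    error = function(e) {
      message("Could not list buckets: ", conditionMessage(e))
      NULL
    }
  )
}

# Simulate a failed request without contacting AWS
failing <- function() stop("request was rejected")
result <- list_buckets_safely(failing)
is.null(result)

# With valid credentials and aws.s3 installed:
# buckets <- list_buckets_safely()
```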
To get a listing of all objects in a public bucket, simply call:
get_bucket(bucket = '1000genomes')
Amazon maintains a listing of Public Data Sets on S3.
To get a listing for all objects in a private bucket, pass your AWS key and secret in as parameters. (As described above, all functions in aws.s3 will look for your keys as environment variables by default, greatly simplifying the process of making an S3 request.)
# specify keys in-line
get_bucket(
  bucket = 'my_bucket',
  key = YOUR_AWS_ACCESS_KEY,
  secret = YOUR_AWS_SECRET_ACCESS_KEY
)
# specify keys as environment variables
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey")
get_bucket("my_bucket")
S3 can be a bit picky about region specifications. bucketlist() will return buckets from all regions, but all other functions require specifying a region. A default of "us-east-1" is relied upon if none is specified explicitly and the correct region can't be detected automatically. (Note: using an incorrect region is one of the most common - and hardest to figure out - errors when working with S3.)
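One way to make the region explicit in your own scripts is to decide it in a single place. The helper below is a hypothetical base-R sketch (not part of aws.s3) that prefers an explicit value, then the AWS_DEFAULT_REGION environment variable, and finally "us-east-1". It does not attempt the automatic detection mentioned above, and the region names are just examples.

```r
# Hypothetical helper: explicit value, then AWS_DEFAULT_REGION, then "us-east-1"
choose_region <- function(region = NULL) {
  if (!is.null(region) && nzchar(region)) {
    return(region)
  }
  env_region <- Sys.getenv("AWS_DEFAULT_REGION", unset = "")
  if (nzchar(env_region)) env_region else "us-east-1"
}

Sys.unsetenv("AWS_DEFAULT_REGION")
fallback <- choose_region()              # nothing set: default is used

Sys.setenv("AWS_DEFAULT_REGION" = "eu-west-1")
from_env <- choose_region()              # taken from the environment

explicit <- choose_region("ap-south-1")  # explicit value wins

# Assuming the function you call accepts a `region` argument:
# get_bucket("my_bucket", region = choose_region())
```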
There are eight main functions that will be useful for working with objects in S3:
s3read_using() provides a generic interface for reading from S3 objects using a user-defined function
s3write_using() provides a generic interface for writing to S3 objects using a user-defined function
get_object() returns a raw vector representation of an S3 object. This might then be parsed in a number of ways, such as rawToChar(), xml2::read_xml(), jsonlite::fromJSON(), and so forth depending on the file format of the object
save_object() saves an S3 object to a specified local file
put_object() stores a local file into an S3 bucket
s3save() saves one or more in-memory R objects to an .Rdata file in S3 (analogously to save()). s3saveRDS() is an analogue for saveRDS()
s3load() loads one or more objects into memory from an .Rdata file stored in S3 (analogously to load()). s3readRDS() is an analogue for readRDS()
s3source() sources an R script directly from S3
They behave as you would probably expect:
# save an in-memory R object into S3
s3save(mtcars, bucket = "my_bucket", object = "mtcars.Rdata")
# `load()` R objects from the file
s3load("mtcars.Rdata", bucket = "my_bucket")
# get file as raw vector
get_object("mtcars.Rdata", bucket = "my_bucket")
# alternative 'S3 URI' syntax:
get_object("s3://my_bucket/mtcars.Rdata")
# save file locally
save_object("mtcars.Rdata", file = "mtcars.Rdata", bucket = "my_bucket")
# put local file into S3
put_object(file = "mtcars.Rdata", object = "mtcars2.Rdata", bucket = "my_bucket")
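Since get_object() returns a raw vector, a parsing step is usually needed before the data can be used. The sketch below shows that step for a text format using only base R. The raw vector is built locally with `charToRaw()` as a stand-in for what get_object() might return for a small CSV object; the object contents are invented for the example.

```r
# Stand-in for the raw vector returned by get_object() for a small CSV object
raw_csv <- charToRaw("name,value\nalpha,1\nbeta,2\n")

# Convert the raw bytes to a character string, then parse it as CSV
csv_text <- rawToChar(raw_csv)
parsed <- read.csv(text = csv_text, stringsAsFactors = FALSE)

parsed$name   # column of names
parsed$value  # column of numeric values
```

The same pattern applies to other text formats: convert with rawToChar() and hand the result to the appropriate parser.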
This package is not yet on CRAN. To install the latest development version you can install from the cloudyr drat repository:
# latest stable version
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))
# on windows you may need:
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"), INSTALL_opts = "--no-multiarch")
Or, to pull a potentially unstable version directly from GitHub:
if (!require("ghit")) {
    install.packages("ghit")
}
ghit::install_github("cloudyr/aws.s3")