NCSU-Libraries/APTrust-Bagit

Name: APTrust-Bagit

Owner: NCSU Libraries

Description: null

Created: 2016-04-11 14:35:11.0

Updated: 2017-05-05 15:34:23.0

Pushed: 2017-05-05 15:34:22.0

Homepage: null

Size: 15

Language: Python

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

aptrust-bagit

Description

A set of scripts to bag things according to the bagit specification, along with creation of metadata files that APTrust require. Ingests to APTrust S3 buckets, verifies S3 upload, records audit data to text file and to DAEV (digital asset management system).

Setup
Configuration

Copy config.yml.example to config.yml, and fill in with your own values

The Python S3 client (boto3) also expects that a file with the bucket keys reside in ~/.aws/credentials in the form:

[default]
aws_access_key_id = access_key_here 
aws_secret_access_key = secret_access_key_here
Installation
cd path/to/script
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Running the script

This script is meant to be run inside a virtualenv (unless you install the dependencies globally), so you must activate the virtualenv before running it. You can either do this by activating the virtualenv the same way as in the installation (source venv/bin/activate), or call the script using the virtualenvs version of python:

/path/to/virtualenv/bin/python aptrust-bagit.py [args here]

Here is information about the various arguments you can provide to the script:

usage: aptrust-bagit.py [-h] [-b BAG] [-a ACCESS] [-p] [-v] directory

Bag a directory and send it to an APTrust S3 receiving bucket

positional arguments:
  directory             The directory to bag/ingest

optional arguments:
  -h, --help            show this help message and exit
  -b BAG, --bag BAG     Name to give the bag (default is the directory name)
  -a ACCESS, --access ACCESS
            APTrust access level for bag (can be either:
            consortia, institution, or restricted - default is
            institution)
  -p, --production      Ingest to production instance
  -v, --verbose         Provide more output
Run with nohup

Since larger bags may take a while to transmit, it is recommended to use nohup to run this script so you can disconnect from your SSH session (if running manually). See send_dir_to_aptrust.py for an example of how to do this, or run that script instead of aptrust-bagit.py (it defaults to submitting to the test instance of APTrust right now)

What information is recorded

For each bag, as part of the bagging process, the following information is kept:

In addition to this, the following information is recorded for successfully transmitted bags in audit.txt:

Errors in transmission or bagging are also recorded in logs/error.log

TODO

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.