Name: lf-backup
Owner: Fred Hutchinson Cancer Research Center
Description: lf-backup (aka large file backup) takes a list of filenames from a csv file or sql and copies the files to a swift objectstore
Created: 2016-10-25 23:59:44.0
Updated: 2016-10-26 00:59:00.0
Pushed: 2017-01-31 06:18:22.0
Size: 57
Language: Python
LF Backup stands for large file backup.
RHEL / CentOS:

```shell
yum -y install epel-release
yum -y install python34 python34-setuptools python34-devel gcc postgresql-devel
easy_install-3.4 pip
```
Debian/Ubuntu:

```shell
apt-get install -y python3-dev libpq-dev
```

and then install lf-backup:

```shell
pip3 install --upgrade pip
pip3 install lf-backup
```
Create a new Swift container called "large-file-backup". Add the export statements for the variables starting with ST_ and the Postgres authentication to the config file ~/.lf-backuprc and set its permissions to 600. Optionally export PGSQL to override the built-in SQL query.
```shell
nano ~/.lf-backuprc
chmod 600 ~/.lf-backuprc
cat ~/.lf-backuprc
export ST_AUTH=https://swiftcluster.domain.org/auth/v1.0
export ST_USER=swift_account
export ST_KEY=RshBXXXXXXXXXXXXXXXXXXXXX
export PGHOST=pgdb.domain.org
export PGPORT=32048
export PGDATABASE=storcrawldb
export PGUSER=xxxxxxxx
export PGPASSWORD=
```
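On the Python side these settings are picked up from the environment rather than from command-line flags. A minimal sketch of that pattern (the helper name and dict layout are illustrative, not the actual lfbackup.py API; variable names match the config above):

```python
import os

def load_settings():
    """Collect Swift and Postgres settings from the environment.

    Raises KeyError if a required ST_ variable is missing; the PG*
    variables are optional because libpq reads them by itself.
    """
    swift = {
        "auth_url": os.environ["ST_AUTH"],
        "user": os.environ["ST_USER"],
        "key": os.environ["ST_KEY"],
    }
    pg = {k: os.environ.get(k, "") for k in
          ("PGHOST", "PGPORT", "PGDATABASE", "PGUSER", "PGPASSWORD")}
    return swift, pg

# Simulate a sourced ~/.lf-backuprc for demonstration purposes.
os.environ.update(ST_AUTH="https://swift.example.org/auth/v1.0",
                  ST_USER="swift_account", ST_KEY="secret")
swift, pg = load_settings()
print(swift["user"])  # swift_account
```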
Create a cron job in /etc/cron.d/ running as root, starting ca. 7pm:

```shell
cat /etc/cron.d/lf-backup
# enabled on hostname xxx on 11-01-2016
8 * * * root /usr/local/bin/lf-backup --prefix /fh/fast \
    --container large-file-backup-fast --sql >> /var/tmp/lf-backup-fast 2>&1
```
```shell
lf-backup -C frobozz -c filelist.csv
```

Read the list of files from the first column of 'filelist.csv' and back them up to Swift container 'frobozz', using the environment for authentication.
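Pulling the file list out of the first CSV column can be sketched as follows (the helper name is illustrative, not the actual lf-backup code):

```python
import csv

def read_filelist(csv_path):
    """Return the first column of each non-empty CSV row as a list of paths."""
    with open(csv_path, newline="") as fh:
        return [row[0] for row in csv.reader(fh) if row]

# Example: a two-column CSV where only the first column holds the path.
with open("filelist.csv", "w", newline="") as fh:
    fh.write("/fh/fast/a/file1.bam,123\n/fh/fast/b/file2.bam,456\n")
print(read_filelist("filelist.csv"))
# ['/fh/fast/a/file1.bam', '/fh/fast/b/file2.bam']
```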
```shell
lf-backup -C grue -s
```

Query the database specified in the environment for the file list and back the files up to Swift container 'grue', using the environment for authentication.
```shell
lf-backup -C flathead -r 7 --prefix /fh/fast/restore42
```

Restore all objects in Swift container 'flathead' that are newer than 7 days to the current environment. The optional --prefix parameter specifies a destination path to which the objects are restored.
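The age filter behind -r can be pictured as comparing each object's last-modified timestamp against a cutoff. A hedged sketch under that assumption (the function and field names are illustrative; Swift container listings do expose a `last_modified` ISO timestamp per object, but this is not the actual lf-backup code):

```python
from datetime import datetime, timedelta

def newer_than(objects, days):
    """Keep objects whose last_modified timestamp lies within the last `days` days."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    return [o for o in objects
            if datetime.strptime(o["last_modified"], "%Y-%m-%dT%H:%M:%S") > cutoff]

objs = [
    {"name": "lastname_f/project/file.bam",
     "last_modified": datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")},
    {"name": "old/file.bam", "last_modified": "2000-01-01T00:00:00"},
]
print([o["name"] for o in newer_than(objs, 7)])
# ['lastname_f/project/file.bam']
```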
For modifications and change testing, install a fresh system and install from the local git folder:

```shell
git clone https://github.com/FredHutch/lf-backup
rm -rf /usr/local/lib/python3.5/dist-packages/*; rm -rf /usr/local/bin/*
pip3 install -e ./lf-backup
```

Make changes in lf-backup and run again:

```shell
rm -rf /usr/local/lib/python3.5/dist-packages/*; rm -rf /usr/local/bin/*
pip3 install -e ./lf-backup
```
The script had the following original feature requests:

- take a file list from a CSV file or SQL DB and back each file up to object storage (e.g. Swift)
- restore files from object storage that are newer than a specified number of days (>1)
- if the file has an atime within the last x days (configurable), take an md5sum of that file and store it in a metadata attribute called md5sum (not yet implemented)
- check whether the file is already in the object store and skip the upload if file size and mtime are identical
- notify a list of email addresses after finishing; attach the list of files that were uploaded, with one file list per file owner (username)
- log every uploaded file to syslog, with detailed logging of success and failure so the storage team can monitor via Splunk
- the bash script lf-backup is a wrapper for the Python script lfbackup.py; lf-backup sources and sets the environment variables with credentials, and lfbackup.py only reads environment variables
- the main script lfbackup.py uses only the Swift functions in lflib.py
- the segment size should be 1 GB, the segment container name should be .segments-containername, and the object type is SLO, not DLO
- back up with the full path but replace the prefix: for example, the file /fh/fast/lastname_f/project/file.bam would be copied to container/bucket bam-backup in account Swift__ADM_IT_backup; the target path would be /bam-bucket/lastname_f/project/file.bam. A --prefix=/fh/fast removes the filesystem root path from the destination.
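The prefix-replacement rule above can be sketched in a few lines (the function name is illustrative, not the actual lf-backup code):

```python
def target_object_name(path, prefix):
    """Map a local file path to its object name by stripping the filesystem prefix."""
    prefix = prefix.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError("path %r does not start with prefix %r" % (path, prefix))
    return path[len(prefix):]

# The example from the feature request: --prefix=/fh/fast removes the fs root path.
print(target_object_name("/fh/fast/lastname_f/project/file.bam", "/fh/fast"))
# lastname_f/project/file.bam
```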