cloudant/couchbackup

Name: couchbackup

Owner: Cloudant

Description: Cloudant backup and restore library and command-line utility

Created: 2017-04-11 08:55:36.0

Updated: 2018-05-22 11:17:42.0

Pushed: 2018-05-25 09:05:12.0

Homepage: null

Size: 898

Language: JavaScript

README

CouchBackup

 _____                  _    ______            _
/  __ \                | |   | ___ \          | |
| /  \/ ___  _   _  ___| |__ | |_/ / __ _  ___| | ___   _ _ __
| |    / _ \| | | |/ __| '_ \| ___ \/ _` |/ __| |/ / | | | '_ \
| \__/\ (_) | |_| | (__| | | | |_/ / (_| | (__|   <| |_| | |_) |
 \____/\___/ \__,_|\___|_| |_\____/ \__,_|\___|_|\_\\__,_| .__/
                                                         | |
                                                         |_|

CouchBackup is a command-line utility that allows a Cloudant or CouchDB database to be backed up to a text file. It comes with a companion command-line utility that can restore the backed up data.

N.B.

Installation

To install the latest released version use npm:

npm install -g @cloudant/couchbackup
Requirements
Snapshots

The latest builds of master are published to npm with the snapshot tag. Use the snapshot tag if you want to experiment with an unreleased fix or new function, but please note that snapshot versions are unsupported.

Usage

Either environment variables or command-line options can be used to specify the URL of the CouchDB or Cloudant instance, and the database to work with.

The URL

To define the URL of the CouchDB instance set the COUCH_URL environment variable:

export COUCH_URL=http://localhost:5984

or

export COUCH_URL=https://myusername:mypassword@myhost.cloudant.com

Alternatively we can use the --url command-line parameter.

The Database name

To define the name of the database to backup or restore, set the COUCH_DATABASE environment variable:

export COUCH_DATABASE=animaldb

Alternatively we can use the --db command-line parameter.
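When the library is called from Node.js rather than from the command line, the same two settings can be joined into the single database URL that the programmatic examples further down pass in. The following is a minimal sketch and not part of couchbackup: the helper name `databaseUrl` and its error messages are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of couchbackup): joins the COUCH_URL and
// COUCH_DATABASE settings into one database URL.
function databaseUrl(env) {
  if (!env.COUCH_URL) {
    throw new Error('COUCH_URL is not set');
  }
  if (!env.COUCH_DATABASE) {
    throw new Error('COUCH_DATABASE is not set');
  }
  // Drop trailing slashes so the result never contains "//" before the name.
  const base = env.COUCH_URL.replace(/\/+$/, '');
  // Database names may contain characters such as "/" that must be escaped.
  return base + '/' + encodeURIComponent(env.COUCH_DATABASE);
}

console.log(databaseUrl({
  COUCH_URL: 'http://localhost:5984/',
  COUCH_DATABASE: 'animaldb'
}));
```

In a real script the argument would typically be `process.env`.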

Backup

To back up a database to a text file, use the couchbackup command, directing the output to a text file:

couchbackup > backup.txt

Another way of backing up is to set the COUCH_URL environment variable only and supply the database name on the command-line:

couchbackup --db animaldb > animaldb.txt

Logging & resuming backups

You may also create a log file which records the progress of the backup with the --log parameter e.g.

couchbackup --db animaldb --log animaldb.log > animaldb.txt

This log file can be used to resume backups from where you left off with --resume true:

couchbackup --db animaldb --log animaldb.log --resume true >> animaldb.txt

The --resume true option works for a backup that has finished spooling changes, but has not yet completed downloading all the necessary batches of documents. It does not provide an incremental backup solution.

You may also specify the name of the output file, rather than directing the backup data to stdout:

couchbackup --db animaldb --log animaldb.log --resume true --output animaldb.txt

Restore

Now that we have our backup text file, we can restore it to an existing database using the couchrestore command:

cat animaldb.txt | couchrestore

or specifying the database name on the command-line:

cat animaldb.txt | couchrestore --db animaldb2

Compressed backups

If we want to compress the backup data before storing to disk, we can pipe the contents through gzip:

couchbackup --db animaldb | gzip > animaldb.txt.gz

and restore the file with:

cat animaldb.txt.gz | gunzip | couchrestore --db animaldb2

Encrypted backups

Similarly to compression it is possible to pipe the backup content through an encryption or decryption utility. For example with openssl:

couchbackup --db animaldb | openssl aes-128-cbc -pass pass:12345 > encrypted_animal.db

openssl aes-128-cbc -d -in encrypted_animal.db -pass pass:12345 | couchrestore --db animaldb2

Note that the content is unencrypted while it is being processed by the backup tool before it is piped to the encryption utility.

What's in a backup file?

A backup file is a text file where each line contains a JSON encoded array of up to buffer-size objects e.g.

[{"a":1},{"a":2}...]
[{"a":501},{"a":502}...]
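Because every line is a complete JSON array, a backup can be inspected with a few lines of Node.js. The sketch below is illustrative only: the helper name `countDocuments` is an assumption, and it expects complete lines (the `...` in the sample above stands for elided documents and is not valid JSON).

```javascript
// Hypothetical helper (not part of couchbackup): counts the documents in
// backup text where each non-empty line is a JSON array of documents.
function countDocuments(backupText) {
  let total = 0;
  for (const line of backupText.split('\n')) {
    if (line.trim() === '') {
      continue; // skip blank lines, such as the one after the final newline
    }
    const batch = JSON.parse(line); // throws SyntaxError on a corrupt line
    total += batch.length;
  }
  return total;
}

const sample = '[{"a":1},{"a":2}]\n[{"a":501},{"a":502}]\n';
console.log(countDocuments(sample)); // prints 4
```

For a large backup, reading the file line by line with Node's `readline` module avoids holding the whole file in memory.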
What's in a log file?

A log file contains a line:

What is shallow mode?

When you run couchbackup with --mode shallow a simpler backup is performed, only backing up the winning revisions of the database. No revision tokens are saved and any conflicting revisions are ignored. This is a faster, but less complete backup. Shallow backups cannot be resumed because they do not produce a log file.

Why use CouchBackup?

The easiest way to back up a CouchDB database is to copy the “.couch” file. This is fine on a single-node instance, but when running multi-node Cloudant or using CouchDB 2.0 or greater, the “.couch” file only contains a single shard of data. This utility allows simple backups of CouchDB or Cloudant databases using the HTTP API.

This tool can be used to script the backup of your databases. Move the backup and log files to cheap Object Storage so that you have multiple copies of your precious data.

Options reference
Environment variables
Command-line parameters
Using programmatically

You can use couchbackup programmatically. First install couchbackup into your project with npm install --save @cloudant/couchbackup. Then you can import the library into your code:

const couchbackup = require('@cloudant/couchbackup');

The library exports two main functions:

  1. backup - backup from a database to a writable stream.
  2. restore - restore from a readable stream to a database.

Examples

See the examples folder for example scripts showing how to use the library.

Backup

The backup function takes a source database URL, a stream to write to, backup options and a callback for completion.

backup: function(srcUrl, targetStream, opts, callback) { /* ... */ }

The opts dictionary can contain values which map to a subset of the environment variables defined above. Those related to the source and target locations are not required.
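One way to keep a script and the command line configured from the same place is to build opts from the environment. The sketch below is illustrative only. The `parallelism` and `bufferSize` option names appear in this document, but the `mode` option and the variable names COUCH_PARALLELISM, COUCH_BUFFER_SIZE and COUCH_MODE are assumptions here, since the options reference is not reproduced in this copy; check them before relying on this mapping.

```javascript
// Hypothetical mapping from environment variable to [option name, converter].
// The variable names are assumptions, not confirmed by this document.
const ENV_TO_OPT = {
  COUCH_PARALLELISM: ['parallelism', Number],
  COUCH_BUFFER_SIZE: ['bufferSize', Number],
  COUCH_MODE: ['mode', String]
};

// Builds an opts object containing only the settings present in env.
function optsFromEnv(env) {
  const opts = {};
  for (const [name, [key, convert]] of Object.entries(ENV_TO_OPT)) {
    if (env[name] !== undefined) {
      opts[key] = convert(env[name]); // environment values are always strings
    }
  }
  return opts;
}

console.log(optsFromEnv({ COUCH_PARALLELISM: '2', COUCH_MODE: 'shallow' }));
```

The returned object could then be passed as the opts argument, for example `optsFromEnv(process.env)`.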

The callback has the standard err, data parameters and is called when the backup completes or fails.

The backup function returns an event emitter. You can subscribe to:

Backup data to a stream:

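couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  process.stdout,
  {parallelism: 2},
  function(err, data) {
    // Report on stderr so that status messages do not mix with the backup data on stdout.
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });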

Or to a file:

const fs = require('fs');

couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  fs.createWriteStream(filename), // filename is the path of the backup file to write
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });

Restore

The restore function takes a readable stream containing the data emitted by the backup function. It uploads that to a Cloudant database, which should be a new database.

restore: function(srcStream, targetUrl, opts, callback) { /* ... */ }

The opts dictionary can contain values which map to a subset of the environment variables defined above. Those related to the source and target locations are not required.

The callback has the standard err, data parameters and is called when the restore completes or fails.

The restore function returns an event emitter. You can subscribe to:

The backup file (or srcStream) contains lists of document revisions, one list per line. The list length is dictated by the bufferSize parameter used during the backup.

It's possible that a list is corrupt due to failures in the backup process. A BackupFileJsonError is emitted for each corrupt list found. These errors can be ignored only if the backup that generated the stream completed successfully, because a completed backup ensures that each corrupt list also has a valid counterpart within the stream.
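To check a backup file for corrupt lists before restoring it, each line can be parsed on its own and the failures recorded. This is a minimal sketch under stated assumptions, not the library's implementation: the helper name `splitBatches` and its return shape are hypothetical, and it does not verify that a corrupt list has a valid counterpart.

```javascript
// Hypothetical helper (not part of couchbackup): separates the valid lists
// in backup text from the lines that fail to parse.
function splitBatches(backupText) {
  const batches = [];
  const corruptLines = [];
  backupText.split('\n').forEach((line, index) => {
    if (line.trim() === '') {
      return; // ignore blank lines
    }
    try {
      const batch = JSON.parse(line);
      if (!Array.isArray(batch)) {
        throw new SyntaxError('line is not a JSON array');
      }
      batches.push(batch);
    } catch (err) {
      corruptLines.push(index + 1); // record the 1-based line number
    }
  });
  return { batches, corruptLines };
}

// The second line is a truncated copy of the list that follows it.
const result = splitBatches('[{"a":1}]\n[{"a":2},{"a"\n[{"a":2},{"a":3}]\n');
console.log(result.batches.length, result.corruptLines);
```

Here the truncated second line is reported as corrupt while the complete list on the third line is kept.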

Restore data from a stream:

couchbackup.restore(
  process.stdin,
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });

Or from a file:

const fs = require('fs');

couchbackup.restore(
  fs.createReadStream(filename), // filename is the path of an existing backup file
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });

Error Handling

The couchbackup and couchrestore processes are designed to be relatively robust over an unreliable network. Work is batched and any failed requests are retried indefinitely. However, certain aspects of the execution will not tolerate failure:

API

When using the library programmatically an Error will be passed in one of two ways:

CLI Exit Codes

On fatal errors, couchbackup and couchrestore will exit with non-zero exit codes. This section details them.

common to both couchbackup and couchrestore
couchbackup
couchrestore
Note on attachments

TL;DR: If you back up a database that contains attachments, you will not be able to restore it.

As documented above, couchbackup does not support backing up or restoring databases containing documents with attachments. Attempting to back up a database that includes documents with attachments will appear to succeed. However, the attachment content will not have been downloaded and the backup file will contain attachment metadata. Consequently, any attempt to restore the backup will result in errors because the attachment metadata will reference attachments that are not present in the restored database.

It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the native attachment API.
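Since such a backup appears to succeed, it can help to scan the backup data for attachment metadata before relying on it. The sketch below is illustrative only: the helper name `docsWithAttachments` is hypothetical, and it assumes each list in the backup holds document objects that carry CouchDB's standard `_attachments` field when attachments are present.

```javascript
// Hypothetical helper (not part of couchbackup): returns the _id of every
// document in one backup list that carries attachment metadata.
function docsWithAttachments(batch) {
  return batch
    .filter((doc) =>
      doc !== null &&
      typeof doc === 'object' &&
      doc._attachments !== undefined &&
      Object.keys(doc._attachments).length > 0)
    .map((doc) => doc._id);
}

const batch = [
  { _id: 'aardvark', class: 'mammal' },
  { _id: 'zebra', _attachments: { 'zebra.png': { content_type: 'image/png', stub: true } } }
];
console.log(docsWithAttachments(batch)); // prints [ 'zebra' ]
```

Any non-empty result means the backup in question would hit the restore errors described above.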


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.