X-Git-Url: http://git.vrable.net/?p=cumulus.git;a=blobdiff_plain;f=README;h=2dec5389873c5565b88f01a90010efc3167e5c8d;hp=eaf33fd1f37da141fa03476e681de253e4ae8424;hb=HEAD;hpb=5260c952146895e6fd522777a41c48db84535964

diff --git a/README b/README
index eaf33fd..2dec538 100644
--- a/README
+++ b/README
@@ -1,14 +1,21 @@
-           LBS: An LFS-Inspired Backup Solution
+       Cumulus: Efficient Filesystem Backup to the Cloud
 
 How to Build
 ------------
 
 Dependencies:
-  - libuuid
+  - libuuid (sometimes part of e2fsprogs)
   - sqlite3
+  - Python (2.7 or later, or 3.2 or later)
+  - Python six, a Python 2/3 compatibility library
+    https://pypi.python.org/pypi/six
+  - boto, the python interface to Amazon's Web Services (for S3 storage)
+    http://code.google.com/p/boto
+  - paramiko, SSH2 protocol for python (for sftp storage)
+    http://www.lag.net/paramiko/
 
 Building should be a simple matter of running "make". This will produce
-an executable called "lbs".
+an executable called "cumulus".
 
 
 Setting up Backups
@@ -16,28 +23,28 @@ Setting up Backups
 
 Two directories are needed for backups: one for storing the backup
 snapshots themselves, and one for storing bookkeeping information to go
-with the backups. In this example, the first will be "/lbs", and the
-second "/lbs.db", but any directories will do. Only the first
-directory, /lbs, needs to be stored somewhere safe. The second is only
-used when creating new snapshots, and is not needed when restoring.
+with the backups. In this example, the first will be "/cumulus", and
+the second "/cumulus.db", but any directories will do. Only the first
+directory, /cumulus, needs to be stored somewhere safe. The second is
+only used when creating new snapshots, and is not needed when restoring.
 
   1. Create the snapshot directory and the local database directory:
-        $ mkdir /lbs /lbs.db
+        $ mkdir /cumulus /cumulus.db
 
   2. Initialize the local database using the provided script schema.sql
      from the source:
-        $ sqlite3 /lbs.db/localdb.sqlite
+        $ sqlite3 /cumulus.db/localdb.sqlite
         sqlite> .read schema.sql
         sqlite> .exit
 
  3. If encrypting or signing backups with gpg, generate appropriate
      keypairs. The keys can be kept in a user keyring or in a separate
      keyring just for backups; this example does the latter.
-        $ mkdir /lbs.db/gpg; chmod 700 /lbs.db/gpg
-        $ gpg --homedir /lbs.db/gpg --gen-key
+        $ mkdir /cumulus.db/gpg; chmod 700 /cumulus.db/gpg
+        $ gpg --homedir /cumulus.db/gpg --gen-key
             (generate a keypair for encryption; enter a passphrase for
             the secret key)
-        $ gpg --homedir /lbs.db/gpg --gen-key
+        $ gpg --homedir /cumulus.db/gpg --gen-key
             (generate a second keypair for signing; for automatic
             signing do not use a passphrase to protect the secret key)
      Be sure to store the secret key needed for decryption somewhere
@@ -47,10 +54,10 @@ used when creating new snapshots, and is not needed when restoring.
      isn't necessary to create future backups that are signed with the
      same key).
 
-     To achieve better compression, the ecnryption key can be edited to
+     To achieve better compression, the encryption key can be edited to
      alter the preferred compression algorithms to list bzip2 before
      zlib. Run
-        $ gpg --homedir /lbs.db/gpg --edit-key
+        $ gpg --homedir /cumulus.db/gpg --edit-key
         Command> pref
             (prints a terse listing of preferences associated with the
             key)
@@ -61,26 +68,27 @@ used when creating new snapshots, and is not needed when restoring.
             to be "Z3 Z2 Z1" which stands for "BZIP2, ZLIB, ZIP")
         Command> save
 
-     Copy the provided encryption filter program, lbs-filter-gpg,
+     Copy the provided encryption filter program, cumulus-filter-gpg,
      somewhere it may be run from.
 
-  4. Create a script for launching the LBS backup process. A simple
+  4. Create a script for launching the Cumulus backup process. A simple
      version is:
 
         #!/bin/sh
-        export LBS_GPG_HOME=/lbs.db/gpg
+        export LBS_GPG_HOME=/cumulus.db/gpg
         export LBS_GPG_ENC_KEY=
         export LBS_GPG_SIGN_KEY=
-        lbs --dest=/lbs --localdb=/lbs.db
-            --filter="lbs-filter-gpg --encrypt" --filter-extension=.gpg \
-            --signature-filter="lbs-filter-gpg --clearsign" \
+        cumulus --dest=/cumulus --localdb=/cumulus.db --scheme=test \
+            --filter="cumulus-filter-gpg --encrypt" --filter-extension=.gpg \
+            --signature-filter="cumulus-filter-gpg --clearsign" \
             /etc /home /other/paths/to/store
 
      Make appropriate substitutions for the key IDs and any relevant
-     paths. If desired, insert an option "--scheme=" to specify a
-     name for this backup scheme which will be included in the snapshot
-     file names (for example, use a name based on the hostname or
-     descriptive of the files backed up).
+     paths. Here "--scheme=test" gives a descriptive name ("test") to
+     this collection of snapshots. It is possible to store multiple sets
+     of backups in the same directory, using different scheme names to
+     distinguish them. The --scheme option can also be left out
+     entirely.
 
 
 Backup Maintenance
@@ -91,19 +99,25 @@ that are mostly unused, but are storing a small amount of useful data.
 Data in these segments will be rewritten into new segments in future
 backups to eliminate the dependence on the almost-empty old segments.
 
-Segment cleaning is currently a mostly manual process. An automatic
-tool for performing segment cleaning will be available in the future.
-
-Old backup snapshots can be pruned from the snapshot directory (/lbs) to
-recover space. Deleting an old backup snapshot is a simlpe matter of
-deleting the appropriate snapshot descriptor file (snapshot-*.lbs) and
-any associated checksums (snapshot-*.sha1sums). Segments used by that
-snapshot, but not any other snapshots, can be identified by running the
-clean-segments.pl script from the /lbs directory--this will perform a
-scan of the current directory to identify unreferenced segments, and
-will print a list to stdout. Assuming the list looks reasonable, the
-segments can be quickly deleted with
+The provided cumulus-util tool can perform the necessary cleaning. Run
+it with
+    $ cumulus-util --localdb=/cumulus.db clean
+Cleaning is still under development, and so may be improved in the
+future, but this version is intended to be functional.
+
+Old backup snapshots can be pruned from the snapshot directory
+(/cumulus) to recover space. A snapshot which is still referenced by
+the local database should not be deleted, however. Deleting an old
+backup snapshot is a simple matter of deleting the appropriate snapshot
+descriptor file (snapshot-*.lbs) and any associated checksums
+(snapshot-*.sha1sums). Segments used by that snapshot, but not any
+other snapshots, can be identified by running the clean-segments.pl
+script from the /cumulus directory--this will perform a scan of the
+current directory to identify unreferenced segments, and will print a
+list to stdout. Assuming the list looks reasonable, the segments can be
+quickly deleted with
     $ rm `./clean-segments.pl`
+A tool to make this easier will be implemented later.
 
 The clean-segments.pl script will also print out a warning message if
 any snapshots appear to depend upon segments which are not present; this
@@ -111,12 +125,61 @@ is a serious error which indicates that some of the data needed to
 recover a snapshot appears to be lost.
 
 
-Restoring a Snapshot
---------------------
-
-The restore.pl script is a simple (proof-of-concept, really) program for
-restoring the contents of an LBS snapshot. Ideally, it should be stored
-with the backup files so it is available if it is needed.
+Listing and Restoring Snapshots
+-------------------------------
+
+A listing of all currently-stored snapshots (and their sizes) can be
+produced with
+    $ cumulus-util --store=/cumulus list-snapshot-sizes
+
+If data from a snapshot needs to be restored, this can be done with
+    $ cumulus-util --store=/cumulus restore-snapshot \
+        test-20080101T121500 /dest/dir
+Here, "test-20080101T121500" is the name of the snapshot (consisting of
+the scheme name and a timestamp; this can be found from the output of
+list-snapshot-sizes) and "/dest/dir" is the path under which files
+should be restored (this directory should initially be empty).
+"" is a list of files or directories to restore. If none are
+specified, the entire snapshot is restored.
+
+
+Remote Backups
+--------------
+
+The cumulus-util command can operate directly on remote backups. The
+--store parameter accepts, in addition to a raw disk path, a URL.
+Supported URL forms are
+    file:///path        Equivalent to /path
+    s3://bucket/path    Storage in Amazon S3
+        (Expects the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
+        environment variables to be set appropriately)
+    sftp://server/path  Storage on sftp server
+        (note that no password authentication or password protected
+        authorization keys are not supported atm and config options
+        like port or individual authorization keys are to be
+        configured in ~/.ssh/config and the public key of the
+        server has to be in ~/.ssh/known_hosts)
+
+To copy backup snapshots from one storage area to another, the
+cumulus-sync command can be used, as in
+    $ cumulus-sync file:///cumulus s3://my-bucket/cumulus
+
+Support for directly writing backups to a remote location (without using
+a local staging directory and cumulus-sync) is slightly more
+experimental, but can be achieved by replacing
+    --dest=/cumulus
+with
+    --upload-script="cumulus-store s3://my-bucket/cumulus"
+
+
+Alternate Restore Tool
+----------------------
+
+The contrib/restore.pl script is a simple program for restoring the
+contents of a Cumulus snapshot. It is not as full-featured as the
+restore functionality in cumulus-util, but it is far more compact. It
+could be stored with the backup files so a tool for restores is
+available even if all other data is lost.
 
 The restore.pl script does not know how to decompress segments, so this
 step must be performed manually. Create a temporary directory for
@@ -132,5 +195,3 @@ filled with one directory for each segment decompressed.
 Run restore.pl giving two arguments: the snapshot descriptor file
 (*.lbs) in the temporary directory, and a directory where the restored
 files should be written.
-
-A better recovery tool will be provided in the future.
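
Note on the --store URL forms added by this change: the "Remote Backups"
text lists file://, s3:// and sftp:// URLs and says a raw disk path is
also accepted. The sketch below shows one way such a value could be
split into its parts. It is an illustration only, not code from
Cumulus; the function name classify_store and the tuple it returns are
hypothetical, and the real cumulus-util may parse the value differently.

```python
from urllib.parse import urlsplit

# The three schemes come from the README text in the diff above.
# Everything else in this block is a hypothetical illustration.
SUPPORTED_SCHEMES = ("file", "s3", "sftp")


def classify_store(location):
    """Split a --store style value into (scheme, host_or_bucket, path).

    A raw disk path (no URL scheme) is reported as a "file" store,
    matching the README's statement that file:///path is equivalent
    to /path.
    """
    parts = urlsplit(location)
    if parts.scheme == "":
        # No scheme: treat the whole string as a local path.
        return ("file", "", location)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported store URL: %r" % (location,))
    # For s3:// the netloc is the bucket; for sftp:// it is the server.
    return (parts.scheme, parts.netloc, parts.path)


if __name__ == "__main__":
    for value in ("/cumulus", "file:///cumulus",
                  "s3://my-bucket/cumulus", "sftp://server/path"):
        print(value, "->", classify_store(value))
```

Under these assumptions, "/cumulus" and "file:///cumulus" resolve to the
same local path, while the s3 and sftp forms carry the bucket or server
name in the second field.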