From: Michael Vrable Date: Mon, 9 Jun 2008 18:01:25 +0000 (-0700) Subject: Updates to documentation and contributed scripts for name change. X-Git-Url: https://git.vrable.net/?a=commitdiff_plain;h=9d3cca72ea3c0f912c7250d84e12357346e59fe2;p=cumulus.git Updates to documentation and contributed scripts for name change. --- diff --git a/NEWS b/NEWS index aee1b11..b23e27a 100644 --- a/NEWS +++ b/NEWS @@ -3,6 +3,8 @@ requires an extension to the local database. The upgrade script contrib/upgrade0.7-localdb.sql should be run prior to running backups with this version. + - Name change: the system is now known as Cumulus (replacing the old + name of "LBS"). Some traces of the old name still remain. - Initial support for direct backups to remote storage. A sample script is provided for backing up to Amazon S3. Other scripts should be simple to write. @@ -14,6 +16,12 @@ entire snapshot. Additionally, restore files in an order that should optimize performance (restore files based on how they are grouped into segments, instead of lexicographic order). + Currently, the implementation of these changes requires that all + metadata be loaded into memory when the restore tool runs, so it + is more memory-intensive than the old version. This may be fixed + in a future version; in the meantime, if the current restore tool + requires too much memory, try the old restore tool or the + restore.pl script. 0.6 [2008-02-19] - SNAPSHOT FORMAT CHANGE: A few minor tweaks have been made to the diff --git a/README b/README index 20472d4..8a1be3c 100644 --- a/README +++ b/README @@ -1,4 +1,4 @@ - LBS: An LFS-Inspired Backup Solution + Cumulus: Efficient Filesystem Backup to the Cloud How to Build ------------ @@ -8,7 +8,7 @@ Dependencies: - sqlite3 Building should be a simple matter of running "make". This will produce -an executable called "lbs". +an executable called "cumulus". 
Setting up Backups @@ -16,28 +16,28 @@ Setting up Backups Two directories are needed for backups: one for storing the backup snapshots themselves, and one for storing bookkeeping information to go -with the backups. In this example, the first will be "/lbs", and the -second "/lbs.db", but any directories will do. Only the first -directory, /lbs, needs to be stored somewhere safe. The second is only -used when creating new snapshots, and is not needed when restoring. +with the backups. In this example, the first will be "/cumulus", and +the second "/cumulus.db", but any directories will do. Only the first +directory, /cumulus, needs to be stored somewhere safe. The second is +only used when creating new snapshots, and is not needed when restoring. 1. Create the snapshot directory and the local database directory: - $ mkdir /lbs /lbs.db + $ mkdir /cumulus /cumulus.db 2. Initialize the local database using the provided script schema.sql from the source: - $ sqlite3 /lbs.db/localdb.sqlite + $ sqlite3 /cumulus.db/localdb.sqlite sqlite> .read schema.sql sqlite> .exit 3. If encrypting or signing backups with gpg, generate appropriate keypairs. The keys can be kept in a user keyring or in a separate keyring just for backups; this example does the latter. - $ mkdir /lbs.db/gpg; chmod 700 /lbs.db/gpg - $ gpg --homedir /lbs.db/gpg --gen-key + $ mkdir /cumulus.db/gpg; chmod 700 /cumulus.db/gpg + $ gpg --homedir /cumulus.db/gpg --gen-key (generate a keypair for encryption; enter a passphrase for the secret key) - $ gpg --homedir /lbs.db/gpg --gen-key + $ gpg --homedir /cumulus.db/gpg --gen-key (generate a second keypair for signing; for automatic signing do not use a passphrase to protect the secret key) Be sure to store the secret key needed for decryption somewhere @@ -50,7 +50,7 @@ used when creating new snapshots, and is not needed when restoring. 
To achieve better compression, the encryption key can be edited to alter the preferred compression algorithms to list bzip2 before zlib. Run - $ gpg --homedir /lbs.db/gpg --edit-key + $ gpg --homedir /cumulus.db/gpg --edit-key Command> pref (prints a terse listing of preferences associated with the key) @@ -61,19 +61,19 @@ used when creating new snapshots, and is not needed when restoring. to be "Z3 Z2 Z1" which stands for "BZIP2, ZLIB, ZIP") Command> save - Copy the provided encryption filter program, lbs-filter-gpg, + Copy the provided encryption filter program, cumulus-filter-gpg, somewhere it may be run from. 4. Create a script for launching the LBS backup process. A simple version is: #!/bin/sh - export LBS_GPG_HOME=/lbs.db/gpg + export LBS_GPG_HOME=/cumulus.db/gpg export LBS_GPG_ENC_KEY= export LBS_GPG_SIGN_KEY= - lbs --dest=/lbs --localdb=/lbs.db - --filter="lbs-filter-gpg --encrypt" --filter-extension=.gpg \ - --signature-filter="lbs-filter-gpg --clearsign" \ + cumulus --dest=/cumulus --localdb=/cumulus.db + --filter="cumulus-filter-gpg --encrypt" --filter-extension=.gpg \ + --signature-filter="cumulus-filter-gpg --clearsign" \ /etc /home /other/paths/to/store Make appropriate substitutions for the key IDs and any relevant @@ -91,23 +91,23 @@ that are mostly unused, but are storing a small amount of useful data. Data in these segments will be rewritten into new segments in future backups to eliminate the dependence on the almost-empty old segments. -The provided lbs-util tool can perform the necessary cleaning. Run it -with - $ lbs-util --localdb=/lbs.db clean +The provided cumulus-util tool can perform the necessary cleaning. Run +it with + $ cumulus-util --localdb=/cumulus.db clean Cleaning is still under development, and so may be improved in the future, but this version is intended to be functional. -Old backup snapshots can be pruned from the snapshot directory (/lbs) to -recover space. 
A snapshot which is still referenced by the local -database should not be deleted, however. Deleting an old backup -snapshot is a simple matter of deleting the appropriate snapshot +Old backup snapshots can be pruned from the snapshot directory +(/cumulus) to recover space. A snapshot which is still referenced by +the local database should not be deleted, however. Deleting an old +backup snapshot is a simple matter of deleting the appropriate snapshot descriptor file (snapshot-*.lbs) and any associated checksums (snapshot-*.sha1sums). Segments used by that snapshot, but not any other snapshots, can be identified by running the clean-segments.pl -script from the /lbs directory--this will perform a scan of the current -directory to identify unreferenced segments, and will print a list to -stdout. Assuming the list looks reasonable, the segments can be quickly -deleted with +script from the /cumulus directory--this will perform a scan of the +current directory to identify unreferenced segments, and will print a +list to stdout. Assuming the list looks reasonable, the segments can be +quickly deleted with $ rm `./clean-segments.pl` A tool to make this easier will be implemented later. @@ -140,5 +140,5 @@ Run restore.pl giving two arguments: the snapshot descriptor file (*.lbs) in the temporary directory, and a directory where the restored files should be written. -The lbs-util program also now has some preliminary support for restoring +The cumulus-util program also now has some support for restoring snapshots (documentation coming soon). diff --git a/TODO b/TODO index 8bd8e1d..f2261b9 100644 --- a/TODO +++ b/TODO @@ -4,8 +4,8 @@ statcache file. * Implement a scheme for cleaning metadata log segments when metadata is re-used. -* Allow direct backup to a remote network server, probably implemented -using an external helper script which is called to transfer a file. +* Improve the interface for remote upload scripts. 
The current +interface is preliminary and is subject to change. * Continue to investigate schemes for adding parity blocks for error recovery. diff --git a/contrib/cumulus-filter-gpg b/contrib/cumulus-filter-gpg new file mode 100755 index 0000000..010c05f --- /dev/null +++ b/contrib/cumulus-filter-gpg @@ -0,0 +1,59 @@ +#!/bin/bash +# +# Filter for encrypting/decrypting/signing LBS archives using gpg. +# +# This takes input on stdin and produces output to stdout.  It can operate in +# one of several modes, depending upon the command-line argument supplied: +#   --encrypt       Encrypt the data stream +#   --decrypt       Decrypt the supplied data +#   --clearsign     Enclose a text file with a signature +# Options are controlled by various environment variables: +#   LBS_GPG_HOME        set the gpg home directory (containing keyrings) +#   LBS_GPG_ENC_KEY     key ID to use for encryption +#   LBS_GPG_SIGN_KEY    key ID to use for signing +#   LBS_GPG_PASSPHRASE  passphrase to supply to gpg, if needed + +declare -a gpg_options +gpg_options=(--quiet --batch) + +if [ -n "$LBS_GPG_HOME" ]; then +    gpg_options=("${gpg_options[@]}" --homedir "$LBS_GPG_HOME") +fi + +# Run gpg with the options in $gpg_options and any arguments supplied to this +# function.  If LBS_GPG_PASSPHRASE is set, it will arrange redirections so that +# the passphrase is supplied to gpg on a file descriptor. 
+run_gpg () { + if [ -n "$LBS_GPG_PASSPHRASE" ]; then + exec 4<&0 + echo "$LBS_GPG_PASSPHRASE" | + gpg "${gpg_options[@]}" --passphrase-fd=3 "$@" 3<&0 <&4 + else + gpg "${gpg_options[@]}" "$@" + fi +} + +case "$1" in + --encrypt) + if [ -n "$LBS_GPG_ENC_KEY" ]; then + gpg_options=("${gpg_options[@]}" --recipient "$LBS_GPG_ENC_KEY") + fi + run_gpg --encrypt + ;; + + --decrypt) + run_gpg + ;; + + --clearsign) + if [ -n "$LBS_GPG_SIGN_KEY" ]; then + gpg_options=("${gpg_options[@]}" --local-user "$LBS_GPG_SIGN_KEY") + fi + run_gpg --clearsign + ;; + + *) + echo "$0: Unknown command or command not specified: $1" 1>&2 + exit 1 + ;; +esac diff --git a/contrib/cumulus-store-s3 b/contrib/cumulus-store-s3 new file mode 100755 index 0000000..340253a --- /dev/null +++ b/contrib/cumulus-store-s3 @@ -0,0 +1,31 @@ +#!/usr/bin/python +# +# Storage hook for writing LBS backups directly to Amazon's Simple Storage +# Service (S3). +# +# Command-line arguments: +# +# Most options are controlled by environment variables: +# AWS_ACCESS_KEY_ID Amazon Web Services credentials +# AWS_SECRET_ACCESS_KEY " " +# LBS_S3_BUCKET S3 bucket in which data should be stored +# LBS_S3_PREFIX Path prefix to add to pathnames (include trailing +# slash) +# +# This script depends upon the boto Python library for interacting with Amazon +# S3. + +import os, sys +import boto +from boto.s3.bucket import Bucket +from boto.s3.key import Key + +prefix = os.environ.get('LBS_S3_PREFIX', "") +bucket_name = os.environ['LBS_S3_BUCKET'] +(local_path, file_type, remote_path) = sys.argv[1:4] + +conn = boto.connect_s3() +bucket = Bucket(conn, bucket_name) +k = Key(bucket) +k.key = prefix + file_type + "/" + remote_path +k.set_contents_from_filename(local_path) diff --git a/contrib/lbs-filter-gpg b/contrib/lbs-filter-gpg deleted file mode 100755 index 010c05f..0000000 --- a/contrib/lbs-filter-gpg +++ /dev/null @@ -1,59 +0,0 @@ -#!/bin/bash -# -# Filter for encrypting/decrypting/signing LBS archives using gpg. 
-# -# This takes input on stdin and produces output to stdout. It can operate in -# one of several modes, depending upon the command-line argument supplied: -# --encrypt Encrypt the data stream -# --decrypt Decrypt the supplied data -# --clearsign Enclose a text file with a signature -# Options are controlled by various environment variables: -# LBS_GPG_HOME set the gpg home directory (containing keyrings) -# LBS_GPG_ENC_KEY key ID to use encryption -# LBS_GPG_SIGN_KEY key ID to use for signing -# LBS_GPG_PASSPHRASE passphrase to supply to gpg, if needed - -declare -a gpg_options -gpg_options=(--quiet --batch) - -if [ -n "$LBS_GPG_HOME" ]; then - gpg_options=("${gpg_options[@]}" --homedir "$LBS_GPG_HOME") -fi - -# Run gpg with the options in $gpg_options and any arguments supplied to this -# function. If LBS_GPG_PASSPHRASE is set, it will arrange redirections so that -# the passphrase is supplied to gpg on a file descriptor. -run_gpg () { - if [ -n "$LBS_GPG_PASSPHRASE" ]; then - exec 4<&0 - echo "$LBS_GPG_PASSPHRASE" | - gpg "${gpg_options[@]}" --passphrase-fd=3 "$@" 3<&0 <&4 - else - gpg "${gpg_options[@]}" "$@" - fi -} - -case "$1" in - --encrypt) - if [ -n "$LBS_GPG_ENC_KEY" ]; then - gpg_options=("${gpg_options[@]}" --recipient "$LBS_GPG_ENC_KEY") - fi - run_gpg --encrypt - ;; - - --decrypt) - run_gpg - ;; - - --clearsign) - if [ -n "$LBS_GPG_SIGN_KEY" ]; then - gpg_options=("${gpg_options[@]}" --local-user "$LBS_GPG_SIGN_KEY") - fi - run_gpg --clearsign - ;; - - *) - echo "$0: Unknown command or command not specified: $1" 1>&2 - exit 1 - ;; -esac diff --git a/contrib/lbs-store-s3 b/contrib/lbs-store-s3 deleted file mode 100755 index 340253a..0000000 --- a/contrib/lbs-store-s3 +++ /dev/null @@ -1,31 +0,0 @@ -#!/usr/bin/python -# -# Storage hook for writing LBS backups directly to Amazon's Simple Storage -# Service (S3). 
-# -# Command-line arguments: -# -# Most options are controlled by environment variables: -# AWS_ACCESS_KEY_ID Amazon Web Services credentials -# AWS_SECRET_ACCESS_KEY " " -# LBS_S3_BUCKET S3 bucket in which data should be stored -# LBS_S3_PREFIX Path prefix to add to pathnames (include trailing -# slash) -# -# This script depends upon the boto Python library for interacting with Amazon -# S3. - -import os, sys -import boto -from boto.s3.bucket import Bucket -from boto.s3.key import Key - -prefix = os.environ.get('LBS_S3_PREFIX', "") -bucket_name = os.environ['LBS_S3_BUCKET'] -(local_path, file_type, remote_path) = sys.argv[1:4] - -conn = boto.connect_s3() -bucket = Bucket(conn, bucket_name) -k = Key(bucket) -k.key = prefix + file_type + "/" + remote_path -k.set_contents_from_filename(local_path) diff --git a/doc/design.txt b/doc/design.txt index e515359..87aea26 100644 --- a/doc/design.txt +++ b/doc/design.txt @@ -1,4 +1,4 @@ -This document aims to describe the goals and constraints of the LBS +This document aims to describe the goals and constraints of the Cumulus design. ======================================================================== diff --git a/doc/format.txt b/doc/format.txt index 66c6814..1511115 100644 --- a/doc/format.txt +++ b/doc/format.txt @@ -1,14 +1,22 @@ Backup Format Description - for an LFS-Inspired Backup Solution + for Cumulus: Efficient Filesystem Backup to the Cloud Version: "LBS Snapshot v0.6" -NOTE: This format specification is not yet complete. Right now the code -provides the best documentation of the format. +NOTE: This format specification is intended to be mostly stable, but is +still subject to change before the 1.0 release. The code may provide +additional useful documentation on the format. + +NOTE2: The name of this project has changed from LBS to Cumulus. +However, to avoid introducing gratuitous changes into the format, in +most cases any references to "LBS" in the format description have been +left as-is. 
The name may be changed in the future if the format is +updated. This document simply describes the snapshot format. It is described from the point of view of a decompressor which wishes to restore the files from a snapshot. It does not specify the exact behavior required -of the backup program writing the snapshot. +of the backup program writing the snapshot. For details of the current +backup program, see implementation.txt. This document does not explain the rationale behind the format; for that, see design.txt. @@ -17,7 +25,7 @@ that, see design.txt. DATA CHECKSUMS ============== -In several places in the LBS format, a cryptographic checksum may be +In several places in the Cumulus format, a cryptographic checksum may be used to allow data integrity to be verified. At the moment, only the SHA-1 checksum is supported, but it is expected that other algorithms will be supported in the future. @@ -41,7 +49,7 @@ A sample checksum string is SEGMENTS & OBJECTS: STORAGE AND NAMING ====================================== -An LBS snapshot consists, at its base, of a collection of /objects/: +A Cumulus snapshot consists, at its base, of a collection of /objects/: binary blobs of data, much like a file. Higher layers interpret the contents of objects in various ways, but the lowest layer is simply concerned with storing and naming these objects. @@ -50,9 +58,9 @@ An object is a sequence of bytes (octets) of arbitrary length. An object may contain as few as zero bytes (though such objects are not very useful). Object sizes are potentially unbounded, but it is recommended that the maximum size of objects produced be on the order of -megabytes. Files of essentially unlimited size can be stored in an LBS -snapshot using objects of modest size, so this should not cause any real -restrictions. +megabytes. Files of essentially unlimited size can be stored in a +Cumulus snapshot using objects of modest size, so this should not cause +any real restrictions. 
For storage purposes, objects are grouped together into /segments/. Segments use the TAR format; each object within a segment is stored as a @@ -265,7 +273,7 @@ is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). The contents of the descriptor are a set of RFC 822-style headers (much like the metadata listing). The fields which are defined are: Format: The string "LBS Snapshot v0.6" which identifies this file as - an LBS backup descriptor. The version number (v0.6) might + a Cumulus backup descriptor. The version number (v0.6) might change if there are changes to the format. It is expected that at some point, once the format is stabilized, the version identifier will be changed to v1.0. diff --git a/doc/implementation.txt b/doc/implementation.txt index 1ba78ac..64c5b6a 100644 --- a/doc/implementation.txt +++ b/doc/implementation.txt @@ -1,4 +1,4 @@ - LBS: An LFS-Inspired Backup Solution + Cumulus: Efficient Filesystem Backup to the Cloud Implementation Overview HIGH-LEVEL OVERVIEW @@ -62,7 +62,7 @@ after doing so will progress much more slowly). localdb.sqlite is an SQLite database file, which is used for indexing objects stored in the snapshot directory and various other purposes. -The database schema is contained in the file schema.sql in the LBS +The database schema is contained in the file schema.sql in the Cumulus source. Among the data tracked by localdb.sqlite: - A list of segments stored in the snapshot directory. This might not
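The run_gpg helper in contrib/cumulus-filter-gpg above relies on a small file-descriptor shuffle so that the passphrase and the data stream can both reach gpg at once. A minimal sketch of that pattern, with `tr` as a hypothetical stand-in for gpg (the function name here is illustrative, not part of the scripts):

```shell
#!/bin/bash
# Sketch of the redirection pattern used by run_gpg: the real data stream
# (original stdin) is saved as fd 4, the passphrase is piped in on stdin
# and duplicated to fd 3, and stdin is then restored from fd 4 before the
# filter runs.  'tr' stands in for gpg, which would read the passphrase
# from fd 3 via --passphrase-fd=3.
demo_filter () {
    exec 4<&0                      # save the data stream as fd 4
    echo "secret-passphrase" |
        tr 'a-z' 'A-Z' 3<&0 <&4   # fd 3 <- passphrase pipe; stdin <- fd 4
}

echo "payload" | demo_filter       # the filter sees "payload", not the passphrase
# prints PAYLOAD
```

The point of the dance is that a pipe can only carry one stream; saving the original stdin first lets the passphrase ride in on a second, short-lived pipe without disturbing the data being filtered.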
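The contrib/cumulus-store-s3 hook above receives three positional arguments (local path, file type, remote path), reads its settings from the environment, and stores the file under the key prefix + file_type + "/" + remote_path. A rough sketch of that calling convention, printing the destination instead of uploading (the bucket name, prefix, and segment filename are made up for illustration):

```shell
#!/bin/bash
# Hypothetical stand-in for cumulus-store-s3: demonstrates the storage-hook
# calling convention and how the destination key is derived.  A real hook
# would upload the file named by $1 rather than printing the destination.
store_hook () {
    local local_path=$1 file_type=$2 remote_path=$3
    echo "s3://${LBS_S3_BUCKET}/${LBS_S3_PREFIX}${file_type}/${remote_path}"
}

# The backup program invokes:  <hook> <local_path> <file_type> <remote_path>
LBS_S3_BUCKET=example-bucket LBS_S3_PREFIX=host1/ \
    store_hook /cumulus/seg0001.tar.gpg segments seg0001.tar.gpg
# prints s3://example-bucket/host1/segments/seg0001.tar.gpg
```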