From: Michael Vrable
Date: Fri, 24 Aug 2007 17:24:09 +0000 (-0700)
Subject: Documentation improvements.
X-Git-Url: https://git.vrable.net/?a=commitdiff_plain;h=5260c952146895e6fd522777a41c48db84535964;p=cumulus.git

Documentation improvements.

Highlights are a README file with instructions for getting started, and a
description of some implementation details, starting with the purpose and
format of the local database.
---

diff --git a/README b/README
new file mode 100644
index 0000000..eaf33fd
--- /dev/null
+++ b/README
@@ -0,0 +1,136 @@
+                  LBS: An LFS-Inspired Backup Solution
+
+How to Build
+------------
+
+Dependencies:
+  - libuuid
+  - sqlite3
+
+Building should be a simple matter of running "make". This will produce
+an executable called "lbs".
+
+
+Setting up Backups
+------------------
+
+Two directories are needed for backups: one for storing the backup
+snapshots themselves, and one for storing bookkeeping information to go
+with the backups. In this example, the first will be "/lbs", and the
+second "/lbs.db", but any directories will do. Only the first
+directory, /lbs, needs to be stored somewhere safe. The second is only
+used when creating new snapshots, and is not needed when restoring.
+
+ 1. Create the snapshot directory and the local database directory:
+      $ mkdir /lbs /lbs.db
+
+ 2. Initialize the local database using the provided script schema.sql
+    from the source:
+      $ sqlite3 /lbs.db/localdb.sqlite
+      sqlite> .read schema.sql
+      sqlite> .exit
+
+ 3. If encrypting or signing backups with gpg, generate appropriate
+    keypairs. The keys can be kept in a user keyring or in a separate
+    keyring just for backups; this example does the latter.
+      $ mkdir /lbs.db/gpg; chmod 700 /lbs.db/gpg
+      $ gpg --homedir /lbs.db/gpg --gen-key
+        (generate a keypair for encryption; enter a passphrase for
+        the secret key)
+      $ gpg --homedir /lbs.db/gpg --gen-key
+        (generate a second keypair for signing; for automatic
+        signing do not use a passphrase to protect the secret key)
+
+    Be sure to store the secret key needed for decryption somewhere
+    safe, perhaps with the backup itself (the key protected with an
+    appropriate passphrase). The secret signing key need not be stored
+    with the backups (since in the event of data loss, it probably
+    isn't necessary to create future backups that are signed with the
+    same key).
+
+    To achieve better compression, the encryption key can be edited to
+    alter the preferred compression algorithms to list bzip2 before
+    zlib. Run
+      $ gpg --homedir /lbs.db/gpg --edit-key
+      Command> pref
+        (prints a terse listing of preferences associated with the
+        key)
+      Command> setpref
+        (allows preferences to be changed; copy the same preferences
+        list printed out by the previous command, but change the
+        order of the compression algorithms, which start with "Z",
+        to be "Z3 Z2 Z1", which stands for "BZIP2, ZLIB, ZIP")
+      Command> save
+
+    Copy the provided encryption filter program, lbs-filter-gpg,
+    somewhere it may be run from.
+
+ 4. Create a script for launching the LBS backup process. A simple
+    version is:
+
+      #!/bin/sh
+      export LBS_GPG_HOME=/lbs.db/gpg
+      export LBS_GPG_ENC_KEY=
+      export LBS_GPG_SIGN_KEY=
+      lbs --dest=/lbs --localdb=/lbs.db \
+          --filter="lbs-filter-gpg --encrypt" --filter-extension=.gpg \
+          --signature-filter="lbs-filter-gpg --clearsign" \
+          /etc /home /other/paths/to/store
+
+    Make appropriate substitutions for the key IDs and any relevant
+    paths. If desired, insert an option "--scheme=" to specify a name
+    for this backup scheme which will be included in the snapshot file
+    names (for example, use a name based on the hostname or one
+    descriptive of the files backed up).
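+
+    For example, with a hypothetical scheme name and placeholder key
+    IDs filled in (substitute values appropriate to your own setup),
+    the script might look like:
+
+      #!/bin/sh
+      export LBS_GPG_HOME=/lbs.db/gpg
+      export LBS_GPG_ENC_KEY=01234567
+      export LBS_GPG_SIGN_KEY=89ABCDEF
+      lbs --dest=/lbs --localdb=/lbs.db --scheme=myhost \
+          --filter="lbs-filter-gpg --encrypt" --filter-extension=.gpg \
+          --signature-filter="lbs-filter-gpg --clearsign" \
+          /etc /home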
+
+
+Backup Maintenance
+------------------
+
+Segment cleaning must periodically be done to identify backup segments
+that are mostly unused, but are storing a small amount of useful data.
+Data in these segments will be rewritten into new segments in future
+backups to eliminate the dependence on the almost-empty old segments.
+
+Segment cleaning is currently a mostly manual process. An automatic
+tool for performing segment cleaning will be available in the future.
+
+Old backup snapshots can be pruned from the snapshot directory (/lbs) to
+recover space. Deleting an old backup snapshot is a simple matter of
+deleting the appropriate snapshot descriptor file (snapshot-*.lbs) and
+any associated checksums (snapshot-*.sha1sums). Segments used by that
+snapshot, but not by any other snapshots, can be identified by running
+the clean-segments.pl script from the /lbs directory--this will perform
+a scan of the current directory to identify unreferenced segments, and
+will print a list to stdout. Assuming the list looks reasonable, the
+segments can be quickly deleted with
+    $ rm `./clean-segments.pl`
+
+The clean-segments.pl script will also print out a warning message if
+any snapshots appear to depend upon segments which are not present; this
+is a serious error which indicates that some of the data needed to
+recover a snapshot appears to be lost.
+
+
+Restoring a Snapshot
+--------------------
+
+The restore.pl script is a simple (proof-of-concept, really) program for
+restoring the contents of an LBS snapshot. Ideally, it should be stored
+with the backup files so it is available if it is needed.
+
+The restore.pl script does not know how to decompress segments, so this
+step must be performed manually. Create a temporary directory for
+holding all decompressed objects. Copy the snapshot descriptor file
+(*.lbs) for the snapshot to be restored to this temporary directory.
+The snapshot descriptor includes a list of all segments which are needed
+for the snapshot. For each of these segments, decompress the segment
+file (with gpg or the appropriate program, based on whatever filter was
+used), then pipe the resulting data through "tar -xf -" to extract. Do
+this from the temporary directory; the temporary directory should be
+filled with one directory for each segment decompressed.
+
+Run restore.pl giving two arguments: the snapshot descriptor file
+(*.lbs) in the temporary directory, and a directory where the restored
+files should be written.
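+
+A rough sketch of these manual steps, with placeholder names (the actual
+descriptor and segment file names will differ; gpg is shown on the
+assumption that the encryption filter described earlier was used, and
+restore.pl is assumed to have been copied into the temporary directory):
+
+    $ mkdir /tmp/lbs-restore
+    $ cd /tmp/lbs-restore
+    $ cp /lbs/snapshot-NAME.lbs .
+      (then, for each segment listed in the snapshot descriptor:)
+    $ gpg --decrypt /lbs/SEGMENT.tar.gpg | tar -xf -
+      (finally, restore into a chosen output directory:)
+    $ ./restore.pl snapshot-NAME.lbs /path/to/restored/files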
+
+A better recovery tool will be provided in the future.
diff --git a/format.txt b/format.txt
index 54d9f8c..78f32c3 100644
--- a/format.txt
+++ b/format.txt
@@ -248,3 +248,7 @@ like the metadata listing).  The fields which are defined are:
     completely reconstruct a snapshot, etc.
 Root: A single object reference which points to the metadata listing
     for the snapshot.
+    Checksums: A checksum file may be produced (with the same name as
+        the snapshot descriptor file, but with extension .sha1sums
+        instead of .lbs) containing SHA-1 checksums of all segments.
+        This field contains a checksum of that file.
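+        As an illustration (assuming the checksum file uses the output
+        format accepted by "sha1sum -c"; check the actual file before
+        relying on this), segment integrity can then be verified from
+        the snapshot directory with a command along the lines of:
+            $ sha1sum -c snapshot-NAME.sha1sums
+        where snapshot-NAME.sha1sums stands in for the real file name.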
diff --git a/implementation.txt b/implementation.txt
new file mode 100644
index 0000000..1ba78ac
--- /dev/null
+++ b/implementation.txt
@@ -0,0 +1,91 @@
+                  LBS: An LFS-Inspired Backup Solution
+                        Implementation Overview
+
+HIGH-LEVEL OVERVIEW
+===================
+
+There are two different classes of data stored, typically in different
+directories:
+
+The SNAPSHOT directory contains the actual backup contents. It consists
+of segment data (typically in compressed/encrypted form, one segment per
+file) as well as various small per-snapshot files such as the snapshot
+descriptor files (which name each snapshot and tell where to locate the
+data for it) and checksum files (which list checksums of segments for
+quick integrity checking). The snapshot directory may be stored on a
+remote server. It is write-only, in the sense that data does not need
+to be read from the snapshot directory to create a new snapshot, and
+files in it are immutable once created (they may be deleted if they are
+no longer needed, but file contents are never changed).
+
+The LOCAL DATABASE contains indexes used during the backup process.
+Files here keep track of what information is known to be stored in the
+snapshot directory, so that new snapshots can appropriately re-use data.
+The local database, as its name implies, should be stored somewhere
+local, since random access (read and write) will be required during the
+backup process. Unlike the snapshot directory, files here are not
+immutable.
+
+Only the data stored in the snapshot directory is required to restore a
+snapshot. The local database does not need to be backed up (stored at
+multiple separate locations, etc.). The contents of the local database
+can be rebuilt (at least in theory) from data in the snapshot directory
+and the local filesystem; it is expected that tools will eventually be
+provided to do so.
+
+The format of data in the snapshot directory is described in format.txt.
+The format of data in the local database is more fluid and may evolve
+over time. The current structure of the local database is described in
+this document.
+
+
+LOCAL DATABASE FORMAT
+=====================
+
+The local database directory currently contains two files:
+localdb.sqlite and a statcache file. (Actually, two types of files. It
+is possible to create snapshots using different schemes, and have them
+share the same local database directory. In this case, there will still
+be one localdb.sqlite file, but one statcache file for each backup
+scheme.)
+
+Each statcache file is a plain text file, with a format similar to the
+file metadata listing used in the snapshot directory. The purpose of
+the statcache file is to speed the backup process by making it possible
+to determine whether a file has changed since the previous snapshot by
+comparing the results of a stat() system call with the data in the
+statcache file, and, if the file is unchanged, by providing the checksum
+and list of data blocks previously used to store the file. The
+statcache file is rewritten each time a snapshot is taken, and can
+safely be deleted (with the only major side effect being that the first
+backups after doing so will progress much more slowly).
+
+localdb.sqlite is an SQLite database file, which is used for indexing
+objects stored in the snapshot directory and for various other purposes.
+The database schema is contained in the file schema.sql in the LBS
+source. Among the data tracked by localdb.sqlite:
+
+  - A list of segments stored in the snapshot directory. This might not
+    include all segments (segments belonging to old snapshots might be
+    removed), but for correctness all segments listed in the local
+    database must exist in the snapshot directory.
+
+  - A block index which tracks objects in the snapshot directory used to
+    store file data. It is indexed by block checksum, and so can be
+    used while generating a snapshot to determine if a just-read block
+    of data is already stored in the snapshot directory, and if so how
+    to name it.
+
+  - A list of recent snapshots, together with a list of the objects from
+    the block index that they reference.
+
+The localdb SQL database is central to data sharing and segment
+cleaning. When creating a new snapshot, information about the new
+snapshot and the blocks it uses (including any new ones) is written to
+the database. Using the database, separate segment cleaning processes
+can determine how much data in various segments is still live, and
+determine which segments are best candidates for cleaning. Cleaning is
+performed by updating the database to mark objects in the cleaned
+segments as unavailable for use in future snapshots; when the backup
+process next runs, any files that would use these expired blocks instead
+have a copy of the data written to a new segment.
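+
+For reference, an existing local database can be inspected alongside
+schema.sql using the standard sqlite3 command-line shell (the path below
+follows the example in the README; adjust it to the actual location of
+the local database):
+
+    $ sqlite3 /lbs.db/localdb.sqlite
+    sqlite> .tables
+      (lists the tables created from schema.sql)
+    sqlite> .schema
+      (prints the full table definitions)
+    sqlite> .exit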