--- /dev/null
+ LBS: An LFS-Inspired Backup Solution
+
+How to Build
+------------
+
+Dependencies:
+ - libuuid
+ - sqlite3
+
+Building should be a simple matter of running "make". This will produce
+an executable called "lbs".
+
+
+Setting up Backups
+------------------
+
+Two directories are needed for backups: one for storing the backup
+snapshots themselves, and one for storing bookkeeping information to go
+with the backups. In this example, the first will be "/lbs", and the
+second "/lbs.db", but any directories will do. Only the first
+directory, /lbs, needs to be stored somewhere safe. The second is only
+used when creating new snapshots, and is not needed when restoring.
+
+ 1. Create the snapshot directory and the local database directory:
+ $ mkdir /lbs /lbs.db
+
+ 2. Initialize the local database using the provided script schema.sql
+ from the source:
+ $ sqlite3 /lbs.db/localdb.sqlite
+ sqlite> .read schema.sql
+ sqlite> .exit
+
+ 3. If encrypting or signing backups with gpg, generate appropriate
+ keypairs. The keys can be kept in a user keyring or in a separate
+ keyring just for backups; this example does the latter.
+ $ mkdir /lbs.db/gpg; chmod 700 /lbs.db/gpg
+ $ gpg --homedir /lbs.db/gpg --gen-key
+ (generate a keypair for encryption; enter a passphrase for
+ the secret key)
+ $ gpg --homedir /lbs.db/gpg --gen-key
+ (generate a second keypair for signing; for automatic
+ signing do not use a passphrase to protect the secret key)
+ Be sure to store the secret key needed for decryption somewhere
+ safe, perhaps with the backup itself (the key protected with an
+ appropriate passphrase). The secret signing key need not be stored
+ with the backups (since in the event of data loss, it probably
+ isn't necessary to create future backups that are signed with the
+ same key).
+
+ To achieve better compression, the encryption key can be edited to
+ alter the preferred compression algorithms to list bzip2 before
+ zlib. Run
+ $ gpg --homedir /lbs.db/gpg --edit-key <encryption key>
+ Command> pref
+ (prints a terse listing of preferences associated with the
+ key)
+ Command> setpref
+ (allows preferences to be changed; copy the same preferences
+ list printed out by the previous command, but change the
+ order of the compression algorithms, which start with "Z",
+ to be "Z3 Z2 Z1" which stands for "BZIP2, ZLIB, ZIP")
+ Command> save
+
+ Copy the provided encryption filter program, lbs-filter-gpg,
+ somewhere it may be run from.
+
+ 4. Create a script for launching the LBS backup process. A simple
+ version is:
+
+ #!/bin/sh
+ export LBS_GPG_HOME=/lbs.db/gpg
+ export LBS_GPG_ENC_KEY=<encryption key>
+ export LBS_GPG_SIGN_KEY=<signing key>
+ lbs --dest=/lbs --localdb=/lbs.db \
+ --filter="lbs-filter-gpg --encrypt" --filter-extension=.gpg \
+ --signature-filter="lbs-filter-gpg --clearsign" \
+ /etc /home /other/paths/to/store
+
+ Make appropriate substitutions for the key IDs and any relevant
+ paths. If desired, insert an option "--scheme=<name>" to specify a
+ name for this backup scheme which will be included in the snapshot
+ file names (for example, use a name based on the hostname or
+ descriptive of the files backed up).
+
+
+Backup Maintenance
+------------------
+
+Segment cleaning must periodically be done to identify backup segments
+that are mostly unused, but are storing a small amount of useful data.
+Data in these segments will be rewritten into new segments in future
+backups to eliminate the dependence on the almost-empty old segments.
+
+Segment cleaning is currently a mostly manual process. An automatic
+tool for performing segment cleaning will be available in the future.
+
+Old backup snapshots can be pruned from the snapshot directory (/lbs) to
+ recover space. Deleting an old backup snapshot is a simple matter of
+deleting the appropriate snapshot descriptor file (snapshot-*.lbs) and
+any associated checksums (snapshot-*.sha1sums). Segments used by that
+snapshot, but not any other snapshots, can be identified by running the
+ clean-segments.pl script from the /lbs directory; this will scan the
+ current directory for unreferenced segments and print a list to
+ stdout. Assuming the list looks reasonable, the
+segments can be quickly deleted with
+ $ rm `./clean-segments.pl`
+
+The clean-segments.pl script will also print out a warning message if
+any snapshots appear to depend upon segments which are not present; this
+is a serious error which indicates that some of the data needed to
+recover a snapshot appears to be lost.
+
+
+Restoring a Snapshot
+--------------------
+
+The restore.pl script is a simple (proof-of-concept, really) program for
+restoring the contents of an LBS snapshot. Ideally, it should be stored
+with the backup files so it is available if it is needed.
+
+The restore.pl script does not know how to decompress segments, so this
+step must be performed manually. Create a temporary directory for
+holding all decompressed objects. Copy the snapshot descriptor file
+(*.lbs) for the snapshot to be restored to this temporary directory.
+The snapshot descriptor includes a list of all segments which are needed
+ for the snapshot. For each of these segments, decompress the segment
+file (with gpg or the appropriate program based on whatever filter was
+used), then pipe the resulting data through "tar -xf -" to extract. Do
+this from the temporary directory; the temporary directory should be
+filled with one directory for each segment decompressed.
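+
+ The steps above can be sketched as a small shell function. The
+ "Segments:" section layout assumed here is an illustrative guess (see
+ format.txt for the real descriptor format), and the plain "cat" stands
+ in for whatever command undoes the storage filter:
+
```shell
# extract_segments <descriptor>: unpack each segment listed in a
# snapshot descriptor into the current directory.
extract_segments() {
    descriptor=$1
    segdir=$(dirname "$descriptor")
    # Assumed descriptor layout: a "Segments:" header followed by one
    # segment name per line, ended by a blank line (check format.txt).
    awk '/^Segments:/ {insec=1; next} /^$/ {insec=0} insec {print}' \
        "$descriptor" |
    while read -r seg; do
        # For gpg-encrypted segments, replace "cat" with something like:
        #   gpg --homedir /lbs.db/gpg --decrypt "$segdir/$seg.tar.gpg"
        cat "$segdir/$seg.tar" | tar -xf -
    done
}
```
+
+ Run it from the temporary directory so each segment unpacks into its
+ own subdirectory there.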
+
+Run restore.pl giving two arguments: the snapshot descriptor file
+(*.lbs) in the temporary directory, and a directory where the restored
+files should be written.
+
+A better recovery tool will be provided in the future.
--- /dev/null
+ LBS: An LFS-Inspired Backup Solution
+ Implementation Overview
+
+HIGH-LEVEL OVERVIEW
+===================
+
+There are two different classes of data stored, typically in different
+directories:
+
+The SNAPSHOT directory contains the actual backup contents. It consists
+of segment data (typically in compressed/encrypted form, one segment per
+file) as well as various small per-snapshot files such as the snapshot
+descriptor files (which names each snapshot and tells where to locate
+the data for it) and checksum files (which list checksums of segments
+for quick integrity checking). The snapshot directory may be stored on
+a remote server. It is write-only, in the sense that data does not need
+to be read from the snapshot directory to create a new snapshot, and
+files in it are immutable once created (they may be deleted if they are
+no longer needed, but file contents are never changed).
+
+The LOCAL DATABASE contains indexes used during the backup process.
+Files here keep track of what information is known to be stored in the
+ snapshot directory, so that new snapshots can appropriately re-use data.
+The local database, as its name implies, should be stored somewhere
+local, since random access (read and write) will be required during the
+backup process. Unlike the snapshot directory, files here are not
+immutable.
+
+Only the data stored in the snapshot directory is required to restore a
+snapshot. The local database does not need to be backed up (stored at
+multiple separate locations, etc.). The contents of the local database
+can be rebuilt (at least in theory) from data in the snapshot directory
+and the local filesystem; it is expected that tools will eventually be
+provided to do so.
+
+The format of data in the snapshot directory is described in format.txt.
+The format of data in the local database is more fluid and may evolve
+over time. The current structure of the local database is described in
+this document.
+
+
+LOCAL DATABASE FORMAT
+=====================
+
+The local database directory currently contains two files:
+localdb.sqlite and a statcache file. (Actually, two types of files. It
+is possible to create snapshots using different schemes, and have them
+share the same local database directory. In this case, there will still
+be one localdb.sqlite file, but one statcache file for each backup
+scheme.)
+
+Each statcache file is a plain text file, with a format similar to the
+file metadata listing used in the snapshot directory. The purpose of
+the statcache file is to speed the backup process by making it possible
+to determine if a file has changed since the previous snapshot by
+comparing the results of a stat() system call with the data in the
+statcache file, and if the file is unchanged, providing the checksum and
+list of data blocks used to previously store the file. The statcache
+file is rewritten each time a snapshot is taken, and can safely be
+deleted (with the only major side effect being that the first backups
+after doing so will progress much more slowly).
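+
+ The heart of that check can be sketched in shell. This is a simplified
+ illustration only: real statcache entries record more fields than just
+ mtime and size, and "stat -c" is GNU-specific:
+
```shell
# unchanged_since <file> <cached_mtime> <cached_size>: succeed if the
# file's current mtime and size still match the cached values -- the
# same kind of comparison that lets the backup process skip re-reading
# an unchanged file.  (Uses GNU stat; a real statcache entry carries
# additional metadata beyond these two fields.)
unchanged_since() {
    [ "$(stat -c '%Y %s' "$1")" = "$2 $3" ]
}
```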
+
+localdb.sqlite is an SQLite database file, which is used for indexing
+objects stored in the snapshot directory and various other purposes.
+The database schema is contained in the file schema.sql in the LBS
+source. Among the data tracked by localdb.sqlite:
+
+ - A list of segments stored in the snapshot directory. This might not
+ include all segments (segments belonging to old snapshots might be
+ removed), but for correctness all segments listed in the local
+ database must exist in the snapshot directory.
+
+ - A block index which tracks objects in the snapshot directory used to
+ store file data. It is indexed by block checksum, and so can be
+ used while generating a snapshot to determine if a just-read block
+ of data is already stored in the snapshot directory, and if so how
+ to name it.
+
+ - A list of recent snapshots, together with a list of the objects from
+ the block index they reference.
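+
+ As a sketch of how a cleaning process might consult this index, the
+ query below ranks segments by how much live data each still holds.
+ The table and column names (segments, block_index, segmentid, size,
+ expired) are illustrative guesses; consult schema.sql in the LBS
+ source for the actual schema:
+
```shell
# live_bytes_per_segment <database>: list each segment with the total
# size of its live (unexpired) blocks, emptiest first -- the emptiest
# segments are the best candidates for cleaning.  Table and column
# names are assumptions; check schema.sql for the real ones.
live_bytes_per_segment() {
    sqlite3 "$1" <<'EOF'
SELECT s.segment, SUM(b.size) AS live_bytes
  FROM segments s JOIN block_index b ON b.segmentid = s.segmentid
 WHERE b.expired IS NULL
 GROUP BY s.segment
 ORDER BY live_bytes ASC;
EOF
}
```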
+
+The localdb SQL database is central to data sharing and segment
+cleaning. When creating a new snapshot, information about the new
+ snapshot and the blocks it uses (including any new ones) is written to
+the database. Using the database, separate segment cleaning processes
+can determine how much data in various segments is still live, and
+determine which segments are best candidates for cleaning. Cleaning is
+performed by updating the database to mark objects in the cleaned
+segments as unavailable for use in future snapshots; when the backup
+process next runs, any files that would use these expired blocks instead
+have a copy of the data written to a new segment.