
            Cumulus: Efficient Filesystem Backup to the Cloud

How to Build
------------

Dependencies:
  - libuuid (sometimes part of e2fsprogs)
  - sqlite3

Building should be a simple matter of running "make".  This will produce
an executable called "cumulus".


Setting up Backups
------------------

Two directories are needed for backups: one for storing the backup
snapshots themselves, and one for storing bookkeeping information to go
with the backups.  In this example, the first will be "/cumulus", and
the second "/cumulus.db", but any directories will do.  Only the first
directory, /cumulus, needs to be stored somewhere safe.  The second is
only used when creating new snapshots, and is not needed when restoring.

  1. Create the snapshot directory and the local database directory:
        $ mkdir /cumulus /cumulus.db

  2. Initialize the local database using the provided script schema.sql
     from the source:
        $ sqlite3 /cumulus.db/localdb.sqlite
        sqlite> .read schema.sql
        sqlite> .exit

  3. If encrypting or signing backups with gpg, generate appropriate
     keypairs.  The keys can be kept in a user keyring or in a separate
     keyring just for backups; this example does the latter.
        $ mkdir /cumulus.db/gpg; chmod 700 /cumulus.db/gpg
        $ gpg --homedir /cumulus.db/gpg --gen-key
            (generate a keypair for encryption; enter a passphrase for
            the secret key)
        $ gpg --homedir /cumulus.db/gpg --gen-key
            (generate a second keypair for signing; for automatic
            signing do not use a passphrase to protect the secret key)
     Be sure to store the secret key needed for decryption somewhere
     safe, perhaps with the backup itself (protected by an appropriate
     passphrase).  The secret signing key need not be stored with the
     backups (since in the event of data loss, it probably isn't
     necessary to create future backups that are signed with the same
     key).

     To achieve better compression, the encryption key can be edited to
     alter the preferred compression algorithms to list bzip2 before
     zlib.  Run
        $ gpg --homedir /cumulus.db/gpg --edit-key <encryption key>
        Command> pref
            (prints a terse listing of preferences associated with the
            key)
        Command> setpref
            (allows preferences to be changed; copy the same preferences
            list printed out by the previous command, but change the
            order of the compression algorithms, which start with "Z",
            to be "Z3 Z2 Z1" which stands for "BZIP2, ZLIB, ZIP")
        Command> save

     Copy the provided encryption filter program, cumulus-filter-gpg,
     somewhere it can be run from, such as a directory in $PATH, since
     the backup script in the next step invokes it by name.

  4. Create a script for launching the LBS backup process.  A simple
     version is:

        #!/bin/sh
        export LBS_GPG_HOME=/cumulus.db/gpg
        export LBS_GPG_ENC_KEY=<encryption key>
        export LBS_GPG_SIGN_KEY=<signing key>
        cumulus --dest=/cumulus --localdb=/cumulus.db \
            --filter="cumulus-filter-gpg --encrypt" --filter-extension=.gpg \
            --signature-filter="cumulus-filter-gpg --clearsign" \
            /etc /home /other/paths/to/store

     Make appropriate substitutions for the key IDs and any relevant
     paths.  If desired, insert an option "--scheme=<name>" to specify a
     name for this backup scheme which will be included in the snapshot
     file names (for example, a name based on the hostname or one
     describing the files backed up).

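Steps 1 and 2, plus creation of the private keyring directory from step 3, can be scripted.  The following is a minimal POSIX shell sketch that assumes the paths used in this example.  The function names make_backup_dirs and init_localdb are hypothetical helpers and are not part of Cumulus.  The sketch feeds schema.sql to sqlite3 on standard input, which has the same effect as the interactive ".read" shown in step 2.

```shell
# Hypothetical helpers sketching setup steps 1-3; not part of Cumulus.
# Load them into a shell with the "." command.  POSIX sh has no local
# variables, so these functions use plain globals.

# make_backup_dirs SNAPSHOT_DIR DB_DIR
# Create the snapshot directory, the local database directory, and a
# private (mode 700) gpg home directory inside the database directory.
# Unlike plain mkdir, mkdir -p does not fail if a directory exists.
make_backup_dirs() {
    snapdir=$1
    dbdir=$2
    mkdir -p "$snapdir" "$dbdir/gpg" || return 1
    chmod 700 "$dbdir/gpg"
}

# init_localdb DB_DIR SCHEMA_FILE
# Create DB_DIR/localdb.sqlite from the schema, refusing to touch an
# existing database so bookkeeping data is never clobbered.
init_localdb() {
    db=$1/localdb.sqlite
    schema=$2
    if [ -e "$db" ]; then
        echo "init_localdb: $db already exists; not reinitializing" >&2
        return 1
    fi
    if [ ! -r "$schema" ]; then
        echo "init_localdb: cannot read schema file $schema" >&2
        return 1
    fi
    # Same effect as ".read schema.sql" inside the sqlite3 shell.
    sqlite3 "$db" < "$schema"
}
```

For example, from the source directory containing schema.sql:
    $ make_backup_dirs /cumulus /cumulus.db
    $ init_localdb /cumulus.db schema.sql
Key generation and the preference edits in step 3 remain interactive.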

Backup Maintenance
------------------

Segment cleaning must periodically be done to identify backup segments
that are mostly unused, but are storing a small amount of useful data.
Data in these segments will be rewritten into new segments in future
backups to eliminate the dependence on the almost-empty old segments.

The provided cumulus-util tool can perform the necessary cleaning.  Run
it with
    $ cumulus-util --localdb=/cumulus.db clean
Cleaning is still under development, and so may be improved in the
future, but this version is intended to be functional.

Old backup snapshots can be pruned from the snapshot directory
(/cumulus) to recover space.  A snapshot which is still referenced by
the local database should not be deleted, however.  Deleting an old
backup snapshot is a simple matter of deleting the appropriate snapshot
descriptor file (snapshot-*.lbs) and any associated checksums
(snapshot-*.sha1sums).  Segments used by that snapshot, but not by any
other snapshot, can be identified by running the clean-segments.pl
script from the /cumulus directory.  It scans the current directory to
identify unreferenced segments and prints a list of them to stdout.
Assuming the list looks reasonable, the segments can be quickly deleted
with
    $ rm `./clean-segments.pl`
A tool to make this easier will be implemented later.

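The pruning procedure can be wrapped in two small shell functions.  This is a sketch only: remove_snapshot and delete_listed_segments are hypothetical names, and the sketch assumes two things.  First, that descriptors and checksums follow the snapshot-NAME.lbs and snapshot-NAME.sha1sums naming above.  Second, that clean-segments.pl prints one segment file name per line.  Saving the list to a file first gives a chance to review it before anything is removed, which the one-line rm does not.

```shell
# Hypothetical helpers for pruning old snapshots; not part of Cumulus.
# Load them into a shell with the "." command.

# remove_snapshot SNAPSHOT_DIR NAME
# Delete snapshot-NAME.lbs and, if present, snapshot-NAME.sha1sums.
# This does not consult the local database; check first that the
# snapshot is no longer referenced there.
remove_snapshot() {
    dir=$1
    name=$2
    if [ ! -f "$dir/snapshot-$name.lbs" ]; then
        echo "remove_snapshot: no snapshot-$name.lbs in $dir" >&2
        return 1
    fi
    rm -f -- "$dir/snapshot-$name.lbs" "$dir/snapshot-$name.sha1sums"
}

# delete_listed_segments LIST_FILE
# Remove each file named in LIST_FILE (one name per line) from the
# current directory.  Names containing "/" are skipped as a safety
# check, and a final line without a trailing newline is still handled.
delete_listed_segments() {
    while IFS= read -r seg || [ -n "$seg" ]; do
        case $seg in
            "") continue ;;
            */*) echo "skipping suspicious name: $seg" >&2; continue ;;
        esac
        rm -f -- "$seg"
    done < "$1"
}
```

A cautious run might look like this, where <name> is the part of the descriptor file name matched by the "*" above:
    $ cd /cumulus
    $ remove_snapshot . <name>
    $ ./clean-segments.pl > /tmp/unreferenced-segments
        (review /tmp/unreferenced-segments)
    $ delete_listed_segments /tmp/unreferenced-segments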
The clean-segments.pl script will also print out a warning message if
any snapshots appear to depend upon segments which are not present; this
is a serious error which indicates that some of the data needed to
recover a snapshot appears to be lost.


Restoring a Snapshot
--------------------

The contrib/restore.pl script is a simple (proof-of-concept, really)
program for restoring the contents of an LBS snapshot.  Ideally, it
should be stored with the backup files so it is available if it is
needed.

The restore.pl script does not know how to decompress segments, so this
step must be performed manually.  Create a temporary directory for
holding all decompressed objects.  Copy the snapshot descriptor file
(*.lbs) for the snapshot to be restored to this temporary directory.
The snapshot descriptor includes a list of all segments which are needed
for the snapshot.  For each of these segments, decompress the segment
file (with gpg or the appropriate program based on whatever filter was
used), then pipe the resulting data through "tar -xf -" to extract it.
Do this from the temporary directory; the temporary directory should
then contain one directory for each segment decompressed.

Run restore.pl giving two arguments: the snapshot descriptor file
(*.lbs) in the temporary directory, and a directory where the restored
files should be written.

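The manual extraction step can be scripted.  The function below is a minimal sketch under these assumptions:
  - extract_segments is a hypothetical helper, not part of Cumulus.
  - The caller passes the segment files on the command line, after reading them off the descriptor's segment list by hand.
  - The decoding command is given as a string, such as "gpg --decrypt" for segments written with the gpg filter from the setup example, or whichever program reverses the filter that was actually used.

```shell
# Hypothetical helper for the manual restore steps; not part of Cumulus.
# Load it into a shell with the "." command.

# extract_segments DEST_DIR DECODE_CMD SEGMENT_FILE...
# For each segment file, run DECODE_CMD (e.g. "gpg --decrypt") with the
# segment on standard input, then unpack the resulting tar stream inside
# DEST_DIR, which ends up with one directory per segment.
extract_segments() {
    dest=$1
    decode=$2
    shift 2
    mkdir -p "$dest" || return 1
    tmp=$(mktemp) || return 1
    for seg in "$@"; do
        # Decode to a temporary file first so a failed decryption is
        # reported instead of silently handing tar an empty stream.
        if ! sh -c "$decode" < "$seg" > "$tmp"; then
            echo "extract_segments: failed to decode $seg" >&2
            rm -f "$tmp"
            return 1
        fi
        # Equivalent to running "tar -xf -" from inside DEST_DIR.
        (cd "$dest" && tar -xf -) < "$tmp" || { rm -f "$tmp"; return 1; }
    done
    rm -f "$tmp"
}
```

For example, after copying the descriptor into /tmp/restore-work and reading its segment list, using the keyring that holds the decryption key:
    $ extract_segments /tmp/restore-work \
          "gpg --homedir /cumulus.db/gpg --decrypt" /cumulus/<segment file> ...
    $ perl contrib/restore.pl /tmp/restore-work/snapshot-<name>.lbs /restore/target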
The cumulus-util program also now has some support for restoring
snapshots (documentation coming soon).