Replace boost::scoped_ptr with std::unique_ptr.

[cumulus.git] / README
diff --git a/README b/README

index eaf33fd..2dec538 100644 (file)
--- a/README
+++ b/README
@@ -1,14 +1,21 @@
-                  LBS: An LFS-Inspired Backup Solution
+           Cumulus: Efficient Filesystem Backup to the Cloud
  
  How to Build
  ------------
  
  Dependencies:
-  - libuuid
+  - libuuid (sometimes part of e2fsprogs)
    - sqlite3
+  - Python (2.7 or later, or 3.2 or later)
+  - Python six, a Python 2/3 compatibility library
+    https://pypi.python.org/pypi/six
+  - boto, the python interface to Amazon's Web Services (for S3 storage)
+    http://code.google.com/p/boto
+  - paramiko, SSH2 protocol for python (for sftp storage)
+    http://www.lag.net/paramiko/
  
  Building should be a simple matter of running "make".  This will produce
-an executable called "lbs".
+an executable called "cumulus".
  
  
  Setting up Backups
@@ -16,28 +23,28 @@ Setting up Backups
  
  Two directories are needed for backups: one for storing the backup
  snapshots themselves, and one for storing bookkeeping information to go
-with the backups.  In this example, the first will be "/lbs", and the
-second "/lbs.db", but any directories will do.  Only the first
-directory, /lbs, needs to be stored somewhere safe.  The second is only
-used when creating new snapshots, and is not needed when restoring.
+with the backups.  In this example, the first will be "/cumulus", and
+the second "/cumulus.db", but any directories will do.  Only the first
+directory, /cumulus, needs to be stored somewhere safe.  The second is
+only used when creating new snapshots, and is not needed when restoring.
  
    1. Create the snapshot directory and the local database directory:
-        $ mkdir /lbs /lbs.db
+        $ mkdir /cumulus /cumulus.db
  
    2. Initialize the local database using the provided script schema.sql
       from the source:
-        $ sqlite3 /lbs.db/localdb.sqlite
+        $ sqlite3 /cumulus.db/localdb.sqlite
          sqlite> .read schema.sql
          sqlite> .exit
  
    3. If encrypting or signing backups with gpg, generate appropriate
       keypairs.  The keys can be kept in a user keyring or in a separate
       keyring just for backups; this example does the latter.
-        $ mkdir /lbs.db/gpg; chmod 700 /lbs.db/gpg
-        $ gpg --homedir /lbs.db/gpg --gen-key
+        $ mkdir /cumulus.db/gpg; chmod 700 /cumulus.db/gpg
+        $ gpg --homedir /cumulus.db/gpg --gen-key
              (generate a keypair for encryption; enter a passphrase for
              the secret key)
-        $ gpg --homedir /lbs.db/gpg --gen-key
+        $ gpg --homedir /cumulus.db/gpg --gen-key
              (generate a second keypair for signing; for automatic
              signing do not use a passphrase to protect the secret key)
       Be sure to store the secret key needed for decryption somewhere
@@ -47,10 +54,10 @@ used when creating new snapshots, and is not needed when restoring.
       isn't necessary to create future backups that are signed with the
       same key).
  
-     To achieve better compression, the ecnryption key can be edited to
+     To achieve better compression, the encryption key can be edited to
       alter the preferred compression algorithms to list bzip2 before
       zlib.  Run
-        $ gpg --homedir /lbs.db/gpg --edit-key <encryption key>
+        $ gpg --homedir /cumulus.db/gpg --edit-key <encryption key>
          Command> pref
              (prints a terse listing of preferences associated with the
              key)
@@ -61,26 +68,27 @@ used when creating new snapshots, and is not needed when restoring.
              to be "Z3 Z2 Z1" which stands for "BZIP2, ZLIB, ZIP")
          Command> save
  
-    Copy the provided encryption filter program, lbs-filter-gpg,
+    Copy the provided encryption filter program, cumulus-filter-gpg,
      somewhere it may be run from.
  
-  4. Create a script for launching the LBS backup process.  A simple
+  4. Create a script for launching the Cumulus backup process.  A simple
       version is:
  
          #!/bin/sh
-        export LBS_GPG_HOME=/lbs.db/gpg
+        export LBS_GPG_HOME=/cumulus.db/gpg
          export LBS_GPG_ENC_KEY=<encryption key>
          export LBS_GPG_SIGN_KEY=<signing key>
-        lbs --dest=/lbs --localdb=/lbs.db
-            --filter="lbs-filter-gpg --encrypt" --filter-extension=.gpg \
-            --signature-filter="lbs-filter-gpg --clearsign" \
+        cumulus --dest=/cumulus --localdb=/cumulus.db --scheme=test \
+            --filter="cumulus-filter-gpg --encrypt" --filter-extension=.gpg \
+            --signature-filter="cumulus-filter-gpg --clearsign" \
              /etc /home /other/paths/to/store
  
      Make appropriate substitutions for the key IDs and any relevant
-    paths.  If desired, insert an option "--scheme=<name>" to specify a
-    name for this backup scheme which will be included in the snapshot
-    file names (for example, use a name based on the hostname or
-    descriptive of the files backed up).
+    paths.  Here "--scheme=test" gives a descriptive name ("test") to
+    this collection of snapshots.  It is possible to store multiple sets
+    of backups in the same directory, using different scheme names to
+    distinguish them.  The --scheme option can also be left out
+    entirely.
  
  
  Backup Maintenance
@@ -91,19 +99,25 @@ that are mostly unused, but are storing a small amount of useful data.
  Data in these segments will be rewritten into new segments in future
  backups to eliminate the dependence on the almost-empty old segments.
  
-Segment cleaning is currently a mostly manual process.  An automatic
-tool for performing segment cleaning will be available in the future.
-
-Old backup snapshots can be pruned from the snapshot directory (/lbs) to
-recover space.  Deleting an old backup snapshot is a simlpe matter of
-deleting the appropriate snapshot descriptor file (snapshot-*.lbs) and
-any associated checksums (snapshot-*.sha1sums).  Segments used by that
-snapshot, but not any other snapshots, can be identified by running the
-clean-segments.pl script from the /lbs directory--this will perform a
-scan of the current directory to identify unreferenced segments, and
-will print a list to stdout.  Assuming the list looks reasonable, the
-segments can be quickly deleted with
+The provided cumulus-util tool can perform the necessary cleaning.  Run
+it with
+    $ cumulus-util --localdb=/cumulus.db clean
+Cleaning is still under development, and so may be improved in the
+future, but this version is intended to be functional.
+
+Old backup snapshots can be pruned from the snapshot directory
+(/cumulus) to recover space.  A snapshot which is still referenced by
+the local database should not be deleted, however.  Deleting an old
+backup snapshot is a simple matter of deleting the appropriate snapshot
+descriptor file (snapshot-*.lbs) and any associated checksums
+(snapshot-*.sha1sums).  Segments used by that snapshot, but not any
+other snapshots, can be identified by running the clean-segments.pl
+script from the /cumulus directory--this will perform a scan of the
+current directory to identify unreferenced segments, and will print a
+list to stdout.  Assuming the list looks reasonable, the segments can be
+quickly deleted with
      $ rm `./clean-segments.pl`
+A tool to make this easier will be implemented later.
  
  The clean-segments.pl script will also print out a warning message if
  any snapshots appear to depend upon segments which are not present; this
@@ -111,12 +125,61 @@ is a serious error which indicates that some of the data needed to
  recover a snapshot appears to be lost.
  
  
-Restoring a Snapshot
---------------------
-
-The restore.pl script is a simple (proof-of-concept, really) program for
-restoring the contents of an LBS snapshot.  Ideally, it should be stored
-with the backup files so it is available if it is needed.
+Listing and Restoring Snapshots
+-------------------------------
+
+A listing of all currently-stored snapshots (and their sizes) can be
+produced with
+    $ cumulus-util --store=/cumulus list-snapshot-sizes
+
+If data from a snapshot needs to be restored, this can be done with
+    $ cumulus-util --store=/cumulus restore-snapshot \
+        test-20080101T121500 /dest/dir <files...>
+Here, "test-20080101T121500" is the name of the snapshot (consisting of
+the scheme name and a timestamp; this can be found from the output of
+list-snapshot-sizes) and "/dest/dir" is the path under which files
+should be restored (this directory should initially be empty).
+"<files...>" is a list of files or directories to restore.  If none are
+specified, the entire snapshot is restored.
+
+
+Remote Backups
+--------------
+
+The cumulus-util command can operate directly on remote backups.  The
+--store parameter accepts, in addition to a raw disk path, a URL.
+Supported URL forms are
+    file:///path        Equivalent to /path
+    s3://bucket/path    Storage in Amazon S3
+        (Expects the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
+        environment variables to be set appropriately)
+    sftp://server/path  Storage on sftp server
+        (note that no password authentication or password protected
+        authorization keys are not supported atm and config options
+        like port or individual authorization keys are to be
+        configured in ~/.ssh/config and the public key of the
+        server has to be in ~/.ssh/known_hosts)
+
+To copy backup snapshots from one storage area to another, the
+cumulus-sync command can be used, as in
+    $ cumulus-sync file:///cumulus s3://my-bucket/cumulus
+
+Support for directly writing backups to a remote location (without using
+a local staging directory and cumulus-sync) is slightly more
+experimental, but can be achieved by replacing
+    --dest=/cumulus
+with
+    --upload-script="cumulus-store s3://my-bucket/cumulus"
+
+
+Alternate Restore Tool
+----------------------
+
+The contrib/restore.pl script is a simple program for restoring the
+contents of a Cumulus snapshot.  It is not as full-featured as the
+restore functionality in cumulus-util, but it is far more compact.  It
+could be stored with the backup files so a tool for restores is
+available even if all other data is lost.
  
  The restore.pl script does not know how to decompress segments, so this
  step must be performed manually.  Create a temporary directory for
@@ -132,5 +195,3 @@ filled with one directory for each segment decompressed.
  Run restore.pl giving two arguments: the snapshot descriptor file
  (*.lbs) in the temporary directory, and a directory where the restored
  files should be written.
-
-A better recovery tool will be provided in the future.