   1                        Backup Format Description
   2          for Cumulus: Efficient Filesystem Backup to the Cloud
   3                    Version: "Cumulus Snapshot v0.11"
   4
   5 NOTE: This format specification is intended to be mostly stable, but is
   6 still subject to change before the 1.0 release.  The code may provide
   7 additional useful documentation on the format.
   8
   9 NOTE2: The name of this project has changed from LBS to Cumulus.  In
  10 some areas the name "LBS" is still used.
  11
  12 This document simply describes the snapshot format.  It is described
  13 from the point of view of a decompressor which wishes to restore the
  14 files from a snapshot.  It does not specify the exact behavior required
  15 of the backup program writing the snapshot.  For details of the current
  16 backup program, see implementation.txt.
  17
  18 This document does not explain the rationale behind the format; for
  19 that, see design.txt.
  20
  21
  22 BACKUP REPOSITORY LAYOUT
  23 ========================
  24
  25 Cumulus backups are stored using a relatively simple layout.  Data files
  26 described below are written into one of several directories on the
  27 backup server, depending on their purpose:
  28     snapshots/
  29         Snapshot descriptor files, which quickly summarize each backup
  30         snapshot stored.
  31     segments0/
  32     segments1/
  33         Storage of the bulk of the backup data, in compressed/encrypted
  34         form.  Technically any segment could be stored in either
  35         directory (both directories will be searched when looking for a
  36         segment).  However, data in segments0 might be faster to access
  37         (but more expensive) depending on the storage backend.  The
  38         intent is that segments0 can store filesystem tree metadata and
  39         segments1 can store file contents.
  40     meta/
  41         Snapshot-specific metadata that is not core to the backup.  This
  42         can include checksums of segments, some data for rebuilding
  43         local database contents, etc.
  44
  45
  46 DATA CHECKSUMS
  47 ==============
  48
  49 In several places in the Cumulus format, a cryptographic checksum may be
  50 used to allow data integrity to be verified.  At the moment, the SHA-1,
  51 SHA-224, and SHA-256 algorithms are supported; other algorithms may be
  52 added in the future.
  53
  54 When a checksum is called for, the checksum is always stored in a text
  55 format.  The general format used is
  56     <algorithm>=<hexdigits>
  57
  58 <algorithm> identifies the checksum algorithm used, and allows new
  59 algorithms to be added later.  Permissible values are:
  60     "sha1": SHA-1
  61     "sha224": SHA-224 (added in version 0.11)
  62     "sha256": SHA-256 (added in version 0.11)
  63
  64 <hexdigits> is a sequence of hexadecimal digits which encode the
  65 checksum value: precisely 40 digits for sha1, 56 for sha224, and 64
  66 for sha256.
  67
  68 A sample checksum string is
  69     sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187
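
As an illustration of the checksum string format, the following is a
minimal Python sketch, not taken from the Cumulus code.  The function
names parse_checksum and verify_checksum are hypothetical, and the
sketch assumes only the three algorithm names listed above.

```python
import hashlib
import string

# Algorithm names from this document, with the expected digest length
# in hexadecimal digits.  Anything else is rejected by this sketch.
SUPPORTED = {"sha1": 40, "sha224": 56, "sha256": 64}

def parse_checksum(text):
    """Split '<algorithm>=<hexdigits>' and validate both parts."""
    algorithm, sep, hexdigits = text.partition("=")
    if not sep or algorithm not in SUPPORTED:
        raise ValueError("unsupported checksum: %r" % text)
    if len(hexdigits) != SUPPORTED[algorithm]:
        raise ValueError("wrong digest length: %r" % text)
    if not set(hexdigits) <= set(string.hexdigits):
        raise ValueError("non-hexadecimal digest: %r" % text)
    return algorithm, hexdigits.lower()

def verify_checksum(text, data):
    """Return True if data (bytes) matches the checksum string."""
    algorithm, hexdigits = parse_checksum(text)
    return hashlib.new(algorithm, data).hexdigest() == hexdigits
```

A restore tool could call verify_checksum on each object or file it
reads and report a mismatch as corruption.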
  70
  71
  72 SEGMENTS & OBJECTS: STORAGE AND NAMING
  73 ======================================
  74
  75 A Cumulus snapshot consists, at its base, of a collection of /objects/:
  76 binary blobs of data, much like a file.  Higher layers interpret the
  77 contents of objects in various ways, but the lowest layer is simply
  78 concerned with storing and naming these objects.
  79
  80 An object is a sequence of bytes (octets) of arbitrary length.  An
  81 object may contain as few as zero bytes (though such objects are not
  82 very useful).  Object sizes are potentially unbounded, but it is
  83 recommended that the maximum size of objects produced be on the order of
  84 megabytes.  Files of essentially unlimited size can be stored in a
  85 Cumulus snapshot using objects of modest size, so this should not cause
  86 any real restrictions.
  87
  88 For storage purposes, objects are grouped together into /segments/.
  89 Segments use the TAR format; each object within a segment is stored as a
  90 separate file.  Segments are named using UUIDs (Universally Unique
  91 Identifiers), which are 128-bit numbers.  The textual form of a UUID is
  92 a sequence of lowercase hexadecimal digits with hyphens inserted at
  93 fixed points; an example UUID is
  94     a704eeae-97f2-4f30-91a4-d4473956366b
  95 This segment could be stored in the filesystem as a file
  96     a704eeae-97f2-4f30-91a4-d4473956366b.tar
  97 The UUID used to name a segment is assigned when the segment is created.
  98 These files are stored in either the segments0 or segments1 directories
  99 on the backup server.
 100
 101 Filters can be layered on top of the segment storage to provide
 102 compression, encryption, or other features.  For example, the segment
 103 above might be stored as
 104     a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2
 105 or
 106     a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg
 107 if the file data had been filtered through bzip2 or gpg, respectively,
 108 before storage.  Filtering of segment data is outside the scope of this
 109 format specification, however; it is assumed that if filtering is used,
 110 the unfiltered data can be recovered when restoring (yielding data in
 111 the TAR format).
 112
 113 Objects within a segment are numbered sequentially.  This sequence
 114 number is then formatted as an 8-digit (zero-padded) hexadecimal
 115 (lowercase) value.  The fully qualified name of an object consists of
 116 the segment name, followed by a slash ("/"), followed by the object
 117 sequence number.  So, for example
 118     a704eeae-97f2-4f30-91a4-d4473956366b/000001ad
 119 names an object.
 120
 121 Within the segment TAR file, the filename used for each object is its
 122 fully-qualified name.  Thus, when extracted using the standard tar
 123 utility, a segment will produce a directory with the same name as the
 124 segment itself, and that directory will contain a set of
 125 sequentially-numbered files each storing the contents of a single
 126 object.
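
The relationship between a segment TAR file and its objects can be
sketched with Python's tarfile module.  This is an illustration only:
read_object and build_segment are hypothetical helpers, the segment is
assumed to be already unfiltered, and numbering objects from zero in
build_segment is an assumption made for the demonstration.

```python
import io
import tarfile

def read_object(segment_bytes, name):
    """Return the bytes of the object called `name` in an unfiltered segment."""
    with tarfile.open(fileobj=io.BytesIO(segment_bytes), mode="r:") as tar:
        member = tar.extractfile(name)  # raises KeyError if name is absent
        if member is None:
            raise KeyError(name)
        return member.read()

def build_segment(segment, objects):
    """Build a small in-memory segment from a list of byte strings (demo)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for sequence, data in enumerate(objects):
            # The TAR member name is the fully-qualified object name.
            info = tarfile.TarInfo("%s/%08x" % (segment, sequence))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

For example, build_segment("a704eeae-97f2-4f30-91a4-d4473956366b",
[b"alpha", b"beta"]) yields a TAR archive from which read_object can
fetch "a704eeae-97f2-4f30-91a4-d4473956366b/00000001".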
 127
 128 NOTE: When naming an object, the segment portion consists of the UUID
 129 only.  Any extensions appended to the segment when storing it as a file
 130 in the filesystem (for example, .tar.bz2) and path information (for
 131 example, segments0) are _not_ part of the name of the object.
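
The naming rules above can be sketched as follows.  The helper names
object_name and segment_of are hypothetical, and rejecting sequence
numbers that need more than 8 hexadecimal digits is an assumption of
this sketch rather than a stated rule of the format.

```python
import uuid

def object_name(segment, sequence):
    """Return '<uuid>/<8 hex digits>' for a segment UUID and sequence number."""
    segment = str(uuid.UUID(segment))  # lowercase, hyphenated textual form
    if not 0 <= sequence <= 0xFFFFFFFF:
        raise ValueError("sequence number does not fit in 8 hex digits")
    return "%s/%08x" % (segment, sequence)

def segment_of(stored_filename):
    """Strip the directory and extensions such as '.tar.bz2' from a file name."""
    basename = stored_filename.rsplit("/", 1)[-1]
    return basename.split(".", 1)[0]
```

With these helpers, the stored file
segments0/a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2 maps back to the
bare UUID, which is the only part that appears in object names.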
 132
 133 There are two additional components which may appear in an object name;
 134 both are optional.
 135
 136 First, a checksum may be added to the object name to express an
 137 integrity constraint: the referred-to data must match the checksum
 138 given.  A checksum is enclosed in parentheses and appended to the object
 139 name:
 140     a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187)
 141
 142 Secondly, an object may be /sliced/: a subset of the bytes actually
 143 stored in the object may be selected to be returned.  The slice syntax
 144 is
 145     [<start>+<length>]
 146 where <start> is the first byte to return (as a decimal offset) and
 147 <length> specifies the number of bytes to return (again in decimal).  It
 148 is invalid for a slice to select a range of bytes that does not fall
 149 within the original object.  The slice specification should be
 150 appended to an object name, for example:
 151     a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
 152 selects only bytes 264..1263 from the original object.
 153
 154 The slice syntax
 155     [<length>]
 156 indicates that all bytes of the object are to be used, but
 157 additionally asserts that the referenced object is exactly <length>
 158 bytes long.  Older versions of Cumulus can also use the syntax
 159     [=<length>]
 160 as a synonym for length assertions, but this notation is deprecated.
 161
 162 (In older versions of the format, the syntax [<length>] was a shorthand
 163 for [0+<length>]: that is, select the first <length> bytes of the object
 164 but make no assertions about the overall size.  The backup tool has not
 165 generated such slices since v0.8.)
 166
 167 Both a checksum and a slice can be used.  In this case, the checksum is
 168 given first, followed by the slice.  The checksum is computed over the
 169 original object contents, before slicing.
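
The object reference syntax described so far (name, optional checksum,
optional slice) can be parsed with a single regular expression.  The
sketch below is illustrative only; parse_reference and select_bytes are
hypothetical names, and [<length>] is given its current meaning (a
length assertion), not the pre-v0.8 meaning.

```python
import hashlib
import re

REF_RE = re.compile(
    r"(?P<name>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}/[0-9a-f]{8})"
    r"(?:\((?P<algorithm>\w+)=(?P<digest>[0-9a-f]+)\))?"
    r"(?:\[(?:(?P<start>\d+)\+(?P<length>\d+)|=?(?P<exact>\d+))\])?"
)

def parse_reference(ref):
    """Split an object reference into name, checksum, and slice parts."""
    match = REF_RE.fullmatch(ref)
    if match is None:
        raise ValueError("malformed object reference: %r" % ref)
    return match.groupdict()

def select_bytes(ref, data):
    """Check `data` (the whole stored object) against `ref`; return the slice."""
    parts = parse_reference(ref)
    if parts["algorithm"] is not None:
        # The checksum covers the original object contents, before slicing.
        actual = hashlib.new(parts["algorithm"], data).hexdigest()
        if actual != parts["digest"]:
            raise ValueError("checksum mismatch")
    if parts["exact"] is not None:
        # [<length>] or [=<length>]: assert the size, return all bytes.
        if len(data) != int(parts["exact"]):
            raise ValueError("object length mismatch")
        return data
    if parts["start"] is not None:
        start, length = int(parts["start"]), int(parts["length"])
        if start + length > len(data):
            raise ValueError("slice falls outside the object")
        return data[start:start + length]
    return data
```

Here select_bytes takes the object contents as an argument so that the
sketch stays independent of how segments are fetched and unfiltered.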
 170
 171 Special Objects
 172 ---------------
 173
 174 In addition to the standard syntax for objects described above, the
 175 special name "zero" may be used instead of segment/sequence number.
 176 This represents an object consisting entirely of zeroes.  The zero
 177 object must have a slice specification appended to indicate the size of
 178 the object.  For example
 179     zero[1024]
 180 represents a block consisting of 1024 null bytes.  A checksum should not
 181 be given.
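
A reader can materialize the zero object without touching any segment.
The following is a small sketch; zero_object is a hypothetical name, and
only the zero[<length>] form shown above is accepted.

```python
import re

ZERO_RE = re.compile(r"zero\[(\d+)\]")

def zero_object(ref):
    """Return the null bytes named by a reference such as 'zero[1024]'."""
    match = ZERO_RE.fullmatch(ref)
    if match is None:
        raise ValueError("not a valid zero reference: %r" % ref)
    return bytes(int(match.group(1)))  # bytes(n) is n null bytes
```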
 182
 183
 184 FILE METADATA LISTING
 185 =====================
 186
 187 A snapshot stores two distinct types of data into the object store
 188 described above: data and metadata.  Data for a file may be stored as a
 189 single object, or the data may be broken apart into blocks which are
 190 stored as separate objects.  The file /metadata/ log (which may be
 191 spread across multiple objects) specifies the names of the files in a
 192 snapshot and metadata about them such as ownership and timestamps, and
 193 gives the list of objects that contain the data for each file.
 194
 195 The metadata log consists of a set of stanzas, each of which is
 196 formatted somewhat like RFC 822 (email) headers.  An example is:
 197
 198     path: etc/fstab
 199     checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f
 200     data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7
 201     group: 0 (root)
 202     mode: 0644
 203     mtime: 1177977313
 204     size: 867
 205     type: f
 206     user: 0 (root)
 207
 208 The meanings of all the fields are described later.  A blank line
 209 separates stanzas with information about different files.  In addition
 210 to regular stanzas, the metadata listing may contain a line containing
 211 an object reference prefixed with "@".  Such a line indicates that the
 212 contents of the referenced object should be fetched and parsed as a
 213 metadata listing at this point, prior to continuing to parse the current
 214 object.
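
The stanza structure and "@" indirection can be sketched as a small
recursive parser.  This is an illustration, not the parser used by
Cumulus: parse_metadata is a hypothetical name, fetch is a hypothetical
callback that returns the text of a referenced object, and treating an
"@" line as ending any stanza in progress is an assumption.

```python
def parse_metadata(text, fetch):
    """Yield one dict per stanza; fetch(ref) returns a referenced listing."""
    stanza = {}
    for line in text.splitlines():
        if line.startswith("@"):
            if stanza:  # assumption: an "@" line closes the open stanza
                yield stanza
                stanza = {}
            # Parse the referenced object here, then resume this one.
            yield from parse_metadata(fetch(line[1:].strip()), fetch)
        elif not line.strip():
            if stanza:  # a blank line separates stanzas
                yield stanza
                stanza = {}
        else:
            key, _, value = line.partition(":")
            stanza[key.strip()] = value.strip()
    if stanza:
        yield stanza
```

Field values are left as raw strings here; decoding them depends on the
per-field encodings described next.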
 215
 216 Several common encodings are used for various fields.  The encoding used
 217 for each field is specified in the field listing that follows.
 218     encoded string: An arbitrary string (octet sequence), with bytes
 219         optionally escaped by replacing a byte with %xx, where "xx" is a
 220         hexadecimal representation of the byte replaced.  For example,
 221         space can be replaced with "%20".  This is the same escaping
 222         mechanism as used in URLs.
 223     integer: An integer, which may be written in decimal, octal, or
 224         hexadecimal.  Strings starting with 0 are interpreted as octal,
 225         and those starting with 0x are interpreted as hexadecimal.
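
The two encodings above can be decoded as in the following sketch.  The
function names are hypothetical.  Python's int() with base 0 rejects a
leading "0" on octal values, so the base is chosen by hand.

```python
from urllib.parse import unquote_to_bytes

def decode_string(value):
    """Decode an encoded string (%xx escapes) into the original bytes."""
    return unquote_to_bytes(value)

def decode_integer(value):
    """Decode an integer written in decimal, octal (0...), or hex (0x...)."""
    if value.lower().startswith("0x"):
        return int(value, 16)
    if value.startswith("0") and len(value) > 1:
        return int(value, 8)
    return int(value, 10)
```

For instance, the mode value "0644" decodes to the same number as the
Python literal 0o644, and "etc/my%20file" decodes to a path containing
a space.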
 226
 227 Common fields (required in all stanzas):
 228     path [encoded string]: Full path of the file archived.  Note: In
 229         previous versions (<= 0.2) the name of this field was "name".
 230     user [special]: The user ID of the file, as an integer, optionally
 231         followed by a space and the corresponding username, as an
 232         encoded string enclosed in parentheses.
 233     group [special]: The group ID which owns the file.  Encoding is the
 234         same as for the user field: an integer, with an optional name in
 235         parentheses following.
 236     mode [integer]: Unix mode bits for the file.
 237     type [special]: A single character which indicates the type of file.
 238         The type indicators are meant to be consistent with the
 239         characters used with the -type option to find(1), and the file
 240         type checks in test(1):
 241             f   regular file
 242             b   block device
 243             c   character device
 244             d   directory
 245             l   symlink
 246             p   pipe
 247             s   socket
 248         Note that previous versions used '-' to indicate a regular file.
 249         This character should not be generated in any new snapshots, but
 250         may be encountered in old snapshots (those with a format version
 251         <= 0.2).
 252     mtime [integer]: Modification time of the file.
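
The user and group fields share one encoding, which can be split as in
this sketch.  parse_owner is a hypothetical name, and the numeric ID is
assumed to be written in decimal, as in the example stanza.

```python
import re
from urllib.parse import unquote_to_bytes

OWNER_RE = re.compile(r"(\d+)(?: \((.*)\))?")

def parse_owner(value):
    """Return (numeric id, name bytes or None) for a user or group field."""
    match = OWNER_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("malformed owner field: %r" % value)
    name = match.group(2)
    if name is not None:
        name = unquote_to_bytes(name)  # the name is an encoded string
    return int(match.group(1)), name
```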
 253
 254 Optional common fields:
 255     links [integer]: Number of hard links to this file, generally only
 256         reported if greater than 1.
 257     inode [string]: String specifying the inode number of this file when
 258         it was dumped.  If "links" is greater than 1, then searching for
 259         other files that have an identical "inode" value can be used to
 260         determine which files should be hard-linked together when
 261         restoring.  The inode field should be treated as an opaque
 262         string and compared for equality as such; an implementation may
 263         choose whatever representation is convenient.  The format
 264         produced by the standard tool is <major>/<minor>/<inode> (where
 265         <major> and <minor> specify the device of the containing
 266         filesystem and <inode> is the inode number of the file).
 267     ctime [integer]: Change time for the inode.
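
One way a restore tool might use "links" and "inode" together is to
group paths by inode string, as sketched below.  hard_link_groups is a
hypothetical name; stanzas are assumed to be dicts of raw field strings,
and "links" is assumed to be written in decimal.

```python
from collections import defaultdict

def hard_link_groups(stanzas):
    """Return lists of paths that share an inode and have links > 1."""
    groups = defaultdict(list)
    for stanza in stanzas:
        if "inode" in stanza and int(stanza.get("links", "1")) > 1:
            # The inode value is opaque: compare it only for equality.
            groups[stanza["inode"]].append(stanza["path"])
    return [paths for paths in groups.values() if len(paths) > 1]
```

The first path in each group can be restored normally and the rest
created as hard links to it.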
 268
 269 Special fields used for regular files:
 270     checksum [string]: Checksum of the file contents.
 271     size [integer]: Size of the file, in bytes.
 272     data [reference list]: Whitespace-separated list of object
 273         references.  The referenced data, when concatenated in the
 274         listed order, will reconstruct the file data.  Any reference
 275         that begins with a "@" character is an indirect reference--the
 276         given object includes a whitespace-separated list of object
 277         references which should be parsed in the same manner as the data
 278         field.
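
Reconstructing file contents from a data field, including indirect
references, can be sketched recursively.  read_file_data is a
hypothetical name, and fetch is a hypothetical callback that returns the
bytes selected by one object reference (after any checksum and slice
handling).

```python
def read_file_data(data_field, fetch):
    """Concatenate the objects named in a data field; fetch(ref) -> bytes."""
    chunks = []
    for ref in data_field.split():
        if ref.startswith("@"):
            # Indirect reference: the object holds another reference list,
            # parsed in the same manner as the data field itself.
            nested = fetch(ref[1:]).decode("ascii")
            chunks.append(read_file_data(nested, fetch))
        else:
            chunks.append(fetch(ref))
    return b"".join(chunks)
```

A real tool would likely stream the chunks to disk instead of joining
them in memory, and would compare the result against the checksum and
size fields.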
 279
 280 Special fields used for symbolic links:
 281     target [encoded string]: The target of the symlink, as returned by
 282         readlink(2).  Note: In old versions of the format (<= 0.2), this
 283         field was called "contents" instead of "target".
 284
 285 Special fields used for block and character device files:
 286     device [special]: The major and minor numbers of the device.
 287         Encoded as "major/minor", where major is the major device number
 288         encoded as an integer, and minor is the minor device number.
 289
 290
 291 SNAPSHOT DESCRIPTOR
 292 ===================
 293
 294 The snapshot descriptor is a small file which describes a single
 295 snapshot.  It is one of the few files which is not stored as an object
 296 in the segment store.  It is stored as a separate file, in plain text,
 297 in the snapshots directory.
 298
 299 The name of the snapshot descriptor file is
 300     snapshot-<scheme>-<timestamp>.lbs
 301 <scheme> is a descriptive text which can be used to distinguish several
 302 logically distinct sets of snapshots (such as snapshots for two
 303 different directory trees) that are being stored in the same location.
 304 <timestamp> gives the date and time the snapshot was taken; the format
 305 is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39).  It is
 306 recommended that the timestamp be given in UTC for consistent sorting
 307 even if the offset from UTC to local time changes; however, the
 308 authoritative timestamp (including timezone) can be found in the Date
 309 field.  (In versions v0.10 and earlier the timestamp is given in local
 310 time; in current versions UTC is used.)
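
Splitting a descriptor filename into its parts can be sketched as
follows.  parse_descriptor_name is a hypothetical name.  The returned
datetime is left without a timezone, since whether it is UTC or local
time depends on the version that wrote the snapshot.

```python
import re
from datetime import datetime

NAME_RE = re.compile(r"snapshot-(?P<scheme>.*)-(?P<timestamp>\d{8}T\d{6})\.lbs")

def parse_descriptor_name(filename):
    """Return (scheme, naive datetime) for a snapshot descriptor filename."""
    match = NAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError("not a snapshot descriptor name: %r" % filename)
    taken = datetime.strptime(match.group("timestamp"), "%Y%m%dT%H%M%S")
    return match.group("scheme"), taken
```

Because the scheme pattern is greedy, a scheme that itself contains
hyphens (such as "home-dirs") is kept whole.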
 311
 312 The contents of the descriptor are a set of RFC 822-style headers (much
 313 like the metadata listing).  The fields which are defined are:
 314     Format: The string "Cumulus Snapshot v0.11" which identifies this
 315         file as a Cumulus backup descriptor.  The version number (v0.11)
 316         might change if there are changes to the format.  It is expected
 317         that at some point, once the format is stabilized, the version
 318         identifier will be changed to v1.0.  (Earlier versions, format
 319         v0.8 and earlier, used the string "LBS Snapshot" instead of
 320         "Cumulus Snapshot", reflecting an earlier name for the project.
 321         Consumers should be prepared for either name.)
 322     Producer: An informative string which identifies the program that
 323         produced the backup.
 324     Date: The date the snapshot was produced, in the local time zone.
 325         This matches the timestamp encoded in the filename, but is
 326         written out in full.  A timezone (offset from UTC) is given.
 327         For example: "2007-08-06 02:22:39 -0700".
 328     Scheme: The <scheme> field from the descriptor filename.
 329     Segments: A whitespace-separated list of segment names.  Any segment
 330         which is referenced by this snapshot must be included in the
 331         list, since this list can be used in garbage-collecting old
 332         segments, determining which segments need to be downloaded to
 333         completely reconstruct a snapshot, etc.
 334     Root: A single object reference which points to the metadata
 335         listing for the snapshot.
 336     Checksums: A checksum file may be produced (with the same name as
 337         the snapshot descriptor file, but with extension .sha1sums
 338         instead of .lbs) containing SHA-1 checksums of all segments.
 339         This field contains a checksum of that file.
 340     Intent: Informational; records the value of the --intent flag when
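
Reading the descriptor fields can be sketched as below.  All function
names are hypothetical.  Treating lines that begin with whitespace as
continuations of the previous field (as in RFC 822 header folding) is an
assumption of this sketch, as is the exact "v<number>" suffix on the
older "LBS Snapshot" string.

```python
import re
from datetime import datetime

FORMAT_RE = re.compile(r"(?:Cumulus|LBS) Snapshot v(\d+(?:\.\d+)*)")

def parse_descriptor(text):
    """Return a dict of header fields from a snapshot descriptor."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Assumed continuation line: append to the previous field.
            fields[key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields

def format_version(fields):
    """Return the version string from the Format field, e.g. '0.11'."""
    match = FORMAT_RE.fullmatch(fields["Format"])
    if match is None:
        raise ValueError("unrecognized Format: %r" % fields["Format"])
    return match.group(1)

def snapshot_date(fields):
    """Return the Date field as a timezone-aware datetime."""
    return datetime.strptime(fields["Date"], "%Y-%m-%d %H:%M:%S %z")
```

The Segments field can then be split on whitespace to obtain the list of
segments that a restore or garbage-collection pass needs to consider.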
 341         the snapshot was created, and can be used when determining which
 342         snapshots to later delete.