                       Backup Format Description
         for Cumulus: Efficient Filesystem Backup to the Cloud
                      Version: "LBS Snapshot v0.8"

NOTE: This format specification is intended to be mostly stable, but is
still subject to change before the 1.0 release.  The code may provide
additional useful documentation on the format.

NOTE2: The name of this project has changed from LBS to Cumulus.
However, to avoid introducing gratuitous changes into the format, in
most cases any references to "LBS" in the format description have been
left as-is.  The name may be changed in the future if the format is
updated.

This document simply describes the snapshot format.  It is described
from the point of view of a decompressor which wishes to restore the
files from a snapshot.  It does not specify the exact behavior required
of the backup program writing the snapshot.  For details of the current
backup program, see implementation.txt.

This document does not explain the rationale behind the format; for
that, see design.txt.


DATA CHECKSUMS
==============

In several places in the Cumulus format, a cryptographic checksum may be
used to allow data integrity to be verified.  At the moment, only the
SHA-1 checksum is supported, but it is expected that other algorithms
will be supported in the future.

When a checksum is called for, the checksum is always stored in a text
format.  The general format used is
    <algorithm>=<hexdigits>

<algorithm> identifies the checksum algorithm used, and allows new
algorithms to be added later.  At the moment, the only permissible value
is "sha1", indicating a SHA-1 checksum.

<hexdigits> is a sequence of hexadecimal digits which encode the
checksum value.  For sha1, <hexdigits> should be precisely 40 digits
long.

A sample checksum string is
    sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187
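
The Python fragment below is a minimal sketch (not taken from the
Cumulus code) of producing and checking checksum strings in this format.
It assumes "sha1" is the only algorithm, as stated above, and it accepts
hex digits in either case, which this document does not address.  The
function names are hypothetical.

```python
import hashlib

# Algorithms accepted by this sketch: name -> (hashlib constructor, hex digest length).
# Per the specification, "sha1" is currently the only permissible value.
ALGORITHMS = {"sha1": (hashlib.sha1, 40)}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_checksum(text):
    """Split "<algorithm>=<hexdigits>" into (algorithm, lowercase hexdigits)."""
    algorithm, sep, digits = text.partition("=")
    if not sep or algorithm not in ALGORITHMS:
        raise ValueError("unsupported checksum: %r" % text)
    _, width = ALGORITHMS[algorithm]
    if len(digits) != width or not set(digits) <= HEX_DIGITS:
        raise ValueError("malformed digest: %r" % text)
    return algorithm, digits.lower()


def make_checksum(data, algorithm="sha1"):
    """Return the textual checksum string for a byte string."""
    constructor, _ = ALGORITHMS[algorithm]
    return "%s=%s" % (algorithm, constructor(data).hexdigest())


def verify_checksum(data, text):
    """True if the bytes in data match the checksum string text."""
    algorithm, digits = parse_checksum(text)
    constructor, _ = ALGORITHMS[algorithm]
    return constructor(data).hexdigest() == digits
```

For example, make_checksum(b"abc") yields
"sha1=a9993e364706816aba3e25717850c26c9cd0d89d".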


SEGMENTS & OBJECTS: STORAGE AND NAMING
======================================

A Cumulus snapshot consists, at its base, of a collection of /objects/:
binary blobs of data, much like a file.  Higher layers interpret the
contents of objects in various ways, but the lowest layer is simply
concerned with storing and naming these objects.

An object is a sequence of bytes (octets) of arbitrary length.  An
object may contain as few as zero bytes (though such objects are not
very useful).  Object sizes are potentially unbounded, but it is
recommended that the maximum size of objects produced be on the order of
megabytes.  Files of essentially unlimited size can be stored in a
Cumulus snapshot using objects of modest size, so this should not cause
any real restrictions.

For storage purposes, objects are grouped together into /segments/.
Segments use the TAR format; each object within a segment is stored as a
separate file.  Segments are named using UUIDs (Universally Unique
Identifiers), which are 128-bit numbers.  The textual form of a UUID is
a sequence of lowercase hexadecimal digits with hyphens inserted at
fixed points; an example UUID is
    a704eeae-97f2-4f30-91a4-d4473956366b
This segment could be stored in the filesystem as a file
    a704eeae-97f2-4f30-91a4-d4473956366b.tar
The UUID used to name a segment is assigned when the segment is created.

Filters can be layered on top of the segment storage to provide
compression, encryption, or other features.  For example, the segment
above might be stored as
    a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2
or
    a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg
if the file data had been filtered through bzip2 or gpg, respectively,
before storage.  Filtering of segment data is outside the scope of this
format specification, however; it is assumed that if filtering is used,
the unfiltered data can be recovered when restoring (yielding data in
the TAR format).

Objects within a segment are numbered sequentially.  This sequence
number is then formatted as an 8-digit (zero-padded) hexadecimal
(lowercase) value.  The fully qualified name of an object consists of
the segment name, followed by a slash ("/"), followed by the object
sequence number.  So, for example
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad
names an object.
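
A short Python sketch of building names under these rules follows.  The
function names are hypothetical, and so is the use of uuid.uuid4() for
fresh segment names: this document only requires a UUID, not any
particular UUID version.  Rejecting sequence numbers that do not fit in
8 hex digits is likewise an assumption of the sketch.

```python
import uuid


def new_segment_name():
    """A fresh segment name; str() of a uuid.UUID is lowercase and hyphenated."""
    return str(uuid.uuid4())


def object_name(segment, sequence):
    """Fully qualified object name: "<segment>/<8 lowercase hex digits>"."""
    if not 0 <= sequence <= 0xFFFFFFFF:
        raise ValueError("sequence number does not fit in 8 hex digits")
    return "%s/%08x" % (segment, sequence)
```

With this sketch, object_name("a704eeae-97f2-4f30-91a4-d4473956366b", 429)
returns the example name above, since 429 is 0x1ad.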

Within the segment TAR file, the filename used for each object is its
fully-qualified name.  Thus, when extracted using the standard tar
utility, a segment will produce a directory with the same name as the
segment itself, and that directory will contain a set of
sequentially-numbered files each storing the contents of a single
object.
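
Because a segment is an ordinary TAR archive, Python's standard tarfile
module is enough to pull out a single object once any filters have been
removed.  The sketch below assumes the unfiltered segment is already in
memory as bytes.  build_segment is a hypothetical helper for producing
test data, and its numbering from 0 is an assumption, since this
document only says that objects are numbered sequentially.

```python
import io
import tarfile


def read_object(segment_tar, name):
    """Return the contents of object `name` from an unfiltered segment TAR image.

    Raises KeyError if the segment has no member with that name.
    """
    with tarfile.open(fileobj=io.BytesIO(segment_tar), mode="r:") as tar:
        member = tar.getmember(name)
        return tar.extractfile(member).read()


def build_segment(segment, objects):
    """Hypothetical helper: pack byte strings into a segment, numbering from 0."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:") as tar:
        for seq, data in enumerate(objects):
            # Each member is named "<segment>/<8 hex digits>", as described above.
            info = tarfile.TarInfo("%s/%08x" % (segment, seq))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

A segment stored with a bzip2 filter could be read the same way by
opening it with mode "r:bz2", although filtering is outside this format.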

NOTE: When naming an object, the segment portion consists of the UUID
only.  Any extensions appended to the segment when storing it as a file
in the filesystem (for example, .tar.bz2) are _not_ part of the name of
the object.

There are two additional components which may appear in an object name;
both are optional.

First, a checksum may be added to the object name to express an
integrity constraint: the referred-to data must match the checksum
given.  A checksum is enclosed in parentheses and appended to the object
name:
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187)

Secondly, an object may be /sliced/: a subset of the bytes actually
stored in the object may be selected to be returned.  The slice syntax
is
    [<start>+<length>]
where <start> is the first byte to return (as a zero-based decimal
offset) and <length> specifies the number of bytes to return (again in
decimal).  It is invalid to use the slice syntax to select a range of
bytes that does not fall entirely within the original object.  The slice
specification should be appended to an object name, for example:
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
selects only bytes 264..1263 from the original object.  As an
abbreviation, the slice syntax
    [<length>]
is shorthand for
    [0+<length>]
In place of a traditional slice, the annotation
    [=<length>]
may be used.  This is somewhat similar to specifying [<length>], but
additionally asserts that the referenced object is exactly <length>
bytes long--that is, this slice syntax does not change the bytes
returned at all, but can be used to provide information about the
underlying object store.

Both a checksum and a slice can be used.  In this case, the checksum is
given first, followed by the slice.  The checksum is computed over the
original object contents, before slicing.
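
The order of operations matters here: the checksum covers the whole
object, and the slice is applied afterwards.  The following is a minimal
Python sketch of that rule.  It assumes the reference has already been
split into its parts (the parameter names are hypothetical) and that
only sha1 checksums occur.

```python
import hashlib


def apply_reference(data, checksum=None, start=0, length=None, exact=False):
    """Apply a reference's checksum and slice to the full contents of an object.

    data:     the complete object, as stored in the segment
    checksum: "<algorithm>=<hexdigits>" or None (only sha1 is handled here)
    start:    slice start offset (0 for the [<length>] form)
    length:   slice length, or None when no slice was given
    exact:    True for the [=<length>] form
    """
    # The checksum is computed over the original object, so check it first.
    if checksum is not None:
        algorithm, _, digits = checksum.partition("=")
        if algorithm != "sha1":
            raise ValueError("unsupported algorithm: %s" % algorithm)
        if hashlib.sha1(data).hexdigest() != digits.lower():
            raise ValueError("checksum mismatch")
    if length is None:
        return data
    if exact:
        # [=<length>] returns the object unchanged but asserts its size.
        if len(data) != length:
            raise ValueError("expected exactly %d bytes, got %d" % (length, len(data)))
        return data
    # A slice must fall entirely within the original object.
    if start < 0 or length < 0 or start + length > len(data):
        raise ValueError("slice falls outside the object")
    return data[start:start + length]
```

Applied to the [264+1000] example, apply_reference(data, start=264,
length=1000) returns data[264:1264], that is, bytes 264..1263.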

Special Objects
---------------

In addition to the standard syntax for objects described above, the
special name "zero" may be used instead of segment/sequence number.
This represents an object consisting entirely of zeroes.  The zero
object must have a slice specification appended to indicate the size of
the object.  For example
    zero[1024]
represents a block consisting of 1024 null bytes.  A checksum should not
be given.  The slice syntax should use the abbreviated length-only form.
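
Putting the naming rules together, the Python sketch below parses a
complete object reference, including the special "zero" name, into its
components.  The regular expression and the ObjectRef type are
hypothetical, and the sketch is lenient in places: for example, it
accepts a zero object with a [<start>+<length>] slice, which the text
above only discourages.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar: (zero | <uuid>/<8 hex>) then optional (<checksum>) then optional [slice].
_REF = re.compile(
    r"^(?:(?P<zero>zero)|(?P<segment>[0-9a-f-]{36})/(?P<seq>[0-9a-f]{8}))"
    r"(?:\((?P<checksum>[a-z0-9]+=[0-9a-fA-F]+)\))?"
    r"(?:\[(?:(?P<exact>=)(?P<elen>\d+)|(?:(?P<start>\d+)\+)?(?P<len>\d+))\])?$"
)


@dataclass
class ObjectRef:
    segment: Optional[str]   # None for the special "zero" object
    sequence: Optional[str]  # 8 hex digits, or None for "zero"
    checksum: Optional[str]  # "<algorithm>=<hexdigits>", if given
    start: int               # slice start (0 if absent)
    length: Optional[int]    # slice length, or None if no slice
    exact: bool              # True for the [=<length>] form


def parse_reference(text):
    m = _REF.match(text)
    if m is None:
        raise ValueError("not an object reference: %r" % text)
    if m.group("exact"):
        start, length, exact = 0, int(m.group("elen")), True
    elif m.group("len") is not None:
        start, length, exact = int(m.group("start") or 0), int(m.group("len")), False
    else:
        start, length, exact = 0, None, False
    # The zero object needs a size and must not carry a checksum.
    if m.group("zero") and (length is None or m.group("checksum")):
        raise ValueError("zero object needs a slice and no checksum")
    return ObjectRef(m.group("segment"), m.group("seq"), m.group("checksum"),
                     start, length, exact)


def zero_bytes(ref):
    """Contents of a "zero[<length>]" reference: that many null bytes."""
    return bytes(ref.length)
```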


FILE METADATA LISTING
=====================

A snapshot stores two distinct types of data into the object store
described above: data and metadata.  Data for a file may be stored as a
single object, or the data may be broken apart into blocks which are
stored as separate objects.  The file /metadata/ log (which may be
spread across multiple objects) specifies the names of the files in a
snapshot, metadata about them such as ownership and timestamps, and
gives the list of objects that contain the data for the file.

The metadata log consists of a set of stanzas, each of which is
formatted somewhat like RFC 822 (email) headers.  An example is:

    path: etc/fstab
    checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f
    data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7
    group: 0 (root)
    mode: 0644
    mtime: 1177977313
    size: 867
    type: f
    user: 0 (root)

The meanings of all the fields are described later.  A blank line
separates stanzas with information about different files.  In addition
to regular stanzas, the metadata listing may contain a line consisting
of an object reference prefixed with "@".  Such a line indicates that
the contents of the referenced object should be fetched and parsed as a
metadata listing at this point, prior to continuing to parse the current
object.
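
A minimal Python sketch of reading a metadata listing follows.  It
assumes every non-blank line of a stanza has the form "key: value"
(RFC 822 continuation lines are not handled) and that "@" lines occur
between stanzas.  The fetch argument is a hypothetical caller-supplied
function returning the text stored in a referenced object.  Field values
are left as raw strings.

```python
def parse_metadata(text, fetch):
    """Yield one dict per stanza, following "@" lines into other objects.

    fetch is a hypothetical caller-supplied function mapping an object
    reference to the text stored in that object.
    """
    stanza = {}
    for line in text.splitlines():
        if line.startswith("@"):
            # Finish any stanza in progress, then splice in the referenced listing.
            if stanza:
                yield stanza
                stanza = {}
            yield from parse_metadata(fetch(line[1:].strip()), fetch)
        elif not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
                stanza = {}
        else:
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError("malformed metadata line: %r" % line)
            stanza[key.strip()] = value.strip()
    if stanza:
        yield stanza
```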

Several common encodings are used for various fields.  The encoding used
for each field is specified in the field listing that follows.
    encoded string: An arbitrary string (octet sequence), with bytes
        optionally escaped by replacing a byte with %xx, where "xx" is a
        hexadecimal representation of the byte replaced.  For example,
        space can be replaced with "%20".  This is the same escaping
        mechanism as used in URLs.
    integer: An integer, which may be written in decimal, octal, or
        hexadecimal.  Strings starting with 0 are interpreted as octal,
        and those starting with 0x are interpreted as hexadecimal.
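
Both encodings can be decoded with the Python standard library.
urllib.parse.unquote_to_bytes undoes the %xx escaping and returns raw
bytes, which matters because file names need not be valid text in any
character set.  The integer rules are written out by hand in this sketch
(function names hypothetical) because Python's own int(text, 0) rejects
a bare leading zero such as "0644".

```python
from urllib.parse import unquote_to_bytes


def decode_string(text):
    """Undo %xx escaping of an encoded string, returning raw bytes."""
    return unquote_to_bytes(text)


def decode_int(text):
    """Decode an integer field: "0x" prefix -> hex, leading "0" -> octal, else decimal."""
    text = text.strip()
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)
```

For example, the mode value "0644" decodes to 420 (octal 644), and
"my%20file" decodes to b"my file".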

Common fields (required in all stanzas):
    path [encoded string]: Full path of the file archived.  Note: In
        previous versions (<= 0.2) the name of this field was "name".
    user [special]: The user ID which owns the file, as an integer,
        optionally followed by a space and the corresponding username,
        as an encoded string enclosed in parentheses.
    group [special]: The group ID which owns the file.  Encoding is the
        same as for the user field: an integer, with an optional name in
        parentheses following.
    mode [integer]: Unix mode bits for the file.
    type [special]: A single character which indicates the type of file.
        The type indicators are meant to be consistent with the
        characters used with the -type option to find(1), and the file
        type checks in test(1):
            f   regular file
            b   block device
            c   character device
            d   directory
            l   symlink
            p   pipe
            s   socket
        Note that previous versions used '-' to indicate a regular file.
        This character should not be generated in any new snapshots, but
        may be encountered in old snapshots (those with a format version
        <= 0.2).
    mtime [integer]: Modification time of the file.
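
The user and group fields combine both common encodings.  The Python
sketch below, with hypothetical function names, splits such a field into
its numeric ID and optional name.  It also maps the legacy (<= 0.2) '-'
type indicator to 'f', so that old and new snapshots can be handled
alike.

```python
import re
from urllib.parse import unquote_to_bytes

# "<integer>" optionally followed by " (<encoded name>)".
_OWNER = re.compile(r"^(\S+)(?: \((.*)\))?$")
FILE_TYPES = set("fbcdlps")


def decode_int(text):
    """Integer field: "0x" prefix -> hex, leading "0" -> octal, else decimal."""
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)


def parse_owner(value):
    """Split a user/group field such as "0 (root)" into (0, b"root")."""
    m = _OWNER.match(value.strip())
    if m is None:
        raise ValueError("malformed owner field: %r" % value)
    number, name = m.groups()
    return decode_int(number), (unquote_to_bytes(name) if name is not None else None)


def normalize_type(value):
    """Return the type indicator, mapping the legacy "-" to "f"."""
    if value == "-":
        return "f"
    if value not in FILE_TYPES:
        raise ValueError("unknown file type: %r" % value)
    return value
```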

Optional common fields:
    links [integer]: Number of hard links to this file, generally only
        reported if greater than 1.
    inode [string]: String specifying the inode number of this file when
        it was dumped.  If "links" is greater than 1, then searching for
        other files that have an identical "inode" value can be used to
        determine which files should be hard-linked together when
        restoring.  The inode field should be treated as an opaque
        string and compared for equality as such; an implementation may
        choose whatever representation is convenient.  The format
        produced by the standard tool is <major>/<minor>/<inode> (where
        <major> and <minor> specify the device of the containing
        filesystem and <inode> is the inode number of the file).
    ctime [integer]: Change time for the inode.
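
As a sketch of how a restore tool might use the links and inode fields,
the Python function below groups paths that should be hard-linked
together.  It assumes each stanza has already been parsed into a
dictionary mapping field names to raw string values, and, for brevity,
that the links field is written in decimal.  The function name is
hypothetical.

```python
from collections import defaultdict


def hardlink_groups(stanzas):
    """Return lists of paths that share an inode value and have links > 1.

    The inode value is compared only for equality, as an opaque string.
    """
    by_inode = defaultdict(list)
    for stanza in stanzas:
        if "inode" in stanza and int(stanza.get("links", "1")) > 1:
            by_inode[stanza["inode"]].append(stanza["path"])
    # A group with a single member has nothing to be linked to.
    return [paths for paths in by_inode.values() if len(paths) > 1]
```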

Special fields used for regular files:
    checksum [string]: Checksum of the file contents, in the format
        described under DATA CHECKSUMS.
    size [integer]: Size of the file, in bytes.
    data [reference list]: Whitespace-separated list of object
        references.  The referenced data, when concatenated in the
        listed order, will reconstruct the file data.  Any reference
        that begins with a "@" character is an indirect reference--the
        given object contains a whitespace-separated list of object
        references which should be parsed in the same manner as the data
        field.
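
Indirect references can nest, so expanding a data field is naturally
recursive.  In this Python sketch, fetch_text and fetch_bytes are
hypothetical caller-supplied functions returning an object's contents as
text or bytes; a real reader would also resolve any checksum, slice, or
zero object in each reference inside fetch_bytes.

```python
def expand_data(field, fetch_text):
    """Flatten a data field into an ordered list of direct object references."""
    refs = []
    for token in field.split():
        if token.startswith("@"):
            # Indirect: the object holds another whitespace-separated list.
            refs.extend(expand_data(fetch_text(token[1:]), fetch_text))
        else:
            refs.append(token)
    return refs


def reconstruct_file(field, fetch_text, fetch_bytes):
    """Concatenate the referenced data, in listed order, to rebuild the file."""
    return b"".join(fetch_bytes(ref) for ref in expand_data(field, fetch_text))
```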

Special fields used for symbolic links:
    target [encoded string]: The target of the symlink, as returned by
        readlink(2).  Note: In old versions of the format (<= 0.2), this
        field was called "contents" instead of "target".

Special fields used for block and character device files:
    device [special]: The major and minor number of the device.  Encoded
        as "major/minor", where major and minor are the major and minor
        device numbers, each encoded as an integer.
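
The symlink and device fields reuse the common encodings.  A minimal
Python sketch, with hypothetical function names, that decodes both:

```python
from urllib.parse import unquote_to_bytes


def decode_int(text):
    """Integer field: "0x" prefix -> hex, leading "0" -> octal, else decimal."""
    if text[:2].lower() == "0x":
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)


def decode_target(value):
    """Symlink target: an encoded string, returned as raw bytes."""
    return unquote_to_bytes(value)


def decode_device(value):
    """Parse "major/minor" into a (major, minor) tuple of ints."""
    major, sep, minor = value.partition("/")
    if not sep:
        raise ValueError("malformed device field: %r" % value)
    return decode_int(major), decode_int(minor)
```

On Unix systems, os.makedev(major, minor) combines the pair into the
device number expected by os.mknod, and os.symlink accepts the decoded
bytes target directly.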


SNAPSHOT DESCRIPTOR
===================

The snapshot descriptor is a small file which describes a single
snapshot.  It is one of the few files that are not stored as objects in
the segment store.  Instead, it is stored as a separate file, in plain
text, in the same directory as the segments.

The name of the snapshot descriptor file is
    snapshot-<scheme>-<timestamp>.lbs
<scheme> is a descriptive text which can be used to distinguish several
logically distinct sets of snapshots (such as snapshots for two
different directory trees) that are being stored in the same location.
<timestamp> gives the date and time the snapshot was taken; the format
is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39).
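
Recovering the scheme and timestamp from a descriptor filename can be
sketched in Python as below.  The sketch assumes the timestamp is always
the final hyphen-separated component, so that a scheme may contain
hyphens of its own; this document does not say whether that is allowed.
The resulting datetime has no timezone, since the filename does not
record one.  The function name is hypothetical.

```python
import re
from datetime import datetime

# Greedy scheme match: the timestamp is taken to be the last component.
_DESCRIPTOR = re.compile(r"^snapshot-(?P<scheme>.*)-(?P<ts>\d{8}T\d{6})\.lbs$")


def parse_descriptor_name(filename):
    """Return (scheme, naive datetime) for "snapshot-<scheme>-<timestamp>.lbs"."""
    m = _DESCRIPTOR.match(filename)
    if m is None:
        raise ValueError("not a snapshot descriptor name: %r" % filename)
    return m.group("scheme"), datetime.strptime(m.group("ts"), "%Y%m%dT%H%M%S")
```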

The contents of the descriptor are a set of RFC 822-style headers (much
like the metadata listing).  The fields which are defined are:
    Format: The string "LBS Snapshot v0.8" which identifies this file as
        a Cumulus backup descriptor.  The version number (v0.8) might
        change if there are changes to the format.  It is expected that
        at some point, once the format is stabilized, the version
        identifier will be changed to v1.0.
    Producer: An informative string which identifies the program that
        produced the backup.
    Date: The date the snapshot was produced.  This matches the
        timestamp encoded in the filename, but is written out in full.
        A timezone is given.  For example: "2007-08-06 09:22:39 -0700".
    Scheme: The <scheme> field from the descriptor filename.
    Segments: A whitespace-separated list of segment names.  Any segment
        which is referenced by this snapshot must be included in the
        list, since this list can be used in garbage-collecting old
        segments, determining which segments need to be downloaded to
        completely reconstruct a snapshot, etc.
    Root: A single object reference which points to the metadata
        listing for the snapshot.
    Checksums: A checksum file may be produced (with the same name as
        the snapshot descriptor file, but with extension .sha1sums
        instead of .lbs) containing SHA-1 checksums of all segments.
        This field contains a checksum of that file, in the format
        described under DATA CHECKSUMS.
    Intent: Informational; records the value of the --intent flag when
        the snapshot was created, and can be used when determining which
        snapshots to later delete.
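
Finally, a minimal Python sketch of reading a descriptor and extracting
the fields a restore needs.  Lines beginning with whitespace are
appended to the previous field, on the assumption that a long field such
as Segments may be folded RFC 822-style.  The function names and the
layout of the returned dictionary are hypothetical.

```python
from datetime import datetime


def parse_headers(text):
    """Parse "Key: value" lines; indented lines continue the previous field."""
    fields, last = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if last is None:
                raise ValueError("continuation line before any field")
            fields[last] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        last = key.strip()
        fields[last] = value.strip()
    return fields


def read_descriptor(text):
    fields = parse_headers(text)
    if not fields.get("Format", "").startswith("LBS Snapshot v"):
        raise ValueError("not a Cumulus snapshot descriptor")
    return {
        "format": fields["Format"],
        "scheme": fields.get("Scheme"),
        # e.g. "2007-08-06 09:22:39 -0700"; %z parses the numeric offset.
        "date": datetime.strptime(fields["Date"], "%Y-%m-%d %H:%M:%S %z"),
        "root": fields["Root"],
        "segments": fields.get("Segments", "").split(),
    }
```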