   1                        Backup Format Description
   2          for Cumulus: Efficient Filesystem Backup to the Cloud
   3                    Version: "Cumulus Snapshot v0.11"
   4
   5 NOTE: This format specification is intended to be mostly stable, but is
   6 still subject to change before the 1.0 release.  The code may provide
   7 additional useful documentation on the format.
   8
   9 NOTE2: The name of this project has changed from LBS to Cumulus.  In
  10 some areas the name "LBS" is still used.
  11
  12 This document simply describes the snapshot format.  It is described
  13 from the point of view of a decompressor which wishes to restore the
  14 files from a snapshot.  It does not specify the exact behavior required
  15 of the backup program writing the snapshot.  For details of the current
  16 backup program, see implementation.txt.
  17
  18 This document does not explain the rationale behind the format; for
  19 that, see design.txt.
  20
  21
  22 DATA CHECKSUMS
  23 ==============
  24
  25 In several places in the Cumulus format, a cryptographic checksum may be
  26 used to allow data integrity to be verified.  SHA-1, SHA-224, and
  27 SHA-256 are currently supported, and it is expected that other
  28 algorithms will be supported in the future.
  29
  30 When a checksum is called for, the checksum is always stored in a text
  31 format.  The general format used is
  32     <algorithm>=<hexdigits>
  33
  34 <algorithm> identifies the checksum algorithm used, and allows new
  35 algorithms to be added later.  Permissible values are:
  36     "sha1": SHA-1
  37     "sha224": SHA-224 (added in version 0.11)
  38     "sha256": SHA-256 (added in version 0.11)
  39
  40 <hexdigits> is a sequence of hexadecimal digits which encode the
  41 checksum value.  For sha1, <hexdigits> should be precisely 40 digits
  42 long (56 for sha224, 64 for sha256).
  43
  44 A sample checksum string is
  45     sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187
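
The checksum string format above can be checked mechanically.  The
following is a minimal Python sketch, not part of the reference
implementation; the function names parse_checksum and verify_checksum
are hypothetical, and lowercasing the hex digits before comparison is an
assumption (the format does not state the case of <hexdigits>).

```python
import hashlib
import string

# Algorithm names listed in this document, mapped to hashlib constructors.
ALGORITHMS = {
    "sha1": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
}


def parse_checksum(text):
    """Split '<algorithm>=<hexdigits>' into (algorithm, hexdigits)."""
    algorithm, sep, hexdigits = text.partition("=")
    if not sep or algorithm not in ALGORITHMS:
        raise ValueError("unsupported checksum string: %r" % text)
    # Each digest byte is written as two hexadecimal digits.
    expected_len = ALGORITHMS[algorithm]().digest_size * 2
    if len(hexdigits) != expected_len:
        raise ValueError("wrong checksum length in %r" % text)
    if not all(c in string.hexdigits for c in hexdigits):
        raise ValueError("non-hexadecimal digit in %r" % text)
    # Assumption: comparison is case-insensitive, so normalize here.
    return algorithm, hexdigits.lower()


def verify_checksum(text, data):
    """Return True if the bytes in `data` match the checksum string."""
    algorithm, hexdigits = parse_checksum(text)
    return ALGORITHMS[algorithm](data).hexdigest() == hexdigits
```

For example, verify_checksum(checksum_string, object_bytes) could be
called on each object after it is read from a segment.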
  46
  47
  48 SEGMENTS & OBJECTS: STORAGE AND NAMING
  49 ======================================
  50
  51 A Cumulus snapshot consists, at its base, of a collection of /objects/:
  52 binary blobs of data, much like a file.  Higher layers interpret the
  53 contents of objects in various ways, but the lowest layer is simply
  54 concerned with storing and naming these objects.
  55
  56 An object is a sequence of bytes (octets) of arbitrary length.  An
  57 object may contain as few as zero bytes (though such objects are not
  58 very useful).  Object sizes are potentially unbounded, but it is
  59 recommended that the maximum size of objects produced be on the order of
  60 megabytes.  Files of essentially unlimited size can be stored in a
  61 Cumulus snapshot using objects of modest size, so this should not cause
  62 any real restrictions.
  63
  64 For storage purposes, objects are grouped together into /segments/.
  65 Segments use the TAR format; each object within a segment is stored as a
  66 separate file.  Segments are named using UUIDs (Universally Unique
  67 Identifiers), which are 128-bit numbers.  The textual form of a UUID is
  68 a sequence of lowercase hexadecimal digits with hyphens inserted at
  69 fixed points; an example UUID is
  70     a704eeae-97f2-4f30-91a4-d4473956366b
  71 This segment could be stored in the filesystem as a file
  72     a704eeae-97f2-4f30-91a4-d4473956366b.tar
  73 The UUID used to name a segment is assigned when the segment is created.
  74
  75 Filters can be layered on top of the segment storage to provide
  76 compression, encryption, or other features.  For instance, the example
  77 segment above might be stored as
  78     a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2
  79 or
  80     a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg
  81 if the file data had been filtered through bzip2 or gpg, respectively,
  82 before storage.  Filtering of segment data is outside the scope of this
  83 format specification, however; it is assumed that if filtering is used,
  84 the filter can be reversed when restoring, recovering the unfiltered
  85 data in the TAR format.
  86
  87 Objects within a segment are numbered sequentially.  This sequence
  88 number is then formatted as an 8-digit (zero-padded) hexadecimal
  89 (lowercase) value.  The fully qualified name of an object consists of
  90 the segment name, followed by a slash ("/"), followed by the object
  91 sequence number.  So, for example
  92     a704eeae-97f2-4f30-91a4-d4473956366b/000001ad
  93 names an object.
  94
  95 Within the segment TAR file, the filename used for each object is its
  96 fully-qualified name.  Thus, when extracted using the standard tar
  97 utility, a segment will produce a directory with the same name as the
  98 segment itself, and that directory will contain a set of
  99 sequentially-numbered files each storing the contents of a single
 100 object.
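
Because each object is stored in the segment TAR file under its
fully-qualified name, a single object can be read with a standard TAR
library.  The sketch below uses Python's tarfile module; the function
name read_object is hypothetical, and the segment is assumed to be
already unfiltered (plain TAR data).

```python
import io
import tarfile


def read_object(segment_file, segment, sequence):
    """Read one object from an unfiltered segment TAR.

    segment_file: binary file object positioned at the start of the TAR.
    segment:      the segment UUID as text (no file extension).
    sequence:     the object sequence number as an integer.
    """
    # Fully-qualified name: UUID, slash, 8-digit lowercase hex number.
    member_name = "%s/%08x" % (segment, sequence)
    # Mode "r:" reads plain (uncompressed) TAR data only.
    with tarfile.open(fileobj=segment_file, mode="r:") as tar:
        member = tar.extractfile(member_name)  # KeyError if not present
        if member is None:
            raise ValueError("%s is not a regular file" % member_name)
        return member.read()
```

A caller would typically open the segment file in binary mode, undo any
filter such as bzip2 first, and pass the resulting stream in.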
 101
 102 NOTE: When naming an object, the segment portion consists of the UUID
 103 only.  Any extensions appended to the segment when storing it as a file
 104 in the filesystem (for example, .tar.bz2) are _not_ part of the name of
 105 the object.
 106
 107 There are two additional components which may appear in an object name;
 108 both are optional.
 109
 110 First, a checksum may be added to the object name to express an
 111 integrity constraint: the referred-to data must match the checksum
 112 given.  A checksum is enclosed in parentheses and appended to the object
 113 name:
 114     a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187)
 115
 116 Secondly, an object may be /sliced/: a subset of the bytes actually
 117 stored in the object may be selected to be returned.  The slice syntax
 118 is
 119     [<start>+<length>]
 120 where <start> is the first byte to return (as a decimal offset) and
 121 <length> specifies the number of bytes to return (again in decimal).  It
 122 is invalid to select using the slice syntax a range of bytes that does
 123 not fall within the original object.  The slice specification should be
 124 appended to an object name, for example:
 125     a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
 126 selects only bytes 264..1263 from the original object.  As an
 127 abbreviation, the slice syntax
 128     [<length>]
 129 is shorthand for
 130     [0+<length>]
 131 In place of a traditional slice, the annotation
 132     [=<length>]
 133 may be used.  This is somewhat similar to specifying [<length>], but
 134 additionally asserts that the referenced object is exactly <length>
 135 bytes long--that is, this slice syntax does not change the bytes
 136 returned at all, but can be used to provide information about the
 137 underlying object store.
 138
 139 Both a checksum and a slice can be used.  In this case, the checksum is
 140 given first, followed by the slice.  The checksum is computed over the
 141 original object contents, before slicing.
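
The object reference syntax described above (name, optional checksum,
optional slice) can be parsed with a single regular expression.  This is
a minimal sketch; ObjectRef, parse_ref, and apply_slice are hypothetical
names, and checksum verification is left to the caller, who should
verify the full object before slicing.

```python
import re
from collections import namedtuple

# start/length are None when no slice is given; exact is True for [=<length>].
ObjectRef = namedtuple("ObjectRef", "segment sequence checksum start length exact")

_REF_RE = re.compile(
    r"^([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"  # UUID
    r"/([0-9a-f]{8})"                         # object sequence number
    r"(?:\(([^)]+)\))?"                       # optional (checksum)
    r"(?:\[(?:(\d+)\+(\d+)|(=?)(\d+))\])?$"   # optional slice
)


def parse_ref(text):
    """Parse an object reference into an ObjectRef tuple."""
    match = _REF_RE.match(text)
    if match is None:
        raise ValueError("malformed object reference: %r" % text)
    segment, sequence, checksum, start, length, eq, short_len = match.groups()
    if short_len is not None:
        # [<length>] is shorthand for [0+<length>]; [=<length>] asserts the size.
        start, length = 0, int(short_len)
    elif start is not None:
        start, length = int(start), int(length)
    return ObjectRef(segment, int(sequence, 16), checksum, start, length, eq == "=")


def apply_slice(ref, data):
    """Return the bytes of `data` selected by the slice in `ref`."""
    if ref.length is None:
        return data
    if ref.exact:
        if len(data) != ref.length:
            raise ValueError("object is not exactly %d bytes" % ref.length)
        return data
    if ref.start + ref.length > len(data):
        raise ValueError("slice falls outside the object")
    return data[ref.start:ref.start + ref.length]
```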
 142
 143 Special Objects
 144 ---------------
 145
 146 In addition to the standard syntax for objects described above, the
 147 special name "zero" may be used instead of segment/sequence number.
 148 This represents an object consisting entirely of zeroes.  The zero
 149 object must have a slice specification appended to indicate the size of
 150 the object.  For example
 151     zero[1024]
 152 represents a block consisting of 1024 null bytes.  A checksum should not
 153 be given.  The slice syntax should use the abbreviated length-only form.
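
A restore tool can recognize zero objects before consulting the segment
store.  A small sketch, assuming the reference has already been isolated
as a single token; the name zero_object is hypothetical.

```python
import re

_ZERO_RE = re.compile(r"^zero\[(\d+)\]$")


def zero_object(reference):
    """Return the null bytes for a 'zero[<length>]' reference.

    Returns None if `reference` is not a zero-object reference in the
    abbreviated length-only form.
    """
    match = _ZERO_RE.match(reference)
    if match is None:
        return None
    # bytes(n) builds a sequence of n null bytes.
    return bytes(int(match.group(1)))
```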
 154
 155
 156 FILE METADATA LISTING
 157 =====================
 158
 159 A snapshot stores two distinct types of data into the object store
 160 described above: data and metadata.  Data for a file may be stored as a
 161 single object, or the data may be broken apart into blocks which are
 162 stored as separate objects.  The file /metadata/ log (which may be
 163 spread across multiple objects) specifies the names of the files in a
 164 snapshot and metadata about them such as ownership and timestamps, and
 165 gives the list of objects that contain the data for each file.
 166
 167 The metadata log consists of a set of stanzas, each of which is
 168 formatted somewhat like RFC 822 (email) headers.  An example is:
 169
 170     path: etc/fstab
 171     checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f
 172     data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7
 173     group: 0 (root)
 174     mode: 0644
 175     mtime: 1177977313
 176     size: 867
 177     type: f
 178     user: 0 (root)
 179
 180 The meanings of all the fields are described later.  A blank line
 181 separates stanzas with information about different files.  In addition
 182 to regular stanzas, the metadata listing may contain a line containing
 183 an object reference prefixed with "@".  Such a line indicates that the
 184 contents of the referenced object should be fetched and parsed as a
 185 metadata listing at this point, prior to continuing to parse the current
 186 object.
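
The stanza structure and the "@" include lines can be handled by a
small recursive generator.  This is a sketch under stated assumptions:
the caller supplies a hypothetical fetch(reference) function returning
the text of the referenced object, "@" lines are assumed to appear
between stanzas, and header continuation lines are not handled.

```python
def parse_metadata(text, fetch):
    """Yield one dict of field name -> raw value per stanza, in order."""
    stanza = {}
    for line in text.splitlines():
        if line.startswith("@"):
            # Assumption: an include line also ends any stanza in progress.
            if stanza:
                yield stanza
                stanza = {}
            # Parse the referenced listing before continuing with this one.
            yield from parse_metadata(fetch(line[1:].strip()), fetch)
        elif not line.strip():
            # A blank line separates stanzas.
            if stanza:
                yield stanza
                stanza = {}
        else:
            key, _, value = line.partition(":")
            stanza[key.strip()] = value.strip()
    if stanza:
        yield stanza
```

Field values are returned as raw strings; decoding them according to
each field's encoding is a separate step.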
 187
 188 Several common encodings are used for various fields.  The encoding used
 189 for each field is specified in the field listing that follows.
 190     encoded string: An arbitrary string (octet sequence), with bytes
 191         optionally escaped by replacing a byte with %xx, where "xx" is a
 192         hexadecimal representation of the byte replaced.  For example,
 193         space can be replaced with "%20".  This is the same escaping
 194         mechanism as used in URLs.
 195     integer: An integer, which may be written in decimal, octal, or
 196         hexadecimal.  Strings starting with 0 are interpreted as octal,
 197         and those starting with 0x are interpreted as hexadecimal.
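
The two encodings above might be decoded as follows.  This is a minimal
sketch with hypothetical function names; it relies on the %xx escaping
being the same as in URLs, so Python's urllib.parse.unquote_to_bytes is
used for encoded strings.

```python
from urllib.parse import unquote_to_bytes


def decode_string(encoded):
    """Decode an 'encoded string' field to the original byte sequence."""
    return unquote_to_bytes(encoded)


def parse_integer(text):
    """Parse an 'integer' field written in decimal, octal, or hexadecimal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.startswith("0"):
        # A leading 0 selects octal; a lone "0" is simply zero.
        return int(text, 8)
    return int(text, 10)
```

With these helpers, a mode field of "0644" yields the same value as the
Python literal 0o644.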
 198
 199 Common fields (required in all stanzas):
 200     path [encoded string]: Full path of the file archived.  Note: In
 201         previous versions (<= 0.2) the name of this field was "name".
 202     user [special]: The user ID of the file, as an integer, optionally
 203         followed by a space and the corresponding username, as an
 204         escaped string enclosed in parentheses.
 205     group [special]: The group ID which owns the file.  Encoding is the
 206         same as for the user field: an integer, with an optional name in
 207         parentheses following.
 208     mode [integer]: Unix mode bits for the file.
 209     type [special]: A single character which indicates the type of file.
 210         The type indicators are meant to be consistent with the
 211         characters used with the -type option to find(1), and the file
 212         type checks in test(1):
 213             f   regular file
 214             b   block device
 215             c   character device
 216             d   directory
 217             l   symlink
 218             p   pipe
 219             s   socket
 220         Note that previous versions used '-' to indicate a regular file.
 221         This character should not be generated in any new snapshots, but
 222         may be encountered in old snapshots (those with a format version
 223         <= 0.2).
 224     mtime [integer]: Modification time of the file.
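
The user and group fields share one encoding, which can be split with a
short regular expression.  A sketch, with the hypothetical name
parse_owner; decoding the name as UTF-8 text (the default of
urllib.parse.unquote) is an assumption.

```python
import re
from urllib.parse import unquote

# An integer, optionally followed by a space and a name in parentheses.
_OWNER_RE = re.compile(r"^(\S+)(?: \((.*)\))?$")


def parse_owner(value):
    """Parse a user or group field into (numeric_id, name_or_None)."""
    match = _OWNER_RE.match(value.strip())
    if match is None:
        raise ValueError("malformed user/group field: %r" % value)
    id_text, name = match.groups()
    # Integers may be decimal, octal (leading 0), or hexadecimal (leading 0x).
    if id_text.lower().startswith("0x"):
        base = 16
    elif id_text.startswith("0") and len(id_text) > 1:
        base = 8
    else:
        base = 10
    numeric_id = int(id_text, base)
    return numeric_id, (unquote(name) if name is not None else None)
```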
 225
 226 Optional common fields:
 227     links [integer]: Number of hard links to this file, generally only
 228         reported if greater than 1.
 229     inode [string]: String specifying the inode number of this file when
 230         it was dumped.  If "links" is greater than 1, then searching for
 231         other files that have an identical "inode" value can be used to
 232         determine which files should be hard-linked together when
 233         restoring.  The inode field should be treated as an opaque
 234         string and compared for equality as such; an implementation may
 235         choose whatever representation is convenient.  The format
 236         produced by the standard tool is <major>/<minor>/<inode> (where
 237         <major> and <minor> specify the device of the containing
 238         filesystem and <inode> is the inode number of the file).
 239     ctime [integer]: Change time for the inode.
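
When restoring, files that should be hard-linked together can be found
by grouping stanzas on the opaque inode string.  A sketch, assuming each
stanza is a dict of raw field values keyed by field name (a hypothetical
in-memory representation) and using the hypothetical name
hardlink_groups.

```python
from collections import defaultdict


def hardlink_groups(stanzas):
    """Return lists of paths that share an identical "inode" value."""
    by_inode = defaultdict(list)
    for stanza in stanzas:
        inode = stanza.get("inode")
        if inode is not None:
            # The inode value is compared as an opaque string.
            by_inode[inode].append(stanza["path"])
    # Only inodes seen more than once describe hard-linked files.
    return [paths for paths in by_inode.values() if len(paths) > 1]
```

The first path in each group could be restored normally and the rest
created as links to it.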
 240
 241 Special fields used for regular files:
 242     checksum [string]: Checksum of the file contents.
 243     size [integer]: Size of the file, in bytes.
 244     data [reference list]: Whitespace-separated list of object
 245         references.  The referenced data, when concatenated in the
 246         listed order, will reconstruct the file data.  Any reference
 247         that begins with a "@" character is an indirect reference--the
 248         given object includes a whitespace-separated list of object
 249         references which should be parsed in the same manner as the data
 250         field.
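
Reconstructing file contents from a data field, including indirect
references, can be sketched as below.  The fetch(reference) callable is
a hypothetical helper assumed to return the bytes selected by a single
object reference (after any checksum check and slicing).

```python
def expand_references(data_field, fetch):
    """Yield the direct object references named by a data field, in order."""
    for reference in data_field.split():
        if reference.startswith("@"):
            # Indirect reference: the object holds another reference list.
            listing = fetch(reference[1:]).decode("ascii")
            yield from expand_references(listing, fetch)
        else:
            yield reference


def read_file_data(data_field, fetch):
    """Concatenate the referenced data in listed order."""
    return b"".join(fetch(ref) for ref in expand_references(data_field, fetch))
```

The result can then be compared against the stanza's size and checksum
fields.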
 251
 252 Special fields used for symbolic links:
 253     target [encoded string]: The target of the symlink, as returned by
 254         readlink(2).  Note: In old versions of the format (<= 0.2), this
 255         field was called "contents" instead of "target".
 256
 257 Special fields used for block and character device files:
 258     device [special]: The major and minor numbers of the device.  Encoded
 259         as "major/minor", where major and minor are the major and minor
 260         device numbers, each encoded as an integer.
 261
 262
 263 SNAPSHOT DESCRIPTOR
 264 ===================
 265
 266 The snapshot descriptor is a small file which describes a single
 267 snapshot.  It is one of the few files which is not stored as an object
 268 in the segment store.  It is stored as a separate file, in plain text,
 269 but in the same directory as segments are stored.
 270
 271 The name of the snapshot descriptor file is
 272     snapshot-<scheme>-<timestamp>.lbs
 273 <scheme> is a descriptive text which can be used to distinguish several
 274 logically distinct sets of snapshots (such as snapshots for two
 275 different directory trees) that are being stored in the same location.
 276 <timestamp> gives the date and time the snapshot was taken; the format
 277 is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39).  It is
 278 recommended that the timestamp be given in UTC for consistent sorting
 279 even if the offset from UTC to local time changes; however, the
 280 authoritative timestamp (including timezone) can be found in the Date
 281 field.  (In versions v0.10 and earlier the timestamp is given in local
 282 time; in current versions UTC is used.)
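
A descriptor filename can be split into its scheme and timestamp as
sketched below.  The function name parse_descriptor_name is
hypothetical, and the sketch assumes <scheme> is non-empty (it may
itself contain hyphens).

```python
import re
from datetime import datetime

_NAME_RE = re.compile(r"^snapshot-(.+)-(\d{8}T\d{6})\.lbs$")


def parse_descriptor_name(filename):
    """Return (scheme, timestamp) for a snapshot descriptor filename.

    The timestamp is returned as a naive datetime; whether it is UTC or
    local time depends on the format version that wrote the snapshot.
    """
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError("not a snapshot descriptor name: %r" % filename)
    scheme, stamp = match.groups()
    return scheme, datetime.strptime(stamp, "%Y%m%dT%H%M%S")
```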
 283
 284 The contents of the descriptor are a set of RFC 822-style headers (much
 285 like the metadata listing).  The fields which are defined are:
 286     Format: The string "Cumulus Snapshot v0.11" which identifies this
 287         file as a Cumulus backup descriptor.  The version number (v0.11)
 288         might change if there are changes to the format.  It is expected
 289         that at some point, once the format is stabilized, the version
 290         identifier will be changed to v1.0.  (Earlier versions, format
 291         v0.8 and earlier, used the string "LBS Snapshot" instead of
 292         "Cumulus Snapshot", reflecting an earlier name for the project.
 293         Consumers should be prepared for either name.)
 294     Producer: An informative string which identifies the program that
 295         produced the backup.
 296     Date: The date the snapshot was produced, in the local time zone.
 297         This denotes the same instant as the timestamp encoded in the
 298         filename, but is written out in full.  A timezone (offset from
 299         UTC) is given.  For example: "2007-08-06 02:22:39 -0700".
 300     Scheme: The <scheme> field from the descriptor filename.
 301     Segments: A whitespace-separated list of segment names.  Any segment
 302         which is referenced by this snapshot must be included in the
 303         list, since this list can be used in garbage-collecting old
 304         segments, determining which segments need to be downloaded to
 305         completely reconstruct a snapshot, etc.
 306     Root: A single object reference which points to the metadata
 307         listing for the snapshot.
 308     Checksums: A checksum file may be produced (with the same name as
 309         the snapshot descriptor file, but with extension .sha1sums
 310         instead of .lbs) containing SHA-1 checksums of all segments.
 311         This field contains a checksum of that file.
 312     Intent: Informational; records the value of the --intent flag when
 313         the snapshot was created, and can be used when determining which
 314         snapshots to later delete.
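
Reading a descriptor might look like the following sketch.  The names
parse_descriptor and utc_stamp are hypothetical; treating indented lines
as continuations of the previous field (as in RFC 822 headers) is an
assumption, as is the exact Date layout, which is taken from the example
above.

```python
from datetime import datetime, timezone


def parse_descriptor(text):
    """Parse descriptor text into a dict of fields."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Assumption: an indented line continues the previous field.
            fields[key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    # Accept both the current and the earlier project name.
    if not fields.get("Format", "").startswith(("Cumulus Snapshot", "LBS Snapshot")):
        raise ValueError("not a Cumulus snapshot descriptor")
    fields["Segments"] = fields.get("Segments", "").split()
    if "Date" in fields:
        fields["Date"] = datetime.strptime(fields["Date"], "%Y-%m-%d %H:%M:%S %z")
    return fields


def utc_stamp(date):
    """Format a timezone-aware datetime like the filename timestamp, in UTC."""
    return date.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
```

Applying utc_stamp to the parsed Date field gives a value that can be
compared with the timestamp in the descriptor filename for snapshots
whose filenames use UTC.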