From: Michael Vrable <mvrable@cs.ucsd.edu>
Date: Mon, 6 Aug 2007 21:02:23 +0000 (-0700)
Subject: Update the format documentation to describe the current backup format.
X-Git-Url: http://git.vrable.net/?a=commitdiff_plain;h=6c532fe12d50ae9b1480ba93e668c042bf36eb63;p=cumulus.git

Update the format documentation to describe the current backup format.

The old documentation referred to the old binary backup format, and was
incomplete at that.  Rewrite it to discuss the current format, including a
discussion of segment/object storage, object references, the format of the
metadata listing, and the root backup descriptor.

The documentation can be improved, and some parts are certainly a bit
spotty, but this gives a good quick overview of the entire format.
---
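Reader's note, not part of the commit: the object reference syntax
described in the new format.txt (segment UUID, 8-digit hex sequence
number, optional checksum in parentheses, optional [start+length] slice)
can be sketched as below. This is only an illustration under stated
assumptions; the names REF_RE, parse_ref and resolve are hypothetical and
are not taken from the LBS code.

```python
import hashlib
import re

# Hypothetical pattern for a fully qualified object name:
#   <uuid>/<8 hex digits>[(<algorithm>=<hexdigits>)][[<start>+<length>]]
REF_RE = re.compile(
    r"^(?P<segment>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"/(?P<seq>[0-9a-f]{8})"
    r"(?:\((?P<checksum>[a-z0-9]+=[0-9a-f]+)\))?"
    r"(?:\[(?P<start>\d+)\+(?P<length>\d+)\])?$"
)


def parse_ref(ref):
    """Split a reference into (segment, sequence number, checksum, slice)."""
    m = REF_RE.match(ref)
    if m is None:
        raise ValueError("malformed object reference: %r" % ref)
    slice_ = None
    if m.group("start") is not None:
        slice_ = (int(m.group("start")), int(m.group("length")))
    return m.group("segment"), int(m.group("seq"), 16), m.group("checksum"), slice_


def resolve(data, checksum=None, slice_=None):
    """Verify the checksum over the whole object, then apply the slice."""
    if checksum is not None:
        algorithm, _, digits = checksum.partition("=")
        if algorithm != "sha1":
            raise ValueError("unsupported checksum algorithm: " + algorithm)
        if hashlib.sha1(data).hexdigest() != digits:
            raise ValueError("checksum mismatch")
    if slice_ is None:
        return data
    start, length = slice_
    # The format forbids slices that reach outside the original object.
    if start + length > len(data):
        raise ValueError("slice extends past the end of the object")
    return data[start:start + length]


if __name__ == "__main__":
    print(parse_ref("a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]"))
```

The checksum is checked before slicing because format.txt states that it
is computed over the original object contents.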

diff --git a/format.txt b/format.txt
index 1e4ce24..590116d 100644
--- a/format.txt
+++ b/format.txt
@@ -1,50 +1,236 @@
                        Backup Format Description
                   for an LFS-Inspired Backup Solution
 
-NOTE: This is simply a proposal at this point in time, and not yet
-implemented.  Details are subject to change.
-
-========================================================================
-
-Goals: To provide a stable and extensible data storage format for
-efficient remote filesystem backups.  Among the features desired in the
-format are:
-  - Support for grouping unchanging file contents together, and reusing
-    it for future backups.
-  - Nonetheless allow old backups to be deleted (at least those parts
-    that are not also used by newer backups).
-  - Support some form of rdiff-style incremental differences within a
-    file.
-The current plan is to implement compression and encryption separately:
-not as part of the base format, but simply by passing the backup data
-through filters such as bzip2 or gpg.
-
-Data is organized into a collection of _objects_, which are grouped
-together for storage purposes into _segments_.  Objects may refer to
-other objects; a snapshot consists of a tree object which in turn refers
-to other objects containing file data.  A new snapshot may be created
-which refers to some of the old objects with file data, if those files
-have not changed.
-
-========================================================================
-
-Object naming:
-  - Each segment is assigned a unique 128-bit identifier (uuid).  Each
-    segment is stored as a separate file whose name is based on its
-    uuid.
-  - Objects within a segment are numbered, using a 32-bit counter.
-
-Each segment is structured as a TAR file (optionally filtered through a
-compressor such as gzip/bzip2, or encrypted).  Objects are stored as
-individual files.
-
-File attributes: Metadata for each file is stored in a dictionary.
-Dictionary keys include:
-    type: uint8_t ('p', 's', 'c', 'b', 'l', 'd', '-')
-    mode: uint16_t
-    user: uint32_t
-    group: uint32_t
-    size: int64_t
-    atime: int64_t
-    mtime: int64_t
-    ctime: int64_t
+NOTE: This format specification is not yet complete.  Right now the code
+provides the best documentation of the format.
+
+This document simply describes the snapshot format.  It is described
+from the point of view of a restore program which wishes to recover the
+files from a snapshot.  It does not specify the exact behavior required
+of the backup program writing the snapshot.
+
+This document does not explain the rationale behind the format; for
+that, see design.txt.
+
+
+DATA CHECKSUMS
+==============
+
+In several places in the LBS format, a cryptographic checksum may be
+used to allow data integrity to be verified.  At the moment, only the
+SHA-1 checksum is supported, but it is expected that other algorithms
+will be supported in the future.
+
+When a checksum is called for, the checksum is always stored in a text
+format.  The general format used is
+    <algorithm>=<hexdigits>
+
+<algorithm> identifies the checksum algorithm used, and allows new
+algorithms to be added later.  At the moment, the only permissible value
+is "sha1", indicating a SHA-1 checksum.
+
+<hexdigits> is a sequence of hexadecimal digits which encode the
+checksum value.  For sha1, <hexdigits> should be precisely 40 digits
+long.
+
+A sample checksum string is
+    sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187
+
+
+SEGMENTS & OBJECTS: STORAGE AND NAMING
+======================================
+
+An LBS snapshot consists, at its base, of a collection of /objects/:
+binary blobs of data, much like a file.  Higher layers interpret the
+contents of objects in various ways, but the lowest layer is simply
+concerned with storing and naming these objects.
+
+An object is a sequence of bytes (octets) of arbitrary length.  An
+object may contain as few as zero bytes (though such objects are not
+very useful).  Object sizes are potentially unbounded, but it is
+recommended that the maximum size of objects produced be on the order of
+megabytes.  Files of essentially unlimited size can be stored in an LBS
+snapshot using objects of modest size, so this should not cause any real
+restrictions.
+
+For storage purposes, objects are grouped together into /segments/.
+Segments use the TAR format; each object within a segment is stored as a
+separate file.  Segments are named using UUIDs (Universally Unique
+Identifiers), which are 128-bit numbers.  The textual form of a UUID is
+a sequence of lowercase hexadecimal digits with hyphens inserted at
+fixed points; an example UUID is
+    a704eeae-97f2-4f30-91a4-d4473956366b
+This segment could be stored in the filesystem as a file
+    a704eeae-97f2-4f30-91a4-d4473956366b.tar
+The UUID used to name a segment is assigned when the segment is created.
+
+Filters can be layered on top of the segment storage to provide
+compression, encryption, or other features.  For instance, the segment
+above might be stored as
+    a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2
+or
+    a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg
+if the file data had been filtered through bzip2 or gpg, respectively,
+before storage.  Filtering of segment data is outside the scope of this
+format specification, however; it is assumed that if filtering is used,
+the unfiltered data can be recovered when restoring (yielding data in
+the TAR format).
+
+Objects within a segment are numbered sequentially.  This sequence
+number is then formatted as an 8-digit (zero-padded) hexadecimal
+(lowercase) value.  The fully qualified name of an object consists of
+the segment name, followed by a slash ("/"), followed by the object
+sequence number.  So, for example
+    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad
+names an object.
+
+Within the segment TAR file, the filename used for each object is its
+fully-qualified name.  Thus, when extracted using the standard tar
+utility, a segment will produce a directory with the same name as the
+segment itself, and that directory will contain a set of
+sequentially-numbered files each storing the contents of a single
+object.
+
+NOTE: When naming an object, the segment portion consists of the UUID
+only.  Any extensions appended to the segment when storing it as a file
+in the filesystem (for example, .tar.bz2) are _not_ part of the name of
+the object.
+
+There are two additional components which may appear in an object name;
+both are optional.
+
+First, a checksum may be added to the object name to express an
+integrity constraint: the referred-to data must match the checksum
+given.  A checksum is enclosed in parentheses and appended to the object
+name:
+    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187)
+
+Secondly, an object may be /sliced/: a subset of the bytes actually
+stored in the object may be selected to be returned.  The slice syntax
+is
+    [<start>+<length>]
+where <start> is the first byte to return (as a decimal offset) and
+<length> specifies the number of bytes to return (again in decimal).  It
+is invalid for a slice to select a range of bytes that does not fall
+entirely within the original object.  The slice specification should be
+appended to an object name, for example:
+    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
+selects only bytes 264..1263 from the original object.
+
+Both a checksum and a slice can be used.  In this case, the checksum is
+given first, followed by the slice.  The checksum is computed over the
+original object contents, before slicing.
+
+
+FILE METADATA LISTING
+=====================
+
+A snapshot stores two distinct types of data into the object store
+described above: data and metadata.  Data for a file may be stored as a
+single object, or the data may be broken apart into blocks which are
+stored as separate objects.  The file /metadata/ log (which may be
+spread across multiple objects) specifies the names of the files in a
+snapshot and metadata about them such as ownership and timestamps, and
+gives the list of objects that contain the data for each file.
+
+The metadata log consists of a set of stanzas, each of which is
+formatted somewhat like RFC 822 (email) headers.  An example is:
+
+    name: etc/fstab
+    checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f
+    data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7
+    group: 0 (root)
+    mode: 0644
+    mtime: 1177977313
+    size: 867
+    type: -
+    user: 0 (root)
+
+The meanings of all the fields are described later.  A blank line
+separates stanzas with information about different files.  In addition
+to regular stanzas, the metadata listing may contain a line containing
+an object reference prefixed with "@".  Such a line indicates that the
+contents of the referenced object should be fetched and parsed as a
+metadata listing at this point, prior to continuing to parse the current
+object.
+
+Several common encodings are used for various fields.  The encoding used
+for each field is specified in the field listing that follows.
+    encoded string: An arbitrary string (octet sequence), with bytes
+        optionally escaped by replacing a byte with %xx, where "xx" is a
+        hexadecimal representation of the byte replaced.  For example,
+        space can be replaced with "%20".  This is the same escaping
+        mechanism as used in URLs.
+    integer: An integer, which may be written in decimal, octal, or
+        hexadecimal.  Strings starting with 0 are interpreted as octal,
+        and those starting with 0x are interpreted as hexadecimal.
+
+Common fields (required in all stanzas):
+    name [encoded string]: Full path of the file archived.
+    user [special]: The user ID of the file, as an integer, optionally
+        followed by a space and the corresponding username, as an
+        encoded string enclosed in parentheses.
+    group [special]: The group ID which owns the file.  Encoding is the
+        same as for the user field: an integer, with an optional name in
+        parentheses following.
+    mode [integer]: Unix mode bits for the file.
+    type [special]: A single character which indicates the type of file.
+        The type indicators are meant to be consistent with the
+        characters used to indicate file type in a directory listing:
+            -   regular file
+            b   block device
+            c   character device
+            d   directory
+            l   symlink
+            p   pipe
+            s   socket
+    mtime [integer]: Modification time of the file.
+
+Special fields used for regular files:
+    checksum [string]: Checksum of the file contents.
+    size [integer]: Size of the file, in bytes.
+    data [reference list]: Whitespace-separated list of object
+        references.  The referenced data, when concatenated in the
+        listed order, will reconstruct the file data.  Any reference
+        that begins with a "@" character is an indirect reference--the
+        given object includes a whitespace-separated list of object
+        references which should be parsed in the same manner as the data
+        field.
+
+
+SNAPSHOT DESCRIPTOR
+===================
+
+The snapshot descriptor is a small file which describes a single
+snapshot.  It is one of the few files which is not stored as an object
+in the segment store.  It is stored as a separate file, in plain text,
+but in the same directory as segments are stored.
+
+The name of the snapshot descriptor file is
+    snapshot-<scheme>-<timestamp>.lbs
+<scheme> is a descriptive text which can be used to distinguish several
+logically distinct sets of snapshots (such as snapshots for two
+different directory trees) that are being stored in the same location.
+<timestamp> gives the date and time the snapshot was taken; the format
+is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39).
+
+The contents of the descriptor are a set of RFC 822-style headers (much
+like the metadata listing).  The fields which are defined are:
+    Format: The string "LBS Snapshot v0.2" which identifies this file as
+        an LBS backup descriptor.  The version number (v0.2) might
+        change if there are changes to the format.  It is expected that
+        at some point, once the format is stabilized, the version
+        identifier will be changed to v1.0.
+    Producer: An informative string which identifies the program that
+        produced the backup.
+    Date: The date the snapshot was produced.  This matches the
+        timestamp encoded in the filename, but is written out in full.
+        A timezone is given.  For example: "2007-08-06 09:22:39 -0700".
+    Scheme: The <scheme> field from the descriptor filename.
+    Segments: A whitespace-separated list of segment names.  Any segment
+        which is referenced by this snapshot must be included in the
+        list, since this list can be used in garbage-collecting old
+        segments, determining which segments need to be downloaded to
+        completely reconstruct a snapshot, etc.
+    Root: A single object reference which points to the metadata
+        listing for the snapshot.
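
Reader's note, outside the patch: a minimal sketch of how a restore tool
might read the snapshot descriptor described above. It rests on a few
assumptions that format.txt does not confirm: that <scheme> may itself
contain hyphens (so the timestamp is matched at the end of the name), and
that long header values such as Segments may be folded onto continuation
lines that start with whitespace, as RFC 822 allows. The function names
and the "home" scheme are hypothetical.

```python
import re
from datetime import datetime

# Filename layout from format.txt: snapshot-<scheme>-<timestamp>.lbs
FILENAME_RE = re.compile(
    r"^snapshot-(?P<scheme>.*)-(?P<timestamp>\d{8}T\d{6})\.lbs$"
)


def parse_descriptor_name(filename):
    """Return (scheme, datetime) extracted from a descriptor filename."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError("not a snapshot descriptor name: %r" % filename)
    taken = datetime.strptime(m.group("timestamp"), "%Y%m%dT%H%M%S")
    return m.group("scheme"), taken


def parse_descriptor(text):
    """Parse the RFC 822-style headers of a descriptor into a dict.

    The Segments value is split into a list of segment names.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: a line starting with whitespace continues the
        # previous header value.
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        key = name.strip()
        fields[key] = value.strip()
    fields["Segments"] = fields.get("Segments", "").split()
    return fields


if __name__ == "__main__":
    print(parse_descriptor_name("snapshot-home-20070806T092239.lbs"))
```

Only the first colon on a line separates the field name from its value,
so a Date value such as "2007-08-06 09:22:39 -0700" is kept intact.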