X-Git-Url: http://git.vrable.net/?a=blobdiff_plain;f=format.txt;h=54d9f8cd6020d9900b77c715bd728e6ddaf27ad2;hb=893aa36d4dee18cc85843c441219c8e177282e79;hp=f7ce16660bd4e758cb39772880f3039f3120bdaf;hpb=c93e446e6d0863dc8105884f38f7e33025dd7406;p=cumulus.git diff --git a/format.txt b/format.txt index f7ce166..54d9f8c 100644 --- a/format.txt +++ b/format.txt @@ -1,27 +1,250 @@ Backup Format Description for an LFS-Inspired Backup Solution -NOTE: This is simply a proposal at this point in time, and not yet -implemented. Details are subject to change. - -======================================================================== - -Goals: To provide a stable and extensible data storage format for -efficient remote filesystem backups. Among the features desired in the -format are: - - Support for grouping unchanging file contents together, and reusing - it for future backups. - - Nonetheless allow old backups to be deleted (at least those parts - that are not also used by newer backups). - - Support some form of rdiff-style incremental differences within a - file. -The current plan is to implement compression and encryption separately: -not as part of the base format, but simply by passing the backup data -through filters such as bzip2 or gpg. - -Data is organized into a collection of _objects_, which are grouped -together for storage purposes into _segments_. Objects may refer to -other objects; a snapshot consists of a tree object which in turn refers -to other objects containing file data. A new snapshot may be created -which refers to some of the old objects with file data, if those files -have not changed. +NOTE: This format specification is not yet complete. Right now the code +provides the best documentation of the format. + +This document simply describes the snapshot format. It is described +from the point of view of a decompressor which wishes to restore the +files from a snapshot. It does not specify the exact behavior required +of the backup program writing the snapshot. + +This document does not explain the rationale behind the format; for +that, see design.txt. + + +DATA CHECKSUMS +============== + +In several places in the LBS format, a cryptographic checksum may be +used to allow data integrity to be verified. At the moment, only the +SHA-1 checksum is supported, but it is expected that other algorithms +will be supported in the future. + +When a checksum is called for, the checksum is always stored in a text +format. The general format used is + = + + identifies the checksum algorithm used, and allows new +algorithms to be added later. At the moment, the only permissible value +is "sha1", indicating a SHA-1 checksum. + + is a sequence of hexadecimal digits which encode the +checksum value. For sha1, should be precisely 40 digits +long. + +A sample checksum string is + sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187 + + +SEGMENTS & OBJECTS: STORAGE AND NAMING +====================================== + +An LBS snapshot consists, at its base, of a collection of /objects/: +binary blobs of data, much like a file. Higher layers interpret the +contents of objects in various ways, but the lowest layer is simply +concerned with storing and naming these objects. + +An object is a sequence of bytes (octets) of arbitrary length. An +object may contain as few as zero bytes (though such objects are not +very useful). Object sizes are potentially unbounded, but it is +recommended that the maximum size of objects produced be on the order of +megabytes. Files of essentially unlimited size can be stored in an LBS +snapshot using objects of modest size, so this should not cause any real +restrictions. + +For storage purposes, objects are grouped together into /segments/. +Segments use the TAR format; each object within a segment is stored as a +separate file. Segments are named using UUIDs (Universally Unique +Identifiers), which are 128-bit numbers. The textual form of a UUID is +a sequence of lowercase hexadecimal digits with hyphens inserted at +fixed points; an example UUID is + a704eeae-97f2-4f30-91a4-d4473956366b +This segment could be stored in the filesystem as a file + a704eeae-97f2-4f30-91a4-d4473956366b.tar +The UUID used to name a segment is assigned when the segment is created. + +Filters can be layered on top of the segment storage to provide +compression, encryption, or other features. For example, the example +segment above might be stored as + a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2 +or + a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg +if the file data had been filtered through bzip2 or gpg, respectively, +before storage. Filtering of segment data is outside the scope of this +format specification, however; it is assumed that if filtering is used, +when decompressing the unfiltered data can be recovered (yielding data +in the TAR format). + +Objects within a segment are numbered sequentially. This sequence +number is then formatted as an 8-digit (zero-padded) hexadecimal +(lowercase) value. The fully qualified name of an object consists of +the segment name, followed by a slash ("/"), followed by the object +sequence number. So, for example + a704eeae-97f2-4f30-91a4-d4473956366b/000001ad +names an object. + +Within the segment TAR file, the filename used for each object is its +fully-qualified name. Thus, when extracted using the standard tar +utility, a segment will produce a directory with the same name as the +segment itself, and that directory will contain a set of +sequentially-numbered files each storing the contents of a single +object. + +NOTE: When naming an object, the segment portion consists of the UUID +only. Any extensions appended to the segment when storing it as a file +in the filesystem (for example, .tar.bz2) are _not_ part of the name of +the object. + +There are two additional components which may appear in an object name; +both are optional. + +First, a checksum may be added to the object name to express an +integrity constraint: the referred-to data must match the checksum +given. A checksum is enclosed in parentheses and appended to the object +name: + a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187) + +Secondly, an object may be /sliced/: a subset of the bytes actually +stored in the object may be selected to be returned. The slice syntax +is + [+] +where is the first byte to return (as a decimal offset) and + specifies the number of bytes to return (again in decimal). It +is invalid to select using the slice syntax a range of bytes that does +not fall within the original object. The slice specification should be +appended to an object name, for example: + a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000] +selects only bytes 264..1263 from the original object. + +Both a checksum and a slice can be used. In this case, the checksum is +given first, followed by the slice. The checksum is computed over the +original object contents, before slicing. + + +FILE METADATA LISTING +===================== + +A snapshot stores two distinct types of data into the object store +described above: data and metadata. Data for a file may be stored as a +single object, or the data may be broken apart into blocks which are +stored as separate objects. The file /metadata/ log (which may be +spread across multiple objects) specifies the names of the files in a +snapshot, metadata about them such as ownership and timestamps, and +gives the list of objects that contain the data for the file. + +The metadata log consists of a set of stanzas, each of which are +formatted somewhat like RFC 822 (email) headers. An example is: + + name: etc/fstab + checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f + data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7 + group: 0 (root) + mode: 0644 + mtime: 1177977313 + size: 867 + type: - + user: 0 (root) + +The meanings of all the fields are described later. A blank line +separates stanzas with information about different files. In addition +to regular stanzas, the metadata listing may contain a line containing +an object reference prefixed with "@". Such a line indicates that the +contents of the referenced object should be fetched and parsed as a +metadata listing at this point, prior to continuing to parse the current +object. + +Several common encodings are used for various fields. The encoding used +for each field is specified in the field listing that follows. + encoded string: An arbitrary string (octet sequence), with bytes + optionally escaped by replacing a byte with %xx, where "xx" is a + hexadecimal representation of the byte replaced. For example, + space can be replaced with "%20". This is the same escaping + mechanism as used in URLs. + integer: An integer, which may be written in decimal, octal, or + hexadecimal. Strings starting with 0 are interpreted as octal, + and those starting with 0x are intepreted as hexadecimal. + +Common fields (required in all stanzas): + name [encoded string]: Full path of the file archived. + user [special]: The user ID of the file, as an integer, optionally + followed by a space and the corresponding username, as an + escaped string enclosed in parentheses. + group [special]: The group ID which owns the file. Encoding is the + same as for the user field: an integer, with an optional name in + parentheses following. + mode [integer]: Unix mode bits for the file. + type [special]: A single character which indicates the type of file. + The type indicators are meant to be consistent with the + characters used to indicate file type in a directory listing: + - regular file + b block device + c character device + d directory + l symlink + p pipe + s socket + mtime [integer]: Modification time of the file. + +Optional common fields: + links [integer]: Number of hard links to this file, generally only + reported if greater than 1. + inode [string]: String specifying the inode number of this file when + it was dumped. If "links" is greater than 1, then searching for + other files that have an identical "inode" value can be used to + determine which files should be hard-linked together when + restoring. The inode field should be treated as an opaque + string and compared for equality as such; an implementation may + choose whatever representation is convenient. The format + produced by the standard tool is // (where + and specify the device of the containing + filesystem and is the inode number of the file). + +Special fields used for regular files: + checksum [string]: Checksum of the file contents. + size [integer]: Size of the file, in bytes. + data [reference list]: Whitespace-separated list of object + references. The referenced data, when concatenated in the + listed order, will reconstruct the file data. Any reference + that begins with a "@" character is an indirect reference--the + given object includes a whitespace-separated list of object + references which should be parsed in the same manner as the data + field. + + +SNAPSHOT DESCRIPTOR +=================== + +The snapshot descriptor is a small file which describes a single +snapshot. It is one of the few files which is not stored as an object +in the segment store. It is stored as a separate file, in plain text, +but in the same directory as segments are stored. + +The name of snapshot descriptor file is + snapshot--.lbs + is a descriptive text which can be used to distinguish several +logically distinct sets of snapshots (such as snapshots for two +different directory trees) that are being stored in the same location. + gives the date and time the snapshot was taken; the format +is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). + +The contents of the descriptor are a set of RFC 822-style headers (much +like the metadata listing). The fields which are defined are: + Format: The string "LBS Snapshot v0.2" which identifies this file as + an LBS backup descriptor. The version number (v0.2) might + change if there are changes to the format. It is expected that + at some point, once the format is stabilized, the version + identifier will be changed to v1.0. + Producer: A informative string which identifies the program that + produced the backup. + Date: The date the snapshot was produced. This matches the + timestamp encoded in the filename, but is written out in full. + A timezone is given. For example: "2007-08-06 09:22:39 -0700". + Scheme: The field from the descriptor filename. + Segments: A whitespace-seprated list of segment names. Any segment + which is referenced by this snapshot must be included in the + list, since this list can be used in garbage-collecting old + segments, determining which segments need to be downloaded to + completely reconstruct a snapshot, etc. + Root: A single object reference which points to the metadata + listing for the snapshot.