This document aims to describe the goals and constraints of the LBS
design.

========================================================================

OVERALL GOALS: Efficient creation and storage of multiple backup
snapshots of a filesystem tree. Logically, each snapshot is
self-contained and stores the state of all files at a single point in
time. However, it should be possible for snapshots to share the same
underlying storage, so that data duplicated in many snapshots need not
be stored multiple times. It should be possible to delete old
snapshots, and recover (most of) the storage associated with them. It
must be possible to delete old backups in any order; for example, it
must be possible to delete intermediate backups before long-term
backups. It should be possible to recover the files in a snapshot
without transferring significantly more data than that stored in the
files to be recovered.

CONSTRAINTS: The system should not rely upon a smart server at the
remote end where backups are stored. It should be possible to create
new backups using a single primitive: StoreFile, which stores a string
of bytes at the backup server using a specified filename. Thus, backups
can be run over any file transfer protocol, without requiring special
software to be installed on the storage server.

========================================================================

DESIGN APPROACHES

STORING INCREMENTAL BACKUPS

One simple approach is to simply store a copy of every file on the
remote end, and construct a listing which tells where each file in the
source ends up on the remote server. For subsequent backups, if a file
is unchanged, the listing can simply point to the location of the file
from the previous backup. Deleting backups is simple: delete the
listing file for a particular snapshot, then garbage collect all files
which are no longer referenced.

This approach does not handle partial changes to large files as
efficiently. If a file is changed at all, it needs to be transferred in
its entirety. One approach is to represent intra-file changes by
storing patches. The original file is kept, and a smaller file is
transferred that stores the differences between the original and the
new version. Some care is needed, however. A series of small changes
could accumulate over many snapshots. If each snapshot refers to the
original file, much data will be duplicated between the patches in
different snapshots. If each patch can refer to previous patches as
well, a long chain of patches can build up, which complicates removing
old backups to reclaim storage.

An alternative approach is to break files apart into smaller units
(blocks) and to represent files in a snapshot as the concatenation of
(possibly many) blocks. Small changes to files can be represented by
replacing a few of the blocks, but referring to most blocks used in the
old file directly. Some care is needed with this approach as
well--there is additional overhead needed to specify even the original
file, since the entire list of blocks must be specified. If the block
size is too small, this can lead to a large overhead, but if the block
size is too large, then sharing of file data may not be achieved. In
this scheme, data blocks do not depend on other data blocks, so chains
of dependencies do not arise as in the incremental patching scheme.
Each snapshot is independent, and so can easily be removed.
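
To make the block-based approach concrete, here is a minimal sketch in
Python. It assumes fixed-size blocks and a checksum-indexed store; both
are illustrative choices, not part of any specified format, and the
helper names are made up for this example.

    import hashlib

    BLOCK_SIZE = 1 << 20   # 1 MiB; the choice trades overhead vs. sharing

    def store_file(path, store):
        """Split a file into blocks, reusing any block already stored.

        'store' maps block checksums to block data and stands in for
        whatever remote storage is used. Returns the list of checksums
        which, concatenated in order, reconstruct the file.
        """
        blocks = []
        with open(path, 'rb') as f:
            while True:
                data = f.read(BLOCK_SIZE)
                if not data:
                    break
                digest = hashlib.sha1(data).hexdigest()
                if digest not in store:    # only new data is uploaded
                    store[digest] = data
                blocks.append(digest)
        return blocks

An in-place edit to a large file then changes only the checksums of the
affected blocks; all other blocks are found in the store and shared with
previous snapshots.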

One minor modification to this scheme is to permit the list of blocks to
specify that only a portion of a block should be used to reconstruct a
file; if, say, only the end of a block is changed, then the new backup
can refer to most of the old block, and use a new block for the small
changed part. Doing so does allow the possibility that a block might be
kept around even though only a portion of it is being used, leading to
wasted space.


DATA STORAGE

The simplest data storage format would place each file, patch, or block
in a separate file on the storage server. Doing so maximizes the
ability to reclaim storage when deleting old snapshots, and minimizes
the amount of extra data that must be transferred to recover a snapshot.
Any other format which combines data from multiple files/patches/blocks
together risks having needed data grouped with unwanted data.

However, there are reasons to consider grouping, since there is overhead
associated with storing many small files. In any transfer protocol
which is not pipelined, transferring many small files may be slower than
transferring the same quantity of data in larger files. Small files may
also lead to more wasted storage space due to internal fragmentation.
Grouping files together gives the chance for better compression, taking
advantage of inter-file similarity.

Grouping is even more important if the snapshot format breaks files
apart into blocks for storage, since the number of blocks could be far
larger than the number of files being backed up.

========================================================================

SELECTED DESIGN

At a high level, the selected design stores snapshots by breaking files
into blocks for storage, and does not use patches. These data blocks,
along with the metadata fragments (collectively, the blocks and metadata
are referred to as objects), are grouped together for storage purposes
(each storage group is called a segment).

TAR is chosen as the format for grouping objects together into segments
rather than inventing a new format. Doing so makes it easy to
manipulate the segments using other tools, if needed.

Data blocks for files are stored as-is. Metadata is stored in a text
format, to make it more transparent. (This should make debugging
easier, and the hope is that this will make understanding the format
simpler.)
                      Backup Format Description
                 for an LFS-Inspired Backup Solution

NOTE: This format specification is not yet complete. Right now the code
provides the best documentation of the format.

This document simply describes the snapshot format. It is described
from the point of view of a decompressor which wishes to restore the
files from a snapshot. It does not specify the exact behavior required
of the backup program writing the snapshot.

This document does not explain the rationale behind the format; for
that, see design.txt.

DATA CHECKSUMS
==============

In several places in the LBS format, a cryptographic checksum may be
used to allow data integrity to be verified. At the moment, only the
SHA-1 checksum is supported, but it is expected that other algorithms
will be supported in the future.

When a checksum is called for, the checksum is always stored in a text
format. The general format used is
    <algorithm>=<hexdigits>

<algorithm> identifies the checksum algorithm used, and allows new
algorithms to be added later. At the moment, the only permissible value
is "sha1", indicating a SHA-1 checksum.

<hexdigits> is a sequence of hexadecimal digits which encode the
checksum value. For sha1, <hexdigits> should be precisely 40 digits
long.

A sample checksum string is
    sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187
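
As an illustration, producing and checking checksum strings of this form
might look as follows in Python (a sketch; the function names are made
up for this example):

    import hashlib

    def format_checksum(data):
        """Return a checksum string such as "sha1=67049e79...."""
        return "sha1=" + hashlib.sha1(data).hexdigest()

    def verify_checksum(data, checksum):
        """Check data against a checksum string; only sha1 is known."""
        algorithm, _, hexdigits = checksum.partition("=")
        if algorithm != "sha1":
            raise ValueError("unsupported checksum algorithm: " + algorithm)
        return hashlib.sha1(data).hexdigest() == hexdigits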


SEGMENTS & OBJECTS: STORAGE AND NAMING
======================================

An LBS snapshot consists, at its base, of a collection of /objects/:
binary blobs of data, much like a file. Higher layers interpret the
contents of objects in various ways, but the lowest layer is simply
concerned with storing and naming these objects.

An object is a sequence of bytes (octets) of arbitrary length. An
object may contain as few as zero bytes (though such objects are not
very useful). Object sizes are potentially unbounded, but it is
recommended that the maximum size of objects produced be on the order of
megabytes. Files of essentially unlimited size can be stored in an LBS
snapshot using objects of modest size, so this should not cause any real
restrictions.

For storage purposes, objects are grouped together into /segments/.
Segments use the TAR format; each object within a segment is stored as a
separate file. Segments are named using UUIDs (Universally Unique
Identifiers), which are 128-bit numbers. The textual form of a UUID is
a sequence of lowercase hexadecimal digits with hyphens inserted at
fixed points; an example UUID is
    a704eeae-97f2-4f30-91a4-d4473956366b
This segment could be stored in the filesystem as a file
    a704eeae-97f2-4f30-91a4-d4473956366b.tar
The UUID used to name a segment is assigned when the segment is created.

Filters can be layered on top of the segment storage to provide
compression, encryption, or other features. For example, the example
segment above might be stored as
    a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2
or
    a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg
if the file data had been filtered through bzip2 or gpg, respectively,
before storage. Filtering of segment data is outside the scope of this
format specification, however; it is assumed that if filtering is used,
the unfiltered data can be recovered when decompressing (yielding data
in the TAR format).

Objects within a segment are numbered sequentially. This sequence
number is then formatted as an 8-digit (zero-padded) hexadecimal
(lowercase) value. The fully qualified name of an object consists of
the segment name, followed by a slash ("/"), followed by the object
sequence number. So, for example
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad
names an object.

Within the segment TAR file, the filename used for each object is its
fully-qualified name. Thus, when extracted using the standard tar
utility, a segment will produce a directory with the same name as the
segment itself, and that directory will contain a set of
sequentially-numbered files each storing the contents of a single
object.

NOTE: When naming an object, the segment portion consists of the UUID
only. Any extensions appended to the segment when storing it as a file
in the filesystem (for example, .tar.bz2) are _not_ part of the name of
the object.

There are two additional components which may appear in an object name;
both are optional.

First, a checksum may be added to the object name to express an
integrity constraint: the referred-to data must match the checksum
given. A checksum is enclosed in parentheses and appended to the object
name:
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187)

Second, an object may be /sliced/: a subset of the bytes actually
stored in the object may be selected to be returned. The slice syntax
is
    [<start>+<length>]
where <start> is the first byte to return (as a decimal offset) and
<length> specifies the number of bytes to return (again in decimal). It
is invalid for a slice to select a range of bytes that does not fall
within the original object. The slice specification should be appended
to an object name, for example:
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
selects only bytes 264..1263 from the original object.

Both a checksum and a slice can be used. In this case, the checksum is
given first, followed by the slice. The checksum is computed over the
original object contents, before slicing.
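
A decompressor might parse these names along the following lines (a
sketch in Python; the regular expression and function name are made up
for this example):

    import re

    # Segment UUID and object sequence number, optionally followed by a
    # "(checksum)" constraint and then a "[start+length]" slice.
    OBJECT_NAME = re.compile(
        r"^(?P<segment>[0-9a-f-]{36})/(?P<object>[0-9a-f]{8})"
        r"(?:\((?P<checksum>[^)]*)\))?"
        r"(?:\[(?P<start>\d+)\+(?P<length>\d+)\])?$")

    def parse_object_name(name):
        """Split a name into (segment, object, checksum, slice)."""
        m = OBJECT_NAME.match(name)
        if m is None:
            raise ValueError("malformed object name: " + name)
        slice_ = None
        if m.group("start") is not None:
            slice_ = (int(m.group("start")), int(m.group("length")))
        return (m.group("segment"), m.group("object"),
                m.group("checksum"), slice_)

For the sliced example above, parse_object_name returns the segment
UUID, object number "000001ad", no checksum, and the slice (264, 1000).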


FILE METADATA LISTING
=====================

A snapshot stores two distinct types of data into the object store
described above: data and metadata. Data for a file may be stored as a
single object, or the data may be broken apart into blocks which are
stored as separate objects. The file /metadata/ log (which may be
spread across multiple objects) specifies the names of the files in a
snapshot, metadata about them such as ownership and timestamps, and
gives the list of objects that contain the data for each file.

The metadata log consists of a set of stanzas, each of which is
formatted somewhat like RFC 822 (email) headers. An example is:

    name: etc/fstab
    checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f
    data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7
    group: 0 (root)
    mode: 0644
    mtime: 1177977313
    size: 867
    type: -
    user: 0 (root)

The meanings of all the fields are described later. A blank line
separates stanzas with information about different files. In addition
to regular stanzas, the metadata listing may contain a line containing
an object reference prefixed with "@". Such a line indicates that the
contents of the referenced object should be fetched and parsed as a
metadata listing at this point, prior to continuing to parse the current
object.
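
A sketch of a parser for this structure, in Python (the object_store
callable is an assumption standing in for whatever fetches an object's
contents as text):

    def read_metadata(object_store, ref):
        """Yield the stanzas of a metadata listing as dictionaries.

        Lines starting with "@" splice in another metadata object at
        that point; blank lines separate stanzas.
        """
        stanza = {}
        for line in object_store(ref).splitlines():
            if line.startswith("@"):        # indirect metadata object
                for s in read_metadata(object_store, line[1:]):
                    yield s
            elif line.strip() == "":        # blank line ends a stanza
                if stanza:
                    yield stanza
                    stanza = {}
            else:
                field, _, value = line.partition(":")
                stanza[field.strip()] = value.strip()
        if stanza:
            yield stanza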

Several common encodings are used for various fields. The encoding used
for each field is specified in the field listing that follows.
    encoded string: An arbitrary string (octet sequence), with bytes
        optionally escaped by replacing a byte with %xx, where "xx" is a
        hexadecimal representation of the byte replaced. For example,
        space can be replaced with "%20". This is the same escaping
        mechanism as used in URLs.
    integer: An integer, which may be written in decimal, octal, or
        hexadecimal. Strings starting with 0 are interpreted as octal,
        and those starting with 0x are interpreted as hexadecimal.
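
Both encodings decode easily; for example (a sketch, with hypothetical
helper names):

    from urllib.parse import unquote_to_bytes

    def decode_string(s):
        """Undo %xx escaping, recovering the original octet sequence."""
        return unquote_to_bytes(s)

    def decode_integer(s):
        """Parse an integer in decimal, octal (leading 0), or hex (0x)."""
        if s.startswith("0x"):
            return int(s, 16)
        if s.startswith("0") and len(s) > 1:
            return int(s, 8)
        return int(s, 10)

So decode_integer("0644") yields 420, the mode bits from the example
stanza above.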

Common fields (required in all stanzas):
    name [encoded string]: Full path of the file archived.
    user [special]: The user ID of the file, as an integer, optionally
        followed by a space and the corresponding username, as an
        escaped string enclosed in parentheses.
    group [special]: The group ID which owns the file. Encoding is the
        same as for the user field: an integer, with an optional name in
        parentheses following.
    mode [integer]: Unix mode bits for the file.
    type [special]: A single character which indicates the type of file.
        The type indicators are meant to be consistent with the
        characters used to indicate file type in a directory listing:
            -   regular file
            b   block device
            c   character device
            d   directory
            l   symlink
            p   pipe
            s   socket
    mtime [integer]: Modification time of the file.

Optional common fields:
    links [integer]: Number of hard links to this file, generally only
        reported if greater than 1.
    inode [string]: String specifying the inode number of this file when
        it was dumped. If "links" is greater than 1, then searching for
        other files that have an identical "inode" value can be used to
        determine which files should be hard-linked together when
        restoring. The inode field should be treated as an opaque
        string and compared for equality as such; an implementation may
        choose whatever representation is convenient. The format
        produced by the standard tool is <major>/<minor>/<inode> (where
        <major> and <minor> specify the device of the containing
        filesystem and <inode> is the inode number of the file).

Special fields used for regular files:
    checksum [string]: Checksum of the file contents.
    size [integer]: Size of the file, in bytes.
    data [reference list]: Whitespace-separated list of object
        references. The referenced data, when concatenated in the
        listed order, will reconstruct the file data. Any reference
        that begins with a "@" character is an indirect reference--the
        given object includes a whitespace-separated list of object
        references which should be parsed in the same manner as the data
        field (see the sketch after this list).
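
Reconstructing file contents from a "data" field is then a simple loop;
in Python (a sketch; object_store is again an assumed callable that
resolves an object reference, including any checksum and slice parts,
to bytes):

    def reconstruct_file(object_store, data_field):
        """Concatenate the objects named in a "data" field."""
        out = b""
        for ref in data_field.split():
            if ref.startswith("@"):    # indirect list of references
                inner = object_store(ref[1:]).decode("utf-8")
                out += reconstruct_file(object_store, inner)
            else:
                out += object_store(ref)
        return out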


SNAPSHOT DESCRIPTOR
===================

The snapshot descriptor is a small file which describes a single
snapshot. It is one of the few files which is not stored as an object
in the segment store. It is stored as a separate file, in plain text,
but in the same directory as segments are stored.

The name of the snapshot descriptor file is
    snapshot-<scheme>-<timestamp>.lbs
<scheme> is a descriptive text which can be used to distinguish several
logically distinct sets of snapshots (such as snapshots for two
different directory trees) that are being stored in the same location.
<timestamp> gives the date and time the snapshot was taken; the format
is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39).
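
Parsing such a filename is straightforward; for example, in Python (a
sketch; the scheme "home" below is purely hypothetical):

    import re
    from datetime import datetime

    DESCRIPTOR_NAME = re.compile(r"^snapshot-(.*)-(\d{8}T\d{6})\.lbs$")

    def parse_descriptor_name(filename):
        """Extract (scheme, timestamp) from a descriptor filename."""
        m = DESCRIPTOR_NAME.match(filename)
        if m is None:
            raise ValueError("not a snapshot descriptor: " + filename)
        return m.group(1), datetime.strptime(m.group(2), "%Y%m%dT%H%M%S")

    # parse_descriptor_name("snapshot-home-20070806T092239.lbs")
    #     => ("home", datetime(2007, 8, 6, 9, 22, 39))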

The contents of the descriptor are a set of RFC 822-style headers (much
like the metadata listing). The fields which are defined are:
    Format: The string "LBS Snapshot v0.2" which identifies this file as
        an LBS backup descriptor. The version number (v0.2) might
        change if there are changes to the format. It is expected that
        at some point, once the format is stabilized, the version
        identifier will be changed to v1.0.
    Producer: An informative string which identifies the program that
        produced the backup.
    Date: The date the snapshot was produced. This matches the
        timestamp encoded in the filename, but is written out in full.
        A timezone is given. For example: "2007-08-06 09:22:39 -0700".
    Scheme: The <scheme> field from the descriptor filename.
    Segments: A whitespace-separated list of segment names. Any segment
        which is referenced by this snapshot must be included in the
        list, since this list can be used in garbage-collecting old
        segments, determining which segments need to be downloaded to
        completely reconstruct a snapshot, etc.
    Root: A single object reference which points to the metadata
        listing for the snapshot.
    Checksums: A checksum file may be produced (with the same name as
        the snapshot descriptor file, but with extension .sha1sums
        instead of .lbs) containing SHA-1 checksums of all segments.
        This field contains a checksum of that file.
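
Putting the fields together, a descriptor might look like the following
(a hypothetical example; the segment UUID is the one used earlier in
this document, and all other values are invented):

    Format: LBS Snapshot v0.2
    Producer: LBS v0.2
    Date: 2007-08-06 09:22:39 -0700
    Scheme: home
    Segments: a704eeae-97f2-4f30-91a4-d4473956366b
    Root: a704eeae-97f2-4f30-91a4-d4473956366b/00000000
    Checksums: sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187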
                LBS: An LFS-Inspired Backup Solution
                       Implementation Overview

HIGH-LEVEL OVERVIEW
===================

There are two different classes of data stored, typically in different
directories:

The SNAPSHOT directory contains the actual backup contents. It consists
of segment data (typically in compressed/encrypted form, one segment per
file) as well as various small per-snapshot files such as the snapshot
descriptor files (which name each snapshot and tell where to locate the
data for it) and checksum files (which list checksums of segments for
quick integrity checking). The snapshot directory may be stored on a
remote server. It is write-only, in the sense that data does not need
to be read from the snapshot directory to create a new snapshot, and
files in it are immutable once created (they may be deleted if they are
no longer needed, but file contents are never changed).

The LOCAL DATABASE contains indexes used during the backup process.
Files here keep track of what information is known to be stored in the
snapshot directory, so that new snapshots can appropriately re-use data.
The local database, as its name implies, should be stored somewhere
local, since random access (read and write) will be required during the
backup process. Unlike the snapshot directory, files here are not
immutable.

Only the data stored in the snapshot directory is required to restore a
snapshot. The local database does not need to be backed up (stored at
multiple separate locations, etc.). The contents of the local database
can be rebuilt (at least in theory) from data in the snapshot directory
and the local filesystem; it is expected that tools will eventually be
provided to do so.

The format of data in the snapshot directory is described in format.txt.
The format of data in the local database is more fluid and may evolve
over time. The current structure of the local database is described in
this document.


LOCAL DATABASE FORMAT
=====================

The local database directory currently contains two files:
localdb.sqlite and a statcache file. (Actually, two types of files. It
is possible to create snapshots using different schemes, and have them
share the same local database directory. In this case, there will still
be one localdb.sqlite file, but one statcache file for each backup
scheme.)

Each statcache file is a plain text file, with a format similar to the
file metadata listing used in the snapshot directory. The purpose of
the statcache file is to speed up the backup process: it makes it
possible to determine whether a file has changed since the previous
snapshot by comparing the results of a stat() system call with the data
in the statcache file, and, if the file is unchanged, it provides the
checksum and list of data blocks previously used to store the file. The
statcache file is rewritten each time a snapshot is taken, and can
safely be deleted (with the only major side effect being that the first
backups after doing so will progress much more slowly).
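
The change test itself is simple; a sketch in Python (the entry layout
is an assumption here, since the statcache format is not fully
specified):

    import os

    def file_unchanged(path, cached):
        """Compare a stat() result against a statcache entry.

        'cached' stands for a parsed statcache stanza recording the
        size, mtime, and inode seen at the previous snapshot; if all
        match, the cached checksum and block list can be reused
        without rereading the file.
        """
        st = os.lstat(path)
        return (cached is not None
                and st.st_size == cached["size"]
                and int(st.st_mtime) == cached["mtime"]
                and st.st_ino == cached["inode"])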

localdb.sqlite is an SQLite database file, which is used for indexing
objects stored in the snapshot directory and various other purposes.
The database schema is contained in the file schema.sql in the LBS
source. Among the data tracked by localdb.sqlite:

  - A list of segments stored in the snapshot directory. This might not
    include all segments (segments belonging to old snapshots might be
    removed), but for correctness all segments listed in the local
    database must exist in the snapshot directory.

  - A block index which tracks objects in the snapshot directory used to
    store file data. It is indexed by block checksum, and so can be
    used while generating a snapshot to determine if a just-read block
    of data is already stored in the snapshot directory, and if so how
    to name it (see the query sketch after this list).

  - A list of recent snapshots, together with a list of the objects from
    the block index they reference.
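
The block-index lookup mentioned above might be expressed as follows (a
sketch; the table and column names here are guesses for illustration,
and the authoritative schema is the one in schema.sql):

    import sqlite3

    def find_block(db, checksum):
        """Return the name of the object already storing a block with
        this checksum, or None if the block must be written anew."""
        cur = db.execute(
            "SELECT segment, object FROM block_index"
            "    WHERE checksum = ?", (checksum,))
        row = cur.fetchone()
        return None if row is None else "%s/%s" % (row[0], row[1])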

The localdb SQL database is central to data sharing and segment
cleaning. When creating a new snapshot, information about the new
snapshot and the blocks it uses (including any new ones) is written to
the database. Using the database, separate segment cleaning processes
can determine how much data in various segments is still live, and
determine which segments are the best candidates for cleaning. Cleaning
is performed by updating the database to mark objects in the cleaned
segments as unavailable for use in future snapshots; when the backup
process next runs, any files that would use these expired blocks instead
have a copy of the data written to a new segment.
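
Under the same hypothetical schema as above, the cleaning step reduces
to a single update (a sketch, not the actual implementation):

    def expire_segment(db, segment_uuid):
        """Mark every block in a segment as unavailable for reuse.

        The next backup run rewrites any still-live data from this
        segment into new segments, after which the old segment file
        can be removed from the snapshot directory.
        """
        db.execute("UPDATE block_index SET expired = 1"
                   "    WHERE segment = ?", (segment_uuid,))
        db.commit()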