From: Michael Vrable Date: Thu, 13 Sep 2007 20:18:19 +0000 (-0700) Subject: Move most documentation into a doc/ subdirectory. X-Git-Url: http://git.vrable.net/?p=cumulus.git;a=commitdiff_plain;h=3fed0cd9c087ef575777056666c80397dd23530c Move most documentation into a doc/ subdirectory. --- diff --git a/design.txt b/design.txt deleted file mode 100644 index e515359..0000000 --- a/design.txt +++ /dev/null @@ -1,111 +0,0 @@ -This document aims to describe the goals and constraints of the LBS -design. - -======================================================================== - -OVERALL GOALS: Efficient creation and storage of multiple backup -snapshots of a filesystem tree. Logically, each snapshot is -self-contained and stores the state of all files at a single point in -time. However, it should be possible for snapshots to share the same -underlying storage, so that data duplicated in many snapshots need not -be stored multiple times. It should be possible to delete old -snapshots, and recover (most of) the storage associated with them. It -must be possible to delete old backups in any order; for example, it -must be possible to delete intermediate backups before long-term -backups. It should be possible to recover the files in a snapshot -without transferring significantly more data than that stored in the -files to be recovered. - -CONSTRAINTS: The system should not rely upon a smart server at the -remote end where backups are stored. It should be possible to create -new backups using a single primitive: StoreFile, which stores a string -of bytes at the backup server using a specified filename. Thus, backups -can be run over any file transfer protocol, without requiring special -software be installed on the storage server. 
- -======================================================================== - -DESIGN APPROACHES - -STORING INCREMENTAL BACKUPS - -One simple approach is to simply store a copy of every file one the -remote end, and construct a listing which tells where each file in the -source ends up on the remote server. For subsequent backups, if a file -is unchanged, the listing can simply point to the location of the file -from the previous backup. Deleting backups is simple: delete the -listing file for a particular snapshot, then garbage collect all files -which are no longer referenced. - -This approach does not as efficiently handle partial changes to large -files. If a file is changed at all, it needs to be transferred in its -entirety. One approach is to represent intra-file changes by storing -patches. The original file is kept, and a smaller file is transferred -that stores the differences between the original and the new. Some care -is needed, however. A series of small changes could accumulate over -many snapshots. If each snapshot refers to the original file, much data -will be duplicated between the patches in different snapshots. If each -patch can refer to previous patches as well, a long chain of patches can -build up, which complicates removing old backups to reclaim storage. - -An alternative approach is to break files apart into smaller units -(blocks) and to represent files in a snapshot as the concatenation of -(possibly many) blocks. Small change to files can be represented by -replacing a few of the blocks, but referring to most blocks used in the -old file directly. Some care is needed with this approach as -well--there is additional overhead needed to specify even the original -file, since the entire list of blocks must be specified. If the block -size is too small, this can lead to a large overhead, but if the block -size is too large, then sharing of file data may not be achieved. 
In -this scheme, data blocks do not depend on other data blocks, so chains -of dependencies do not arise as in the incremental patching scheme. -Each snapshot is independent, and so can easily be removed. - -One minor modification to this scheme is to permit the list of blocks to -specify that only a portion of a block should be used to reconstruct a -file; if, say, only the end of a block is changed, then the new backup -can refer to most of the old block, and use a new block for the small -changed part. Doing so does allow the possibility that a block might be -kept around even though a portion of it is being used, leading to wasted -space. - - -DATA STORAGE - -The simplest data storage format would place each file, patch, or block -in a separate file on the storage server. Doing so maximizes the -ability to reclaim storage when deleting old snapshots, and minimizes -the amount of extra data that must be transferred to recover a snapshot. -Any other format which combines data from multiple files/patches/blocks -together risks having needed data grouped with unwanted data. - -However, there are reasons to consider grouping, since there is overhead -associated with storing many small files. In any transfer protocol -which is not pipelined, transferring many small files may be slower than -transferring the same quantity of data in larger files. Small files may -also lead to more wasted storage space due to internal fragmentation. -Grouping files together gives the chance for better compression, taking -advantage of inter-file similarity. - -Grouping is even more important if the snapshot format breaks files -apart into blocks for storage, since the number of blocks could be far -larger than the number of files being backed up. - -======================================================================== - -SELECTED DESIGN - -At a high level, the selected design stores snapshots by breaking files -into blocks for storage, and does not use patches. 
These data blocks, -along with the metadata fragments (collectively, the blocks and metadata -are referred to as objects) are grouped together for storage purposes -(each storage group is called a segment). - -TAR is chosen as the format for grouping objects together into segments -rather than inventing a new format. Doing so makes it easy to -manipulate the segments using other tools, if needed. - -Data blocks for files are stored as-is. Metadata is stored in a text -format, to make it more transparent. (This should make debugging -easier, and the hope is that this will make understanding the format -simpler.) diff --git a/doc/design.txt b/doc/design.txt new file mode 100644 index 0000000..e515359 --- /dev/null +++ b/doc/design.txt @@ -0,0 +1,111 @@ +This document aims to describe the goals and constraints of the LBS +design. + +======================================================================== + +OVERALL GOALS: Efficient creation and storage of multiple backup +snapshots of a filesystem tree. Logically, each snapshot is +self-contained and stores the state of all files at a single point in +time. However, it should be possible for snapshots to share the same +underlying storage, so that data duplicated in many snapshots need not +be stored multiple times. It should be possible to delete old +snapshots, and recover (most of) the storage associated with them. It +must be possible to delete old backups in any order; for example, it +must be possible to delete intermediate backups before long-term +backups. It should be possible to recover the files in a snapshot +without transferring significantly more data than that stored in the +files to be recovered. + +CONSTRAINTS: The system should not rely upon a smart server at the +remote end where backups are stored. It should be possible to create +new backups using a single primitive: StoreFile, which stores a string +of bytes at the backup server using a specified filename. 
Thus, backups +can be run over any file transfer protocol, without requiring special +software be installed on the storage server. + +======================================================================== + +DESIGN APPROACHES + +STORING INCREMENTAL BACKUPS + +One simple approach is to simply store a copy of every file one the +remote end, and construct a listing which tells where each file in the +source ends up on the remote server. For subsequent backups, if a file +is unchanged, the listing can simply point to the location of the file +from the previous backup. Deleting backups is simple: delete the +listing file for a particular snapshot, then garbage collect all files +which are no longer referenced. + +This approach does not as efficiently handle partial changes to large +files. If a file is changed at all, it needs to be transferred in its +entirety. One approach is to represent intra-file changes by storing +patches. The original file is kept, and a smaller file is transferred +that stores the differences between the original and the new. Some care +is needed, however. A series of small changes could accumulate over +many snapshots. If each snapshot refers to the original file, much data +will be duplicated between the patches in different snapshots. If each +patch can refer to previous patches as well, a long chain of patches can +build up, which complicates removing old backups to reclaim storage. + +An alternative approach is to break files apart into smaller units +(blocks) and to represent files in a snapshot as the concatenation of +(possibly many) blocks. Small change to files can be represented by +replacing a few of the blocks, but referring to most blocks used in the +old file directly. Some care is needed with this approach as +well--there is additional overhead needed to specify even the original +file, since the entire list of blocks must be specified. 
If the block +size is too small, this can lead to a large overhead, but if the block +size is too large, then sharing of file data may not be achieved. In +this scheme, data blocks do not depend on other data blocks, so chains +of dependencies do not arise as in the incremental patching scheme. +Each snapshot is independent, and so can easily be removed. + +One minor modification to this scheme is to permit the list of blocks to +specify that only a portion of a block should be used to reconstruct a +file; if, say, only the end of a block is changed, then the new backup +can refer to most of the old block, and use a new block for the small +changed part. Doing so does allow the possibility that a block might be +kept around even though a portion of it is being used, leading to wasted +space. + + +DATA STORAGE + +The simplest data storage format would place each file, patch, or block +in a separate file on the storage server. Doing so maximizes the +ability to reclaim storage when deleting old snapshots, and minimizes +the amount of extra data that must be transferred to recover a snapshot. +Any other format which combines data from multiple files/patches/blocks +together risks having needed data grouped with unwanted data. + +However, there are reasons to consider grouping, since there is overhead +associated with storing many small files. In any transfer protocol +which is not pipelined, transferring many small files may be slower than +transferring the same quantity of data in larger files. Small files may +also lead to more wasted storage space due to internal fragmentation. +Grouping files together gives the chance for better compression, taking +advantage of inter-file similarity. + +Grouping is even more important if the snapshot format breaks files +apart into blocks for storage, since the number of blocks could be far +larger than the number of files being backed up. 
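The block-based approach described above can be illustrated with a short Python sketch. It is not taken from the LBS source: the names `BLOCK_SIZE`, `BlockStore`, `snapshot_file`, and `restore_file` are hypothetical, blocks are fixed-size, and an in-memory dictionary stands in for the storage server. The sketch only shows how two snapshots of a slightly changed file can share their unchanged blocks.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and deliberately tiny, for illustration only


class BlockStore:
    """Hypothetical in-memory stand-in for block storage on the server."""

    def __init__(self):
        self.blocks = {}  # SHA-1 hex digest -> block contents

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # A block already present under this digest is not stored again.
        self.blocks.setdefault(key, data)
        return key


def snapshot_file(store: BlockStore, contents: bytes) -> list:
    """Split contents into blocks and return the list of block names."""
    return [
        store.put(contents[i:i + BLOCK_SIZE])
        for i in range(0, len(contents), BLOCK_SIZE)
    ]


def restore_file(store: BlockStore, block_list: list) -> bytes:
    """Rebuild a file as the concatenation of its blocks."""
    return b"".join(store.blocks[key] for key in block_list)
```

Snapshotting `b"aaaabbbbcccc"` and then `b"aaaaXXXXcccc"` into the same store leaves four blocks rather than six, since the first and last blocks are shared. Each block list is still self-contained, so either snapshot can be dropped without affecting the other.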
+ +======================================================================== + +SELECTED DESIGN + +At a high level, the selected design stores snapshots by breaking files +into blocks for storage, and does not use patches. These data blocks, +along with the metadata fragments (collectively, the blocks and metadata +are referred to as objects) are grouped together for storage purposes +(each storage group is called a segment). + +TAR is chosen as the format for grouping objects together into segments +rather than inventing a new format. Doing so makes it easy to +manipulate the segments using other tools, if needed. + +Data blocks for files are stored as-is. Metadata is stored in a text +format, to make it more transparent. (This should make debugging +easier, and the hope is that this will make understanding the format +simpler.) diff --git a/doc/format.txt b/doc/format.txt new file mode 100644 index 0000000..78f32c3 --- /dev/null +++ b/doc/format.txt @@ -0,0 +1,254 @@ + Backup Format Description + for an LFS-Inspired Backup Solution + +NOTE: This format specification is not yet complete. Right now the code +provides the best documentation of the format. + +This document simply describes the snapshot format. It is described +from the point of view of a decompressor which wishes to restore the +files from a snapshot. It does not specify the exact behavior required +of the backup program writing the snapshot. + +This document does not explain the rationale behind the format; for +that, see design.txt. + + +DATA CHECKSUMS +============== + +In several places in the LBS format, a cryptographic checksum may be +used to allow data integrity to be verified. At the moment, only the +SHA-1 checksum is supported, but it is expected that other algorithms +will be supported in the future. + +When a checksum is called for, the checksum is always stored in a text +format. 
The general format used is + <algorithm>=<hexdigits> + +<algorithm> identifies the checksum algorithm used, and allows new +algorithms to be added later. At the moment, the only permissible value +is "sha1", indicating a SHA-1 checksum. + +<hexdigits> is a sequence of hexadecimal digits which encode the +checksum value. For sha1, <hexdigits> should be precisely 40 digits +long. + +A sample checksum string is + sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187 + + +SEGMENTS & OBJECTS: STORAGE AND NAMING +====================================== + +An LBS snapshot consists, at its base, of a collection of /objects/: +binary blobs of data, much like a file. Higher layers interpret the +contents of objects in various ways, but the lowest layer is simply +concerned with storing and naming these objects. + +An object is a sequence of bytes (octets) of arbitrary length. An +object may contain as few as zero bytes (though such objects are not +very useful). Object sizes are potentially unbounded, but it is +recommended that the maximum size of objects produced be on the order of +megabytes. Files of essentially unlimited size can be stored in an LBS +snapshot using objects of modest size, so this should not cause any real +restrictions. + +For storage purposes, objects are grouped together into /segments/. +Segments use the TAR format; each object within a segment is stored as a +separate file. Segments are named using UUIDs (Universally Unique +Identifiers), which are 128-bit numbers. The textual form of a UUID is +a sequence of lowercase hexadecimal digits with hyphens inserted at +fixed points; an example UUID is + a704eeae-97f2-4f30-91a4-d4473956366b +This segment could be stored in the filesystem as a file + a704eeae-97f2-4f30-91a4-d4473956366b.tar +The UUID used to name a segment is assigned when the segment is created. + +Filters can be layered on top of the segment storage to provide +compression, encryption, or other features. 
For example, the example +segment above might be stored as + a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2 +or + a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg +if the file data had been filtered through bzip2 or gpg, respectively, +before storage. Filtering of segment data is outside the scope of this +format specification, however; it is assumed that if filtering is used, +when decompressing the unfiltered data can be recovered (yielding data +in the TAR format). + +Objects within a segment are numbered sequentially. This sequence +number is then formatted as an 8-digit (zero-padded) hexadecimal +(lowercase) value. The fully qualified name of an object consists of +the segment name, followed by a slash ("/"), followed by the object +sequence number. So, for example + a704eeae-97f2-4f30-91a4-d4473956366b/000001ad +names an object. + +Within the segment TAR file, the filename used for each object is its +fully-qualified name. Thus, when extracted using the standard tar +utility, a segment will produce a directory with the same name as the +segment itself, and that directory will contain a set of +sequentially-numbered files each storing the contents of a single +object. + +NOTE: When naming an object, the segment portion consists of the UUID +only. Any extensions appended to the segment when storing it as a file +in the filesystem (for example, .tar.bz2) are _not_ part of the name of +the object. + +There are two additional components which may appear in an object name; +both are optional. + +First, a checksum may be added to the object name to express an +integrity constraint: the referred-to data must match the checksum +given. A checksum is enclosed in parentheses and appended to the object +name: + a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187) + +Secondly, an object may be /sliced/: a subset of the bytes actually +stored in the object may be selected to be returned. 
The slice syntax +is + [<start>+<length>] +where <start> is the first byte to return (as a decimal offset) and +<length> specifies the number of bytes to return (again in decimal). It +is invalid to select using the slice syntax a range of bytes that does +not fall within the original object. The slice specification should be +appended to an object name, for example: + a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000] +selects only bytes 264..1263 from the original object. + +Both a checksum and a slice can be used. In this case, the checksum is +given first, followed by the slice. The checksum is computed over the +original object contents, before slicing. + + +FILE METADATA LISTING +===================== + +A snapshot stores two distinct types of data into the object store +described above: data and metadata. Data for a file may be stored as a +single object, or the data may be broken apart into blocks which are +stored as separate objects. The file /metadata/ log (which may be +spread across multiple objects) specifies the names of the files in a +snapshot, metadata about them such as ownership and timestamps, and +gives the list of objects that contain the data for the file. + +The metadata log consists of a set of stanzas, each of which are +formatted somewhat like RFC 822 (email) headers. An example is: + + name: etc/fstab + checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f + data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7 + group: 0 (root) + mode: 0644 + mtime: 1177977313 + size: 867 + type: - + user: 0 (root) + +The meanings of all the fields are described later. A blank line +separates stanzas with information about different files. In addition +to regular stanzas, the metadata listing may contain a line containing +an object reference prefixed with "@". Such a line indicates that the +contents of the referenced object should be fetched and parsed as a +metadata listing at this point, prior to continuing to parse the current +object. 
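The stanza layout and the "@" include lines described above can be read with a few lines of Python. This is only a sketch under stated assumptions, not the parser used by LBS: `parse_listing` and the `fetch` callback (which returns the text of a referenced object) are hypothetical names, each field is assumed to fit on one line, and an "@" line is assumed to end any stanza in progress.

```python
def parse_listing(text, fetch):
    """Yield one dict per stanza of a metadata listing.

    `fetch` is a hypothetical callback: given an object reference,
    it returns the text stored in that object.
    """
    stanza = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            # Assumption: an include line also ends the current stanza.
            if stanza:
                yield stanza
                stanza = {}
            # Parse the referenced object before continuing with this one.
            yield from parse_listing(fetch(line[1:].strip()), fetch)
        elif not line:
            # A blank line separates stanzas.
            if stanza:
                yield stanza
                stanza = {}
        else:
            # Split at the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            stanza[key.strip()] = value.strip()
    if stanza:
        yield stanza
```

Field values are returned as raw strings; interpreting them according to the per-field encodings is left to a later step.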
+ +Several common encodings are used for various fields. The encoding used +for each field is specified in the field listing that follows. + encoded string: An arbitrary string (octet sequence), with bytes + optionally escaped by replacing a byte with %xx, where "xx" is a + hexadecimal representation of the byte replaced. For example, + space can be replaced with "%20". This is the same escaping + mechanism as used in URLs. + integer: An integer, which may be written in decimal, octal, or + hexadecimal. Strings starting with 0 are interpreted as octal, + and those starting with 0x are intepreted as hexadecimal. + +Common fields (required in all stanzas): + name [encoded string]: Full path of the file archived. + user [special]: The user ID of the file, as an integer, optionally + followed by a space and the corresponding username, as an + escaped string enclosed in parentheses. + group [special]: The group ID which owns the file. Encoding is the + same as for the user field: an integer, with an optional name in + parentheses following. + mode [integer]: Unix mode bits for the file. + type [special]: A single character which indicates the type of file. + The type indicators are meant to be consistent with the + characters used to indicate file type in a directory listing: + - regular file + b block device + c character device + d directory + l symlink + p pipe + s socket + mtime [integer]: Modification time of the file. + +Optional common fields: + links [integer]: Number of hard links to this file, generally only + reported if greater than 1. + inode [string]: String specifying the inode number of this file when + it was dumped. If "links" is greater than 1, then searching for + other files that have an identical "inode" value can be used to + determine which files should be hard-linked together when + restoring. 
The inode field should be treated as an opaque + string and compared for equality as such; an implementation may + choose whatever representation is convenient. The format + produced by the standard tool is <major>/<minor>/<inode> (where + <major> and <minor> specify the device of the containing + filesystem and <inode> is the inode number of the file). + +Special fields used for regular files: + checksum [string]: Checksum of the file contents. + size [integer]: Size of the file, in bytes. + data [reference list]: Whitespace-separated list of object + references. The referenced data, when concatenated in the + listed order, will reconstruct the file data. Any reference + that begins with a "@" character is an indirect reference--the + given object includes a whitespace-separated list of object + references which should be parsed in the same manner as the data + field. + + +SNAPSHOT DESCRIPTOR +=================== + +The snapshot descriptor is a small file which describes a single +snapshot. It is one of the few files which is not stored as an object +in the segment store. It is stored as a separate file, in plain text, +but in the same directory as segments are stored. + +The name of the snapshot descriptor file is + snapshot-<scheme>-<timestamp>.lbs +<scheme> is a descriptive text which can be used to distinguish several +logically distinct sets of snapshots (such as snapshots for two +different directory trees) that are being stored in the same location. +<timestamp> gives the date and time the snapshot was taken; the format +is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). + +The contents of the descriptor are a set of RFC 822-style headers (much +like the metadata listing). The fields which are defined are: + Format: The string "LBS Snapshot v0.2" which identifies this file as + an LBS backup descriptor. The version number (v0.2) might + change if there are changes to the format. It is expected that + at some point, once the format is stabilized, the version + identifier will be changed to v1.0. 
+ Producer: An informative string which identifies the program that + produced the backup. + Date: The date the snapshot was produced. This matches the + timestamp encoded in the filename, but is written out in full. + A timezone is given. For example: "2007-08-06 09:22:39 -0700". + Scheme: The <scheme> field from the descriptor filename. + Segments: A whitespace-separated list of segment names. Any segment + which is referenced by this snapshot must be included in the + list, since this list can be used in garbage-collecting old + segments, determining which segments need to be downloaded to + completely reconstruct a snapshot, etc. + Root: A single object reference which points to the metadata + listing for the snapshot. + Checksums: A checksum file may be produced (with the same name as + the snapshot descriptor file, but with extension .sha1sums + instead of .lbs) containing SHA-1 checksums of all segments. + This field contains a checksum of that file. diff --git a/doc/implementation.txt b/doc/implementation.txt new file mode 100644 index 0000000..1ba78ac --- /dev/null +++ b/doc/implementation.txt @@ -0,0 +1,91 @@ + LBS: An LFS-Inspired Backup Solution + Implementation Overview + +HIGH-LEVEL OVERVIEW +=================== + +There are two different classes of data stored, typically in different +directories: + +The SNAPSHOT directory contains the actual backup contents. It consists +of segment data (typically in compressed/encrypted form, one segment per +file) as well as various small per-snapshot files such as the snapshot +descriptor files (which name each snapshot and tell where to locate +the data for it) and checksum files (which list checksums of segments +for quick integrity checking). The snapshot directory may be stored on +a remote server. 
It is write-only, in the sense that data does not need +to be read from the snapshot directory to create a new snapshot, and +files in it are immutable once created (they may be deleted if they are +no longer needed, but file contents are never changed). + +The LOCAL DATABASE contains indexes used during the backup process. +Files here keep track of what information is known to be stored in the +snapshot directory, so that new snapshots can appropriate re-use data. +The local database, as its name implies, should be stored somewhere +local, since random access (read and write) will be required during the +backup process. Unlike the snapshot directory, files here are not +immutable. + +Only the data stored in the snapshot directory is required to restore a +snapshot. The local database does not need to be backed up (stored at +multiple separate locations, etc.). The contents of the local database +can be rebuilt (at least in theory) from data in the snapshot directory +and the local filesystem; it is expected that tools will eventually be +provided to do so. + +The format of data in the snapshot directory is described in format.txt. +The format of data in the local database is more fluid and may evolve +over time. The current structure of the local database is described in +this document. + + +LOCAL DATABASE FORMAT +===================== + +The local database directory currently contains two files: +localdb.sqlite and a statcache file. (Actually, two types of files. It +is possible to create snapshots using different schemes, and have them +share the same local database directory. In this case, there will still +be one localdb.sqlite file, but one statcache file for each backup +scheme.) + +Each statcache file is a plain text file, with a format similar to the +file metadata listing used in the snapshot directory. 
The purpose of +the statcache file is to speed the backup process by making it possible +to determine if a file has changed since the previous snapshot by +comparing the results of a stat() system call with the data in the +statcache file, and if the file is unchanged, providing the checksum and +list of data blocks used to previously store the file. The statcache +file is rewritten each time a snapshot is taken, and can safely be +deleted (with the only major side effect being that the first backups +after doing so will progress much more slowly). + +localdb.sqlite is an SQLite database file, which is used for indexing +objects stored in the snapshot directory and various other purposes. +The database schema is contained in the file schema.sql in the LBS +source. Among the data tracked by localdb.sqlite: + + - A list of segments stored in the snapshot directory. This might not + include all segments (segments belonging to old snapshots might be + removed), but for correctness all segments listed in the local + database must exist in the snapshot directory. + + - A block index which tracks objects in the snapshot directory used to + store file data. It is indexed by block checksum, and so can be + used while generating a snapshot to determine if a just-read block + of data is already stored in the snapshot directory, and if so how + to name it. + + - A list of recent snapshots, together with a list of the objects from + the block index they reference. + +The localdb SQL database is central to data sharing and segment +cleaning. When creating a new snapshot, information about the new +snapshot and the blocks is uses (including any new ones) is written to +the database. Using the database, separate segment cleaning processes +can determine how much data in various segments is still live, and +determine which segments are best candidates for cleaning. 
Cleaning is +performed by updating the database to mark objects in the cleaned +segments as unavailable for use in future snapshots; when the backup +process next runs, any files that would use these expired blocks instead +have a copy of the data written to a new segment. diff --git a/format.txt b/format.txt deleted file mode 100644 index 78f32c3..0000000 --- a/format.txt +++ /dev/null @@ -1,254 +0,0 @@ - Backup Format Description - for an LFS-Inspired Backup Solution - -NOTE: This format specification is not yet complete. Right now the code -provides the best documentation of the format. - -This document simply describes the snapshot format. It is described -from the point of view of a decompressor which wishes to restore the -files from a snapshot. It does not specify the exact behavior required -of the backup program writing the snapshot. - -This document does not explain the rationale behind the format; for -that, see design.txt. - - -DATA CHECKSUMS -============== - -In several places in the LBS format, a cryptographic checksum may be -used to allow data integrity to be verified. At the moment, only the -SHA-1 checksum is supported, but it is expected that other algorithms -will be supported in the future. - -When a checksum is called for, the checksum is always stored in a text -format. The general format used is - <algorithm>=<hexdigits> - -<algorithm> identifies the checksum algorithm used, and allows new -algorithms to be added later. At the moment, the only permissible value -is "sha1", indicating a SHA-1 checksum. - -<hexdigits> is a sequence of hexadecimal digits which encode the -checksum value. For sha1, <hexdigits> should be precisely 40 digits -long. - -A sample checksum string is - sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187 - - -SEGMENTS & OBJECTS: STORAGE AND NAMING -====================================== - -An LBS snapshot consists, at its base, of a collection of /objects/: -binary blobs of data, much like a file. 
Higher layers interpret the -contents of objects in various ways, but the lowest layer is simply -concerned with storing and naming these objects. - -An object is a sequence of bytes (octets) of arbitrary length. An -object may contain as few as zero bytes (though such objects are not -very useful). Object sizes are potentially unbounded, but it is -recommended that the maximum size of objects produced be on the order of -megabytes. Files of essentially unlimited size can be stored in an LBS -snapshot using objects of modest size, so this should not cause any real -restrictions. - -For storage purposes, objects are grouped together into /segments/. -Segments use the TAR format; each object within a segment is stored as a -separate file. Segments are named using UUIDs (Universally Unique -Identifiers), which are 128-bit numbers. The textual form of a UUID is -a sequence of lowercase hexadecimal digits with hyphens inserted at -fixed points; an example UUID is - a704eeae-97f2-4f30-91a4-d4473956366b -This segment could be stored in the filesystem as a file - a704eeae-97f2-4f30-91a4-d4473956366b.tar -The UUID used to name a segment is assigned when the segment is created. - -Filters can be layered on top of the segment storage to provide -compression, encryption, or other features. For example, the example -segment above might be stored as - a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2 -or - a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg -if the file data had been filtered through bzip2 or gpg, respectively, -before storage. Filtering of segment data is outside the scope of this -format specification, however; it is assumed that if filtering is used, -when decompressing the unfiltered data can be recovered (yielding data -in the TAR format). - -Objects within a segment are numbered sequentially. This sequence -number is then formatted as an 8-digit (zero-padded) hexadecimal -(lowercase) value. 
The fully qualified name of an object consists of -the segment name, followed by a slash ("/"), followed by the object -sequence number. So, for example - a704eeae-97f2-4f30-91a4-d4473956366b/000001ad -names an object. - -Within the segment TAR file, the filename used for each object is its -fully-qualified name. Thus, when extracted using the standard tar -utility, a segment will produce a directory with the same name as the -segment itself, and that directory will contain a set of -sequentially-numbered files each storing the contents of a single -object. - -NOTE: When naming an object, the segment portion consists of the UUID -only. Any extensions appended to the segment when storing it as a file -in the filesystem (for example, .tar.bz2) are _not_ part of the name of -the object. - -There are two additional components which may appear in an object name; -both are optional. - -First, a checksum may be added to the object name to express an -integrity constraint: the referred-to data must match the checksum -given. A checksum is enclosed in parentheses and appended to the object -name: - a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187) - -Secondly, an object may be /sliced/: a subset of the bytes actually -stored in the object may be selected to be returned. The slice syntax -is - [<start>+<length>] -where <start> is the first byte to return (as a decimal offset) and -<length> specifies the number of bytes to return (again in decimal). It -is invalid to select using the slice syntax a range of bytes that does -not fall within the original object. The slice specification should be -appended to an object name, for example: - a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000] -selects only bytes 264..1263 from the original object. - -Both a checksum and a slice can be used. In this case, the checksum is -given first, followed by the slice. The checksum is computed over the -original object contents, before slicing.
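The object naming rules above (segment UUID, sequence number, optional checksum, optional slice) can be sketched as a small parser. This is only an illustration in Python, not code from the LBS source: the names OBJECT_REF, parse_object_ref and resolve are hypothetical, and the regular expression is an assumption derived from the prose description.

```python
import hashlib
import re

# Assumed grammar, taken from the description above:
#   <uuid>/<8 hex digits>, then optional (algorithm=hexdigits),
#   then optional [start+length].
OBJECT_REF = re.compile(
    r"^(?P<segment>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"/(?P<sequence>[0-9a-f]{8})"
    r"(?:\((?P<algorithm>[a-z0-9]+)=(?P<digest>[0-9a-f]+)\))?"
    r"(?:\[(?P<start>[0-9]+)\+(?P<length>[0-9]+)\])?$"
)


def parse_object_ref(ref):
    """Split an object reference into its parts (hypothetical helper)."""
    match = OBJECT_REF.match(ref)
    if match is None:
        raise ValueError("malformed object reference: %r" % ref)
    parts = match.groupdict()
    if parts["start"] is not None:
        parts["start"] = int(parts["start"])
        parts["length"] = int(parts["length"])
    return parts


def resolve(parts, contents):
    """Apply the checksum and slice of a parsed reference to raw bytes."""
    if parts["algorithm"] is not None:
        if parts["algorithm"] != "sha1":
            raise ValueError("unsupported checksum algorithm")
        # The checksum covers the whole object, before slicing.
        if hashlib.sha1(contents).hexdigest() != parts["digest"]:
            raise ValueError("checksum mismatch")
    if parts["start"] is None:
        return contents
    start, length = parts["start"], parts["length"]
    # A slice must fall entirely within the original object.
    if start + length > len(contents):
        raise ValueError("slice falls outside the object")
    return contents[start:start + length]
```

Keeping parsing separate from resolution means a reference can be validated before the segment holding the object has been fetched.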
- - -FILE METADATA LISTING -===================== - -A snapshot stores two distinct types of data into the object store -described above: data and metadata. Data for a file may be stored as a -single object, or the data may be broken apart into blocks which are -stored as separate objects. The file /metadata/ log (which may be -spread across multiple objects) specifies the names of the files in a -snapshot, metadata about them such as ownership and timestamps, and -gives the list of objects that contain the data for the file. - -The metadata log consists of a set of stanzas, each of which is -formatted somewhat like RFC 822 (email) headers. An example is: - - name: etc/fstab - checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f - data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7 - group: 0 (root) - mode: 0644 - mtime: 1177977313 - size: 867 - type: - - user: 0 (root) - -The meanings of all the fields are described later. A blank line -separates stanzas with information about different files. In addition -to regular stanzas, the metadata listing may contain a line containing -an object reference prefixed with "@". Such a line indicates that the -contents of the referenced object should be fetched and parsed as a -metadata listing at this point, prior to continuing to parse the current -object. - -Several common encodings are used for various fields. The encoding used -for each field is specified in the field listing that follows. - encoded string: An arbitrary string (octet sequence), with bytes - optionally escaped by replacing a byte with %xx, where "xx" is a - hexadecimal representation of the byte replaced. For example, - space can be replaced with "%20". This is the same escaping - mechanism as used in URLs. - integer: An integer, which may be written in decimal, octal, or - hexadecimal. Strings starting with 0 are interpreted as octal, - and those starting with 0x are interpreted as hexadecimal.
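The stanza layout and the two field encodings described above can be sketched as follows. This is an illustrative Python fragment, not the LBS implementation: decode_string, decode_integer and parse_listing are hypothetical names, header continuation lines are not handled, and an "@" line is assumed to appear between stanzas rather than inside one.

```python
import re


def decode_string(text):
    """Undo %xx escaping and return the raw bytes of an encoded string."""
    raw = text.encode("utf-8")
    return re.sub(rb"%([0-9a-fA-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_integer(text):
    """Parse an integer: a leading 0x means hexadecimal, a leading 0 octal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if text.startswith("0") and len(text) > 1:
        return int(text, 8)
    return int(text, 10)


def parse_listing(text):
    """Return the items of a metadata listing in order.

    Each stanza becomes a dict of raw (still encoded) field values; each
    "@" line becomes a plain string holding the object reference, which
    the caller would fetch and parse before continuing.
    """
    items, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("@"):
            # A blank line or an indirection ends the current stanza.
            if current:
                items.append(current)
                current = {}
            if line:
                items.append(line[1:].strip())
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        items.append(current)
    return items
```

Field values are left encoded by parse_listing so that each field can be decoded according to the type given for it in the field listing.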
- -Common fields (required in all stanzas): - name [encoded string]: Full path of the file archived. - user [special]: The user ID of the file, as an integer, optionally - followed by a space and the corresponding username, as an - escaped string enclosed in parentheses. - group [special]: The group ID which owns the file. Encoding is the - same as for the user field: an integer, with an optional name in - parentheses following. - mode [integer]: Unix mode bits for the file. - type [special]: A single character which indicates the type of file. - The type indicators are meant to be consistent with the - characters used to indicate file type in a directory listing: - - regular file - b block device - c character device - d directory - l symlink - p pipe - s socket - mtime [integer]: Modification time of the file. - -Optional common fields: - links [integer]: Number of hard links to this file, generally only - reported if greater than 1. - inode [string]: String specifying the inode number of this file when - it was dumped. If "links" is greater than 1, then searching for - other files that have an identical "inode" value can be used to - determine which files should be hard-linked together when - restoring. The inode field should be treated as an opaque - string and compared for equality as such; an implementation may - choose whatever representation is convenient. The format - produced by the standard tool is <major>/<minor>/<inode> (where - <major> and <minor> specify the device of the containing - filesystem and <inode> is the inode number of the file). - -Special fields used for regular files: - checksum [string]: Checksum of the file contents. - size [integer]: Size of the file, in bytes. - data [reference list]: Whitespace-separated list of object - references. The referenced data, when concatenated in the - listed order, will reconstruct the file data.
Any reference - that begins with a "@" character is an indirect reference--the - given object includes a whitespace-separated list of object - references which should be parsed in the same manner as the data - field. - - -SNAPSHOT DESCRIPTOR -=================== - -The snapshot descriptor is a small file which describes a single -snapshot. It is one of the few files which is not stored as an object -in the segment store. It is stored as a separate file, in plain text, -but in the same directory as segments are stored. - -The name of the snapshot descriptor file is - snapshot-<scheme>-<timestamp>.lbs -<scheme> is a descriptive text which can be used to distinguish several -logically distinct sets of snapshots (such as snapshots for two -different directory trees) that are being stored in the same location. -<timestamp> gives the date and time the snapshot was taken; the format -is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). - -The contents of the descriptor are a set of RFC 822-style headers (much -like the metadata listing). The fields which are defined are: - Format: The string "LBS Snapshot v0.2" which identifies this file as - an LBS backup descriptor. The version number (v0.2) might - change if there are changes to the format. It is expected that - at some point, once the format is stabilized, the version - identifier will be changed to v1.0. - Producer: An informative string which identifies the program that - produced the backup. - Date: The date the snapshot was produced. This matches the - timestamp encoded in the filename, but is written out in full. - A timezone is given. For example: "2007-08-06 09:22:39 -0700". - Scheme: The <scheme> field from the descriptor filename. - Segments: A whitespace-separated list of segment names. Any segment - which is referenced by this snapshot must be included in the - list, since this list can be used in garbage-collecting old - segments, determining which segments need to be downloaded to - completely reconstruct a snapshot, etc.
- Root: A single object reference which points to the metadata - listing for the snapshot. - Checksums: A checksum file may be produced (with the same name as - the snapshot descriptor file, but with extension .sha1sums - instead of .lbs) containing SHA-1 checksums of all segments. - This field contains a checksum of that file. diff --git a/implementation.txt b/implementation.txt deleted file mode 100644 index 1ba78ac..0000000 --- a/implementation.txt +++ /dev/null @@ -1,91 +0,0 @@ - LBS: An LFS-Inspired Backup Solution - Implementation Overview - -HIGH-LEVEL OVERVIEW -=================== - -There are two different classes of data stored, typically in different -directories: - -The SNAPSHOT directory contains the actual backup contents. It consists -of segment data (typically in compressed/encrypted form, one segment per -file) as well as various small per-snapshot files such as the snapshot -descriptor files (which name each snapshot and tell where to locate -the data for it) and checksum files (which list checksums of segments -for quick integrity checking). The snapshot directory may be stored on -a remote server. It is write-only, in the sense that data does not need -to be read from the snapshot directory to create a new snapshot, and -files in it are immutable once created (they may be deleted if they are -no longer needed, but file contents are never changed). - -The LOCAL DATABASE contains indexes used during the backup process. -Files here keep track of what information is known to be stored in the -snapshot directory, so that new snapshots can appropriately re-use data. -The local database, as its name implies, should be stored somewhere -local, since random access (read and write) will be required during the -backup process. Unlike the snapshot directory, files here are not -immutable. - -Only the data stored in the snapshot directory is required to restore a -snapshot.
The local database does not need to be backed up (stored at -multiple separate locations, etc.). The contents of the local database -can be rebuilt (at least in theory) from data in the snapshot directory -and the local filesystem; it is expected that tools will eventually be -provided to do so. - -The format of data in the snapshot directory is described in format.txt. -The format of data in the local database is more fluid and may evolve -over time. The current structure of the local database is described in -this document. - - -LOCAL DATABASE FORMAT -===================== - -The local database directory currently contains two files: -localdb.sqlite and a statcache file. (Actually, two types of files. It -is possible to create snapshots using different schemes, and have them -share the same local database directory. In this case, there will still -be one localdb.sqlite file, but one statcache file for each backup -scheme.) - -Each statcache file is a plain text file, with a format similar to the -file metadata listing used in the snapshot directory. The purpose of -the statcache file is to speed the backup process by making it possible -to determine if a file has changed since the previous snapshot by -comparing the results of a stat() system call with the data in the -statcache file, and if the file is unchanged, providing the checksum and -list of data blocks used to previously store the file. The statcache -file is rewritten each time a snapshot is taken, and can safely be -deleted (with the only major side effect being that the first backups -after doing so will progress much more slowly). - -localdb.sqlite is an SQLite database file, which is used for indexing -objects stored in the snapshot directory and various other purposes. -The database schema is contained in the file schema.sql in the LBS -source. Among the data tracked by localdb.sqlite: - - - A list of segments stored in the snapshot directory. 
This might not - include all segments (segments belonging to old snapshots might be - removed), but for correctness all segments listed in the local - database must exist in the snapshot directory. - - - A block index which tracks objects in the snapshot directory used to - store file data. It is indexed by block checksum, and so can be - used while generating a snapshot to determine if a just-read block - of data is already stored in the snapshot directory, and if so how - to name it. - - - A list of recent snapshots, together with a list of the objects from - the block index they reference. - -The localdb SQL database is central to data sharing and segment -cleaning. When creating a new snapshot, information about the new -snapshot and the blocks it uses (including any new ones) is written to -the database. Using the database, separate segment cleaning processes -can determine how much data in various segments is still live, and -determine which segments are best candidates for cleaning. Cleaning is -performed by updating the database to mark objects in the cleaned -segments as unavailable for use in future snapshots; when the backup -process next runs, any files that would use these expired blocks instead -have a copy of the data written to a new segment.
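The interaction between the block index and segment cleaning described above can be sketched with an in-memory SQLite database. The table layout here is a simplified, hypothetical stand-in; the real schema is the one in schema.sql and is not reproduced in this document. The function names are likewise assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the contents of schema.sql).
SCHEMA = """
CREATE TABLE block_index (
    checksum TEXT NOT NULL,
    segment  TEXT NOT NULL,
    sequence TEXT NOT NULL,
    size     INTEGER NOT NULL,
    expired  INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX block_index_checksum ON block_index (checksum);
"""


def open_db():
    """Create an in-memory database holding the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def record_block(db, checksum, segment, sequence, size):
    """Note that an object holding a block with this checksum was stored."""
    db.execute(
        "INSERT INTO block_index (checksum, segment, sequence, size) "
        "VALUES (?, ?, ?, ?)", (checksum, segment, sequence, size))


def find_block(db, checksum):
    """Return the name of a reusable object with this checksum, or None."""
    row = db.execute(
        "SELECT segment, sequence FROM block_index "
        "WHERE checksum = ? AND expired = 0 LIMIT 1", (checksum,)).fetchone()
    return None if row is None else "%s/%s" % row


def clean_segment(db, segment):
    """Mark every object in a segment as unavailable to future snapshots."""
    db.execute("UPDATE block_index SET expired = 1 WHERE segment = ?",
               (segment,))
```

In this sketch, once clean_segment has run, find_block no longer returns objects from that segment, so the next backup would write a fresh copy of the block to a new segment and record it under a new name.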