Backup Format Description
for Cumulus: Efficient Filesystem Backup to the Cloud
- Version: "LBS Snapshot v0.6"
+ Version: "Cumulus Snapshot v0.11"
NOTE: This format specification is intended to be mostly stable, but is
still subject to change before the 1.0 release. The code may provide
additional useful documentation on the format.
-NOTE2: The name of this project has changed from LBS to Cumulus.
-However, to avoid introducing gratuitous changes into the format, in
-most cases any references to "LBS" in the format description have been
-left as-is. The name may be changed in the future if the format is
-updated.
+NOTE2: The name of this project has changed from LBS to Cumulus. In
+some areas the name "LBS" is still used.
This document simply describes the snapshot format. It is described
from the point of view of a decompressor which wishes to restore the
that, see design.txt.
+BACKUP REPOSITORY LAYOUT
+========================
+
+Cumulus backups are stored using a relatively simple layout. Data files
+described below are written into one of several directories on the
+backup server, depending on their purpose:
+ snapshots/
+ Snapshot descriptor files, which quickly summarize each backup
+ snapshot stored.
+ segments0/
+ segments1/
+ Storage of the bulk of the backup data, in compressed/encrypted
+ form. Technically any segment could be stored in either
+ directory (both directories will be searched when looking for a
+ segment). However, data in segments0 might be faster to access
+ (but more expensive) depending on the storage backend. The
+ intent is that segments0 can store filesystem tree metadata and
+ segments1 can store file contents.
+ meta/
+ Snapshot-specific metadata that is not core to the backup. This
+ can include checksums of segments, some data for rebuilding
+ local database contents, etc.
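
As an illustration of the lookup rule above (a segment may live in either directory, so both are searched), here is a minimal Python sketch. It assumes the repository is a local directory tree and that each segment file is named by its UUID plus an extension such as .tar; the helper name find_segment_files is hypothetical and is not part of Cumulus.

```python
import os

# Directory names taken from the layout description above.
SEGMENT_DIRS = ("segments0", "segments1")


def find_segment_files(repo_root, segment_uuid):
    """Return the stored files for a segment, searching both directories.

    Assumption: a stored file is named <uuid>.<extensions>, so the part
    of the filename before the first dot identifies the segment.
    """
    matches = []
    for dirname in SEGMENT_DIRS:
        directory = os.path.join(repo_root, dirname)
        if not os.path.isdir(directory):
            continue  # a backend may not have created both directories
        for filename in sorted(os.listdir(directory)):
            if filename.split(".", 1)[0] == segment_uuid:
                matches.append(os.path.join(directory, filename))
    return matches
```

A remote storage backend would replace os.listdir with its own listing call; the search order shown here is a choice made for the sketch, not something the format requires.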
+
+
DATA CHECKSUMS
==============
<algorithm>=<hexdigits>
<algorithm> identifies the checksum algorithm used, and allows new
-algorithms to be added later. At the moment, the only permissible value
-is "sha1", indicating a SHA-1 checksum.
+algorithms to be added later. Permissible values are:
+ "sha1": SHA-1
+ "sha224": SHA-224 (added in version 0.11)
+ "sha256": SHA-256 (added in version 0.11)
<hexdigits> is a sequence of hexadecimal digits which encode the
checksum value. For sha1, <hexdigits> should be precisely 40 digits
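
The <algorithm>=<hexdigits> form can be parsed and checked along the following lines. This is a sketch rather than the Cumulus implementation: the function names are hypothetical, and two behaviours are assumptions made here, namely that the digit count for sha224 and sha256 is derived from the digest size in the same way as the 40 digits for sha1, and that hex digits are compared case-insensitively.

```python
import hashlib
import string

# Algorithm names from the list above, mapped to hashlib constructors.
ALGORITHMS = {
    "sha1": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
}


def parse_checksum(text):
    """Split '<algorithm>=<hexdigits>' and validate both parts."""
    algorithm, sep, hexdigits = text.partition("=")
    if not sep or algorithm not in ALGORITHMS:
        raise ValueError("unsupported checksum: %r" % text)
    # Two hex digits per digest byte (40 for sha1).
    expected = ALGORITHMS[algorithm]().digest_size * 2
    if len(hexdigits) != expected:
        raise ValueError("wrong number of hex digits: %r" % text)
    if any(c not in string.hexdigits for c in hexdigits):
        raise ValueError("non-hexadecimal digit in: %r" % text)
    return algorithm, hexdigits.lower()  # assumption: case-insensitive


def verify_checksum(data, text):
    """Return True if the bytes in data match the checksum string."""
    algorithm, hexdigits = parse_checksum(text)
    return ALGORITHMS[algorithm](data).hexdigest() == hexdigits
```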
This segment could be stored in the filesystem as a file
a704eeae-97f2-4f30-91a4-d4473956366b.tar
The UUID used to name a segment is assigned when the segment is created.
+These files are stored in either the segments0 or segments1 directories
+on the backup server.
Filters can be layered on top of the segment storage to provide
compression, encryption, or other features. For example, the example
NOTE: When naming an object, the segment portion consists of the UUID
only. Any extensions appended to the segment when storing it as a file
-in the filesystem (for example, .tar.bz2) are _not_ part of the name of
-the object.
+in the filesystem (for example, .tar.bz2) and path information (for
+example, segments0) are _not_ part of the name of the object.
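
To make the naming rule concrete, the sketch below recovers the segment portion of an object name from a stored file path by dropping the directory and every extension. The function name is hypothetical, and validating the result with uuid.UUID is an extra check added for this example.

```python
import posixpath
import uuid


def segment_name_from_path(path):
    """Strip path information and file extensions, leaving only the UUID."""
    filename = posixpath.basename(path)   # drops e.g. "segments0/"
    name = filename.split(".", 1)[0]      # drops e.g. ".tar.bz2"
    uuid.UUID(name)                       # raises ValueError if not a UUID
    return name
```

For example, segment_name_from_path("segments0/a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2") yields the bare UUID used in object names.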
There are two additional components which may appear in an object name;
both are optional.
not fall within the original object. The slice specification should be
appended to an object name, for example:
a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
-selects only bytes 264..1263 from the original object. As an
-abbreviation, the slice syntax
+selects only bytes 264..1263 from the original object.
+
+The slice syntax
[<length>]
-is shorthand for
- [0+<length>]
+indicates that all bytes of the object are to be used, but
+additionally asserts that the referenced object is exactly <length>
+bytes long. Older versions of Cumulus can also use the syntax
+ [=<length>]
+as a synonym for length assertions, but this notation is deprecated.
+
+(In older versions of the format, the syntax [<length>] was a shorthand
+for [0+<length>]: that is, select the first <length> bytes of the object
+but make no assertions about the overall size. The backup tool has not
+generated such slices since v0.8.)
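
The slice forms described above can be handled as in the following sketch, which follows the current interpretation: [<start>+<length>] selects a byte range, while [<length>] and the deprecated [=<length>] only assert the object's size. The function name apply_slice is hypothetical, and raising ValueError for a failed assertion or an out-of-range slice is an assumption of this example. The pre-v0.8 meaning of [<length>] is not handled.

```python
import re

# Either "[<start>+<length>]" or "[<length>]" / "[=<length>]".
_SLICE_RE = re.compile(r"^\[(?:(\d+)\+(\d+)|=?(\d+))\]$")


def apply_slice(data, spec):
    """Apply a slice specification to the bytes of an object."""
    match = _SLICE_RE.match(spec)
    if match is None:
        raise ValueError("malformed slice: %r" % spec)
    if match.group(3) is not None:
        # Length assertion: use every byte, but check the total size.
        if len(data) != int(match.group(3)):
            raise ValueError("object is not %s bytes long" % match.group(3))
        return data
    start, length = int(match.group(1)), int(match.group(2))
    if start + length > len(data):
        raise ValueError("slice extends past the end of the object")
    return data[start:start + length]
```

With a 2048-byte object, apply_slice(data, "[264+1000]") returns bytes 264 through 1263, matching the example above.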
Both a checksum and a slice can be used. In this case, the checksum is
given first, followed by the slice. The checksum is computed over the
the object. For example
zero[1024]
represents a block consisting of 1024 null bytes. A checksum should not
-be given. The slice syntax should use the abbreviated length-only form.
+be given.
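
A restore tool could expand such a reference without reading any segment. The sketch below is hypothetical (the function name is not from Cumulus) and accepts only the length-only form zero[<length>].

```python
import re


def expand_zero_object(name):
    """Return the null bytes named by a 'zero[<length>]' reference."""
    match = re.fullmatch(r"zero\[(\d+)\]", name)
    if match is None:
        raise ValueError("not a zero object reference: %r" % name)
    return bytes(int(match.group(1)))  # bytes(n) is n null bytes
```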
FILE METADATA LISTING
logically distinct sets of snapshots (such as snapshots for two
different directory trees) that are being stored in the same location.
<timestamp> gives the date and time the snapshot was taken; the format
-is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39).
+is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). It is
+recommended that the timestamp be given in UTC for consistent sorting
+even if the offset from UTC to local time changes; however, the
+authoritative timestamp (including timezone) can be found in the Date
+field. (In version v0.10 and earlier the timestamp is given in local
+time; in current versions UTC is used.)
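
The relationship between the filename timestamp and the Date field can be sketched as follows. The function names are hypothetical, and the Date layout "%Y-%m-%d %H:%M:%S %z" is inferred from the example value given for that field, so treat it as an assumption rather than a definition.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"       # from the description above
DATE_FIELD_FORMAT = "%Y-%m-%d %H:%M:%S %z"  # assumed from the Date example


def snapshot_timestamp(moment):
    """Format a timezone-aware datetime as a UTC snapshot timestamp."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def parse_date_field(text):
    """Parse a Date header value into a timezone-aware datetime."""
    return datetime.strptime(text, DATE_FIELD_FORMAT)
```

Under these assumptions, a Date of "2007-08-06 02:22:39 -0700" corresponds to the filename timestamp 20070806T092239.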
The contents of the descriptor are a set of RFC 822-style headers (much
like the metadata listing). The fields which are defined are:
- Format: The string "LBS Snapshot v0.6" which identifies this file as
- a Cumulus backup descriptor. The version number (v0.6) might
- change if there are changes to the format. It is expected that
- at some point, once the format is stabilized, the version
- identifier will be changed to v1.0.
+ Format: The string "Cumulus Snapshot v0.11" which identifies this
+ file as a Cumulus backup descriptor. The version number (v0.11)
+ might change if there are changes to the format. It is expected
+ that at some point, once the format is stabilized, the version
+ identifier will be changed to v1.0. (Earlier versions, format
+ v0.8 and earlier, used the string "LBS Snapshot" instead of
+ "Cumulus Snapshot", reflecting an earlier name for the project.
+ Consumers should be prepared for either name.)
Producer: An informative string which identifies the program that
produced the backup.
- Date: The date the snapshot was produced. This matches the
- timestamp encoded in the filename, but is written out in full.
- A timezone is given. For example: "2007-08-06 09:22:39 -0700".
+ Date: The date the snapshot was produced, in the local time zone.
+ This matches the timestamp encoded in the filename, but is
+ written out in full. A timezone (offset from UTC) is given.
+ For example: "2007-08-06 02:22:39 -0700".
Scheme: The <scheme> field from the descriptor filename.
Segments: A whitespace-separated list of segment names. Any segment
which is referenced by this snapshot must be included in the