Backup Format Description
for Cumulus: Efficient Filesystem Backup to the Cloud
Version: "Cumulus Snapshot v0.11"

NOTE: This format specification is intended to be mostly stable, but is
still subject to change before the 1.0 release. The code may provide
additional useful documentation on the format.

NOTE2: The name of this project has changed from LBS to Cumulus. In
some areas the name "LBS" is still used.

This document simply describes the snapshot format. It is described
from the point of view of a decompressor which wishes to restore the
files from a snapshot. It does not specify the exact behavior required
of the backup program writing the snapshot. For details of the current
backup program, see implementation.txt.

This document does not explain the rationale behind the format.


BACKUP REPOSITORY LAYOUT
========================

Cumulus backups are stored using a relatively simple layout. Data files
described below are written into one of several directories on the
backup server, depending on their purpose:

    Snapshot descriptor files, which quickly summarize each backup.

    Storage of the bulk of the backup data, in compressed/encrypted
    form. Technically any segment could be stored in either the
    segments0 or the segments1 directory (both directories will be
    searched when looking for a segment). However, data in segments0
    might be faster to access (but more expensive) depending on the
    storage backend. The intent is that segments0 can store filesystem
    tree metadata and segments1 can store file contents.

    Snapshot-specific metadata that is not core to the backup. This
    can include checksums of segments, some data for rebuilding
    local database contents, etc.

In several places in the Cumulus format, a cryptographic checksum may be
used to allow data integrity to be verified. SHA-1 was originally the
only supported algorithm; SHA-224 and SHA-256 were added in version
0.11, and it is expected that other algorithms will be supported in the
future.

When a checksum is called for, the checksum is always stored in a text
format. The general format used is
    <algorithm>=<hexdigits>

<algorithm> identifies the checksum algorithm used, and allows new
algorithms to be added later. Permissible values are:
    "sha1": SHA-1
    "sha224": SHA-224 (added in version 0.11)
    "sha256": SHA-256 (added in version 0.11)

<hexdigits> is a sequence of hexadecimal digits which encodes the
checksum value. For sha1, <hexdigits> should be precisely 40 digits
long.

A sample checksum string is
    sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187

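As an illustration only, a restore tool might check data against a
checksum string along the following lines. This is a minimal sketch in
Python; the function name verify_checksum is hypothetical and is not
part of Cumulus.

```python
import hashlib

# Algorithm names listed as permissible in this document.
SUPPORTED = ("sha1", "sha224", "sha256")


def verify_checksum(data: bytes, checksum: str) -> bool:
    """Return True if data matches an '<algorithm>=<hexdigits>' string."""
    algorithm, sep, hexdigits = checksum.partition("=")
    if not sep or algorithm not in SUPPORTED:
        raise ValueError(f"unsupported checksum string: {checksum!r}")
    # hashlib.new() accepts these algorithm names directly.
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == hexdigits.lower()
```

Rejecting unknown algorithm names, rather than ignoring them, is a
design choice of this sketch, not a requirement of the format.
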

SEGMENTS & OBJECTS: STORAGE AND NAMING
======================================

A Cumulus snapshot consists, at its base, of a collection of /objects/:
binary blobs of data, much like a file. Higher layers interpret the
contents of objects in various ways, but the lowest layer is simply
concerned with storing and naming these objects.

An object is a sequence of bytes (octets) of arbitrary length. An
object may contain as few as zero bytes (though such objects are not
very useful). Object sizes are potentially unbounded, but it is
recommended that the maximum size of objects produced be on the order of
megabytes. Files of essentially unlimited size can be stored in a
Cumulus snapshot using objects of modest size, so this should not cause
any real restrictions.

For storage purposes, objects are grouped together into /segments/.
Segments use the TAR format; each object within a segment is stored as a
separate file. Segments are named using UUIDs (Universally Unique
Identifiers), which are 128-bit numbers. The textual form of a UUID is
a sequence of lowercase hexadecimal digits with hyphens inserted at
fixed points; an example UUID is
    a704eeae-97f2-4f30-91a4-d4473956366b
This segment could be stored in the filesystem as a file
    a704eeae-97f2-4f30-91a4-d4473956366b.tar
The UUID used to name a segment is assigned when the segment is created.
These files are stored in either the segments0 or segments1 directories.

Filters can be layered on top of the segment storage to provide
compression, encryption, or other features. For example, the segment
above might be stored as
    a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2
or
    a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg
if the file data had been filtered through bzip2 or gpg, respectively,
before storage. Filtering of segment data is outside the scope of this
format specification, however; it is assumed that if filtering is used,
the unfiltered data can be recovered when decompressing (yielding data
in the TAR format).

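The lookup rule (both segment directories are searched, and a filter
may add an extension after .tar) can be sketched as follows. The
function name find_segment and the glob pattern are assumptions made
for this example only; the format itself does not prescribe how a tool
locates segment files.

```python
from pathlib import Path
from typing import Optional


def find_segment(repo: Path, segment: str) -> Optional[Path]:
    """Locate the file storing a segment, searching both directories."""
    for dirname in ("segments0", "segments1"):
        directory = repo / dirname
        if not directory.is_dir():
            continue
        # Matches <uuid>.tar as well as filtered names such as
        # <uuid>.tar.bz2 or <uuid>.tar.gpg.
        matches = sorted(directory.glob(segment + ".tar*"))
        if matches:
            return matches[0]
    return None
```

Undoing any filter (bzip2, gpg, and so on) is left to the caller, since
filtering is outside the scope of the format.
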
Objects within a segment are numbered sequentially. This sequence
number is then formatted as an 8-digit (zero-padded) hexadecimal
(lowercase) value. The fully qualified name of an object consists of
the segment name, followed by a slash ("/"), followed by the object
sequence number. So, for example
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad
is a fully qualified object name.

Within the segment TAR file, the filename used for each object is its
fully qualified name. Thus, when extracted using the standard tar
utility, a segment will produce a directory with the same name as the
segment itself, and that directory will contain a set of
sequentially-numbered files each storing the contents of a single
object.

NOTE: When naming an object, the segment portion consists of the UUID
only. Any extensions appended to the segment when storing it as a file
in the filesystem (for example, .tar.bz2) and path information (for
example, segments0) are _not_ part of the name of the object.

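Because each object is stored in the segment TAR file under its fully
qualified name, an object can be read with any TAR library. The sketch
below uses Python's tarfile module; the helper names object_name and
read_object are hypothetical, and the input is assumed to be the
unfiltered TAR data.

```python
import tarfile
from typing import BinaryIO


def object_name(segment: str, sequence: int) -> str:
    """Format '<uuid>/<8-digit zero-padded lowercase hex sequence>'."""
    return f"{segment}/{sequence:08x}"


def read_object(segment_tar: BinaryIO, segment: str, sequence: int) -> bytes:
    """Read one object from an already unfiltered segment TAR stream."""
    name = object_name(segment, sequence)
    # mode "r:" reads a plain TAR archive without transparent compression.
    with tarfile.open(fileobj=segment_tar, mode="r:") as tar:
        member = tar.extractfile(name)  # raises KeyError if absent
        if member is None:
            raise KeyError(f"{name} is not a regular file in the segment")
        return member.read()
```
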
There are two additional components which may appear in an object name.

First, a checksum may be added to the object name to express an
integrity constraint: the referred-to data must match the checksum
given. A checksum is enclosed in parentheses and appended to the object
name:
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187)

Secondly, an object may be /sliced/: a subset of the bytes actually
stored in the object may be selected to be returned. The slice syntax
is
    [<start>+<length>]
where <start> is the first byte to return (as a decimal offset) and
<length> specifies the number of bytes to return (again in decimal). It
is invalid to use the slice syntax to select a range of bytes that does
not fall within the original object. The slice specification should be
appended to an object name, for example:
    a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000]
selects only bytes 264..1263 from the original object. As an
abbreviation, the slice syntax
    [<length>]
is shorthand for [0+<length>].

In place of a traditional slice, an exact-length annotation may be
used. This is somewhat similar to specifying [<length>], but
additionally asserts that the referenced object is exactly <length>
bytes long--that is, this slice syntax does not change the bytes
returned at all, but can be used to provide information about the
underlying object store.

Both a checksum and a slice can be used. In this case, the checksum is
given first, followed by the slice. The checksum is computed over the
original object contents, before slicing.

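The object name syntax (segment, sequence number, optional checksum,
optional slice) can be parsed with a single regular expression. The
following is a sketch under stated assumptions: the names REF_RE,
ObjectRef and parse_ref are hypothetical, the length-only slice is
assumed to start at offset 0, and the exact-length annotation is not
handled.

```python
import re
from typing import NamedTuple, Optional, Tuple

REF_RE = re.compile(
    r"^(?P<segment>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"/(?P<sequence>[0-9a-f]{8})"
    r"(?:\((?P<checksum>[a-z0-9]+=[0-9a-f]+)\))?"      # optional (checksum)
    r"(?:\[(?:(?P<start>\d+)\+)?(?P<length>\d+)\])?$"  # optional [slice]
)


class ObjectRef(NamedTuple):
    segment: str
    sequence: int
    checksum: Optional[str]
    byte_range: Optional[Tuple[int, int]]  # (start, length), if sliced


def parse_ref(text: str) -> ObjectRef:
    """Split an object reference into its components."""
    m = REF_RE.match(text)
    if m is None:
        raise ValueError(f"not an object reference: {text!r}")
    byte_range = None
    if m["length"] is not None:
        # Assumption: a missing <start> means the slice begins at byte 0.
        byte_range = (int(m["start"] or 0), int(m["length"]))
    return ObjectRef(m["segment"], int(m["sequence"], 16),
                     m["checksum"], byte_range)
```

A reader would verify the checksum against the whole object first and
apply byte_range afterwards, since the checksum covers the original
object contents.
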
In addition to the standard syntax for objects described above, the
special name "zero" may be used instead of a segment/sequence number.
This represents an object consisting entirely of zeroes. The zero
object must have a slice specification appended to indicate the size of
the object. For example
    zero[1024]
represents a block consisting of 1024 null bytes. A checksum should not
be given. The slice syntax should use the abbreviated length-only form.

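A sketch of how a reader might materialize the zero object and apply a
slice to ordinary object data follows. The function names are
hypothetical, and the textual form zero[<length>] is an assumption
based on the length-only slice form described above.

```python
import re


def apply_slice(data: bytes, start: int, length: int) -> bytes:
    """Return bytes start..start+length-1, rejecting out-of-range slices."""
    if start < 0 or length < 0 or start + length > len(data):
        raise ValueError("slice does not fall within the original object")
    return data[start:start + length]


def zero_object(reference: str) -> bytes:
    """Expand a reference of the assumed form 'zero[<length>]'."""
    m = re.fullmatch(r"zero\[(\d+)\]", reference)
    if m is None:
        raise ValueError(f"not a zero-object reference: {reference!r}")
    return bytes(int(m.group(1)))  # bytes(n) yields n null bytes
```
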

FILE METADATA LISTING
=====================

A snapshot stores two distinct types of data into the object store
described above: data and metadata. Data for a file may be stored as a
single object, or the data may be broken apart into blocks which are
stored as separate objects. The file /metadata/ log (which may be
spread across multiple objects) specifies the names of the files in a
snapshot and metadata about them such as ownership and timestamps, and
gives the list of objects that contain the data for each file.

The metadata log consists of a set of stanzas, each of which is
formatted somewhat like RFC 822 (email) headers. An excerpt of an
example stanza is:

    checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f
    data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7

The meanings of all the fields are described later. A blank line
separates stanzas with information about different files. In addition
to regular stanzas, the metadata listing may contain a line containing
an object reference prefixed with "@". Such a line indicates that the
contents of the referenced object should be fetched and parsed as a
metadata listing at this point, prior to continuing to parse the current
object.

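The stanza structure can be read with a small line-oriented parser. The
sketch below is illustrative only: parse_stanzas is a hypothetical
name, folded (continued) header lines are not handled, and "@" lines
are returned to the caller rather than fetched.

```python
from typing import Dict, Iterator, Union


def parse_stanzas(text: str) -> Iterator[Union[Dict[str, str], str]]:
    """Yield each stanza as a dict, and each '@' line as a reference string."""
    stanza: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("@"):
            if stanza:
                yield stanza
                stanza = {}
            yield line[1:].strip()   # object reference to parse next
        elif not line.strip():
            if stanza:               # a blank line ends the current stanza
                yield stanza
                stanza = {}
        else:
            key, _, value = line.partition(":")
            stanza[key.strip()] = value.strip()
    if stanza:
        yield stanza
```

A caller that receives a string would fetch that object, parse its
contents the same way, and then continue with the current listing.
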
Several common encodings are used for various fields. The encoding used
for each field is specified in the field listing that follows.
    encoded string: An arbitrary string (octet sequence), with bytes
        optionally escaped by replacing a byte with %xx, where "xx" is a
        hexadecimal representation of the byte replaced. For example,
        space can be replaced with "%20". This is the same escaping
        mechanism as used in URLs.
    integer: An integer, which may be written in decimal, octal, or
        hexadecimal. Strings starting with 0 are interpreted as octal,
        and those starting with 0x are interpreted as hexadecimal.

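Both encodings are straightforward to decode. A minimal sketch, with
hypothetical function names, using the URL unescaping routine from the
Python standard library:

```python
from urllib.parse import unquote_to_bytes


def decode_string(field: str) -> bytes:
    """Undo %xx escaping; the result is an arbitrary octet sequence."""
    return unquote_to_bytes(field)


def decode_integer(field: str) -> int:
    """Parse a decimal, octal (leading 0) or hexadecimal (leading 0x) value."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    if field.startswith("0") and len(field) > 1:
        return int(field, 8)
    return int(field, 10)
```

For example, decode_integer("0644") gives 420, the decimal value of
octal 644, which is the natural reading for a mode field.
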
Common fields (required in all stanzas):
    path [encoded string]: Full path of the file archived. Note: In
        previous versions (<= 0.2) the name of this field was "name".
    user [special]: The user ID of the file, as an integer, optionally
        followed by a space and the corresponding username, as an
        encoded string enclosed in parentheses.
    group [special]: The group ID which owns the file. Encoding is the
        same as for the user field: an integer, with an optional name in
        parentheses following.
    mode [integer]: Unix mode bits for the file.
    type [special]: A single character which indicates the type of file.
        The type indicators are meant to be consistent with the
        characters used with the -type option to find(1), and the file
        type checks in test(1).
        Note that previous versions used '-' to indicate a regular file.
        This character should not be generated in any new snapshots, but
        may be encountered in old snapshots.
    mtime [integer]: Modification time of the file.

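The special encoding shared by the user and group fields (a numeric ID
with an optional name in parentheses) might be parsed as sketched
below. The function name parse_owner is hypothetical, and the ID is
assumed to be written in decimal.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote


def parse_owner(field: str) -> Tuple[int, Optional[str]]:
    """Parse '<id>' or '<id> (<name>)' into (id, name or None)."""
    m = re.fullmatch(r"(\d+)(?:\s+\((.*)\))?", field.strip())
    if m is None:
        raise ValueError(f"malformed user/group field: {field!r}")
    # Assumption: the numeric ID is decimal. The name is %xx-escaped.
    name = unquote(m.group(2)) if m.group(2) is not None else None
    return int(m.group(1)), name
```
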
Optional common fields:
    links [integer]: Number of hard links to this file, generally only
        reported if greater than 1.
    inode [string]: String specifying the inode number of this file when
        it was dumped. If "links" is greater than 1, then searching for
        other files that have an identical "inode" value can be used to
        determine which files should be hard-linked together when
        restoring. The inode field should be treated as an opaque
        string and compared for equality as such; an implementation may
        choose whatever representation is convenient. The format
        produced by the standard tool is <major>/<minor>/<inode> (where
        <major> and <minor> specify the device of the containing
        filesystem and <inode> is the inode number of the file).
    ctime [integer]: Change time for the inode.

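To illustrate how the links and inode fields work together on restore,
the following sketch groups paths that share an inode string. It
assumes stanzas have already been parsed into dictionaries of field
name to value and that links is written in decimal; the function name
hard_link_groups is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def hard_link_groups(stanzas: Iterable[Dict[str, str]]) -> List[List[str]]:
    """Group paths whose opaque inode strings compare equal."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for stanza in stanzas:
        # Only files reporting more than one link are candidates.
        if int(stanza.get("links", "1")) > 1 and "inode" in stanza:
            groups[stanza["inode"]].append(stanza["path"])
    # A group of one means the other links lie outside this snapshot.
    return [paths for paths in groups.values() if len(paths) > 1]
```

The inode value is only ever compared for equality, never parsed, in
keeping with its description as an opaque string.
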
Special fields used for regular files:
    checksum [string]: Checksum of the file contents.
    size [integer]: Size of the file, in bytes.
    data [reference list]: Whitespace-separated list of object
        references. The referenced data, when concatenated in the
        listed order, will reconstruct the file data. Any reference
        that begins with a "@" character is an indirect reference--the
        given object includes a whitespace-separated list of object
        references which should be parsed in the same manner as the data
        field.

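Indirect references make the data field recursive. A sketch of the
expansion follows; fetch is a hypothetical callable supplied by the
caller that returns the bytes of the named object, and
expand_references is likewise a hypothetical name.

```python
from typing import Callable, List


def expand_references(data_field: str,
                      fetch: Callable[[str], bytes]) -> List[str]:
    """Flatten a reference list, following '@' indirect references."""
    refs: List[str] = []
    for ref in data_field.split():
        if ref.startswith("@"):
            # The referenced object holds another reference list.
            listing = fetch(ref[1:]).decode("ascii")
            refs.extend(expand_references(listing, fetch))
        else:
            refs.append(ref)
    return refs
```

Concatenating the data of the returned references, in order, then
reconstructs the file contents.
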
Special fields used for symbolic links:
    target [encoded string]: The target of the symlink, as returned by
        readlink(2). Note: In old versions of the format (<= 0.2), this
        field was called "contents" instead of "target".

Special fields used for block and character device files:
    device [special]: The major and minor number of the device. Encoded
        as "major/minor", where major and minor are the major and minor
        device numbers, each encoded as an integer.

The snapshot descriptor is a small file which describes a single
snapshot. It is one of the few files which is not stored as an object
in the segment store. It is stored as a separate file, in plain text,
but in the same directory as segments are stored.

The name of the snapshot descriptor file is
    snapshot-<scheme>-<timestamp>.lbs
<scheme> is descriptive text which can be used to distinguish several
logically distinct sets of snapshots (such as snapshots for two
different directory trees) that are being stored in the same location.
<timestamp> gives the date and time the snapshot was taken; the format
is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). It is
recommended that the timestamp be given in UTC for consistent sorting
even if the offset from UTC to local time changes; however, the
authoritative timestamp (including timezone) can be found in the Date
field. (In version v0.10 and earlier the timestamp is given in local
time; in current versions UTC is used.)

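The descriptor filename can be split into its parts with a regular
expression and strptime. This is a sketch only: the names NAME_RE and
parse_snapshot_name are hypothetical, and the returned datetime is
naive because the filename itself carries no timezone.

```python
import re
from datetime import datetime
from typing import Tuple

NAME_RE = re.compile(
    r"^snapshot-(?P<scheme>.+)-(?P<timestamp>\d{8}T\d{6})\.lbs$"
)


def parse_snapshot_name(filename: str) -> Tuple[str, datetime]:
    """Return (scheme, timestamp) from a descriptor filename."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a snapshot descriptor name: {filename!r}")
    when = datetime.strptime(m["timestamp"], "%Y%m%dT%H%M%S")
    return m["scheme"], when
```

Because the timestamp format sorts lexicographically in time order,
sorting the filenames of one scheme also sorts its snapshots by time.
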
The contents of the descriptor are a set of RFC 822-style headers (much
like the metadata listing). The fields which are defined are:
    Format: The string "Cumulus Snapshot v0.11" which identifies this
        file as a Cumulus backup descriptor. The version number (v0.11)
        might change if there are changes to the format. It is expected
        that at some point, once the format is stabilized, the version
        identifier will be changed to v1.0. (Earlier versions, format
        v0.8 and earlier, used the string "LBS Snapshot" instead of
        "Cumulus Snapshot", reflecting an earlier name for the project.
        Consumers should be prepared for either name.)
    Producer: An informative string which identifies the program that
        produced the snapshot.
    Date: The date the snapshot was produced, in the local time zone.
        This matches the timestamp encoded in the filename, but is
        written out in full. A timezone (offset from UTC) is given.
        For example: "2007-08-06 02:22:39 -0700".
    Scheme: The <scheme> field from the descriptor filename.
    Segments: A whitespace-separated list of segment names. Any segment
        which is referenced by this snapshot must be included in the
        list, since this list can be used in garbage-collecting old
        segments, determining which segments need to be downloaded to
        completely reconstruct a snapshot, etc.
    Root: A single object reference which points to the metadata
        listing for the snapshot.
    Checksums: A checksum file may be produced (with the same name as
        the snapshot descriptor file, but with extension .sha1sums
        instead of .lbs) containing SHA-1 checksums of all segments.
        This field contains a checksum of that file.
    Intent: Informational; records the value of the --intent flag when
        the snapshot was created, and can be used when determining which
        snapshots to later delete.
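
A descriptor can be read with a few lines of code. The sketch below
uses hypothetical function names and assumes that long values, such as
the Segments list, may continue on following lines that begin with
whitespace (as RFC 822 headers allow); the format description itself
does not state this.

```python
from datetime import datetime
from typing import Dict


def parse_descriptor(text: str) -> Dict[str, str]:
    """Parse 'Name: value' headers into a dict of field name to value."""
    fields: Dict[str, str] = {}
    key = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            # Assumed continuation line: append to the previous field.
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def snapshot_date(fields: Dict[str, str]) -> datetime:
    """Parse the Date field, e.g. '2007-08-06 02:22:39 -0700'."""
    return datetime.strptime(fields["Date"], "%Y-%m-%d %H:%M:%S %z")
```

The Segments value can then be split on whitespace to obtain the
segment names needed to reconstruct the snapshot.
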