Backup Format Description
for an LFS-Inspired Backup Solution

NOTE: This is simply a proposal at this point in time, and not yet
implemented. Details are subject to change.

========================================================================

Goals: To provide a stable and extensible data storage format for
efficient remote filesystem backups. Among the features desired in the
format:
  - Support for grouping unchanging file contents together, and reusing
    them for future backups.
  - Nonetheless allow old backups to be deleted (at least those parts
    that are not also used by newer backups).
  - Support some form of rdiff-style incremental differences within a
    file.

The current plan is to implement compression and encryption separately:
not as part of the base format, but simply by passing the backup data
through filters such as bzip2 or gpg.
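
As a rough illustration of the filter approach, the sketch below applies
compression to an already-serialized segment as a separate step. It uses
Python's bz2 module as a stand-in for an external bzip2 process; the names
write_segment and read_segment are hypothetical and not part of the
proposal. Encryption (for example with gpg) would be chained in the same
way and is omitted here.

```python
import bz2
import io
from typing import BinaryIO


def write_segment(raw: bytes, out: BinaryIO, compress: bool = True) -> None:
    """Write serialized segment bytes, optionally through a bzip2 filter.

    The base format (raw) knows nothing about compression; the filter is
    applied to the finished byte string as a separate step.
    """
    out.write(bz2.compress(raw) if compress else raw)


def read_segment(inp: BinaryIO, compressed: bool = True) -> bytes:
    """Undo the filter and return the original serialized segment bytes."""
    data = inp.read()
    return bz2.decompress(data) if compressed else data


# Round trip through an in-memory file (assumed example payload).
buf = io.BytesIO()
payload = b"example segment contents " * 100
write_segment(payload, buf)
buf.seek(0)
restored = read_segment(buf)
```

Because the filter only ever sees opaque bytes, it could be swapped or
removed without touching the code that builds segments.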

Data is organized into a collection of _objects_, which are grouped
together for storage purposes into _segments_. Objects may refer to
other objects; a snapshot consists of a tree object which in turn refers
to other objects containing file data. A new snapshot may be created
which refers to some of the old objects with file data, if those files
have not changed.
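
The object graph described here can be sketched in a few lines. This is
only an illustration under assumed names (Obj, reachable, and the string
object ids are all hypothetical): two snapshots share one file object, and
a reachability walk shows which objects could be dropped if the older
snapshot were deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    data: bytes = b""
    refs: list = field(default_factory=list)  # ids of objects this one points to


def reachable(objects: dict, root: str) -> set:
    """Return the ids of all objects reachable from a snapshot's tree object."""
    seen, stack = set(), [root]
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid].refs)
    return seen


objects = {
    "file-a": Obj(b"contents of a"),
    "file-b-v1": Obj(b"old contents of b"),
    "tree-1": Obj(refs=["file-a", "file-b-v1"]),  # first snapshot
    "file-b-v2": Obj(b"new contents of b"),
    "tree-2": Obj(refs=["file-a", "file-b-v2"]),  # second snapshot reuses file-a
}

# If the first snapshot is deleted, only objects that no remaining
# snapshot can reach are candidates for removal.
live = reachable(objects, "tree-2")
garbage = set(objects) - live
```

In this toy example the shared object "file-a" stays live, while "tree-1"
and "file-b-v1" become unreferenced.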

========================================================================

  - Each segment is assigned a unique 128-bit identifier (uuid). Each
    segment is stored as a separate file whose name is based on its
    identifier.
  - Objects within a segment are numbered sequentially, with a 32-bit
    number.

Thus, each object may be referred to with a unique 160 (128 + 32) bit
identifier.
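
One possible encoding of such a reference is shown below. The byte layout
(segment uuid first, then a big-endian object number) and the function
names pack_ref and unpack_ref are assumptions made for illustration; the
proposal itself does not fix an encoding.

```python
import struct
import uuid


def pack_ref(segment: uuid.UUID, index: int) -> bytes:
    """Pack a (segment uuid, object number) pair into 20 bytes (160 bits)."""
    # struct.pack raises struct.error if index does not fit in 32 bits.
    return segment.bytes + struct.pack(">I", index)


def unpack_ref(ref: bytes):
    """Split a 20-byte reference back into its segment uuid and object number."""
    if len(ref) != 20:
        raise ValueError("expected a 20-byte reference")
    (index,) = struct.unpack(">I", ref[16:])
    return uuid.UUID(bytes=ref[:16]), index


# Assumed example values.
seg = uuid.UUID("12345678-1234-5678-1234-567812345678")
ref = pack_ref(seg, 7)
```

Calling unpack_ref(ref) returns the original (seg, 7) pair.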

There are two main options for laying out the objects within a segment:
  - Streaming format: Each object is preceded by a header, and then
    all (header, object) pairs are concatenated. This is inspired by
    the tar file format. It can be written out in one pass and also
    processed in one pass when read back. It is well adapted to
    streaming transformations, such as compression.
  - Indexed format: Each segment contains a table giving the starting
    position and length of each object. This is somewhat similar to
    PDF. Data can still be written out in a single pass, but reading
    will require random access.
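
The two layouts can be compared with a minimal sketch. Everything concrete
here is an assumption: the 4-byte length header, the placement of the
index table at the end of the segment, the trailer that records where the
table starts, and all function names. The sketch also assumes the output
file starts at offset 0.

```python
import io
import struct

ENTRY = ">QI"  # assumed table entry: 64-bit offset, 32-bit length


def write_streaming(objects, out):
    """Streaming layout: each object is preceded by a 4-byte length header."""
    for obj in objects:
        out.write(struct.pack(">I", len(obj)))
        out.write(obj)


def read_streaming(inp):
    """Read objects back in a single sequential pass."""
    while True:
        header = inp.read(4)
        if not header:
            return
        (length,) = struct.unpack(">I", header)
        yield inp.read(length)


def write_indexed(objects, out):
    """Indexed layout: object data, then an (offset, length) table, then a
    trailer holding the table's offset and entry count. Still one pass."""
    table, pos = [], 0
    for obj in objects:
        out.write(obj)
        table.append((pos, len(obj)))
        pos += len(obj)
    for offset, length in table:
        out.write(struct.pack(ENTRY, offset, length))
    out.write(struct.pack(ENTRY, pos, len(table)))


def read_indexed(inp, index):
    """Fetch one object by number using seeks (random access)."""
    size = struct.calcsize(ENTRY)
    inp.seek(-size, io.SEEK_END)
    table_start, count = struct.unpack(ENTRY, inp.read(size))
    if not 0 <= index < count:
        raise IndexError(index)
    inp.seek(table_start + index * size)
    offset, length = struct.unpack(ENTRY, inp.read(size))
    inp.seek(offset)
    return inp.read(length)


objs = [b"alpha", b"", b"gamma-gamma"]
stream_file = io.BytesIO()
write_streaming(objs, stream_file)
stream_file.seek(0)
index_file = io.BytesIO()
write_indexed(objs, index_file)
```

With the streaming file, list(read_streaming(stream_file)) walks every
object in order; with the indexed file, read_indexed(index_file, 2) jumps
straight to the third object but needs a seekable input, which a
compression filter would not normally provide.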

File attributes: Metadata for each file is stored in a dictionary.
Dictionary keys include:
    type: uint8_t ('p', 's', 'c', 'b', 'l', 'd', '-')