                       Backup Format Description
                  for an LFS-Inspired Backup Solution

NOTE: This is simply a proposal at this point in time, and not yet
implemented.  Details are subject to change.

========================================================================

Goals: To provide a stable and extensible data storage format for
efficient remote filesystem backups.  Among the features desired in the
format are:
  - Support for grouping unchanging file contents together, and reusing
    them for future backups.
  - Nonetheless allow old backups to be deleted (at least those parts
    that are not also used by newer backups).
  - Support some form of rdiff-style incremental differences within a
    file.

The current plan is to implement compression and encryption separately:
not as part of the base format, but simply by passing the backup data
through filters such as bzip2 or gpg.

Data is organized into a collection of _objects_, which are grouped
together for storage purposes into _segments_.  Objects may refer to
other objects; a snapshot consists of a tree object which in turn refers
to other objects containing file data.  A new snapshot may be created
which refers to some of the old objects with file data, if those files
have not changed.

========================================================================

Object naming:
  - Each segment is assigned a unique 128-bit identifier (uuid).  Each
    segment is stored as a separate file whose name is based on its
    uuid.
  - Objects within a segment are numbered, using a 32-bit counter.

Each segment is structured as a TAR file (optionally filtered through a
compressor such as gzip/bzip2, or encrypted).  Objects are stored as
individual files.

File attributes:

Metadata for each file is stored in a dictionary.  Dictionary keys
include:
    type:   uint8_t ('p', 's', 'c', 'b', 'l', 'd', '-')
    mode:   uint16_t
    user:   uint32_t
    group:  uint32_t
    size:   int64_t
    atime:  int64_t
    mtime:  int64_t
    ctime:  int64_t
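
The segment layout and attribute dictionary described above could be
sketched roughly as follows.  This is an illustration only, not part of
the proposal.  It makes several assumptions the text does not state: the
type characters are read as in ls(1) output (pipe, socket, character
device, block device, symlink, directory, regular file), timestamps are
whole seconds, and an object is named "<uuid>/<counter as 8 hex digits>"
inside the TAR file.  The names SegmentWriter, add_object and
file_attributes are hypothetical.

```python
import io
import os
import stat
import tarfile
import uuid

# Assumed meaning of the type characters, following ls(1) conventions.
_TYPE_TESTS = [
    (stat.S_ISFIFO, "p"),
    (stat.S_ISSOCK, "s"),
    (stat.S_ISCHR, "c"),
    (stat.S_ISBLK, "b"),
    (stat.S_ISLNK, "l"),
    (stat.S_ISDIR, "d"),
    (stat.S_ISREG, "-"),
]


def file_attributes(path):
    """Build the metadata dictionary for one file (symlinks not followed)."""
    st = os.lstat(path)
    for test, char in _TYPE_TESTS:
        if test(st.st_mode):
            ftype = char
            break
    else:
        raise ValueError("unsupported file type: %r" % path)
    return {
        "type": ftype,
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "user": st.st_uid,
        "group": st.st_gid,
        "size": st.st_size,
        "atime": int(st.st_atime),  # assumption: whole seconds
        "mtime": int(st.st_mtime),
        "ctime": int(st.st_ctime),
    }


class SegmentWriter:
    """Collect objects into one in-memory, uncompressed TAR segment."""

    def __init__(self):
        self.uuid = uuid.uuid4()  # 128-bit segment identifier
        self.count = 0            # per-segment object counter
        self._buf = io.BytesIO()
        self._tar = tarfile.open(fileobj=self._buf, mode="w")

    def add_object(self, data):
        """Store data as the next numbered object and return its name."""
        if self.count >= 2 ** 32:
            raise OverflowError("32-bit object counter exhausted")
        # Assumed naming scheme; the proposal does not fix one.
        name = "%s/%08x" % (self.uuid, self.count)
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        self._tar.addfile(info, io.BytesIO(data))
        self.count += 1
        return name

    def close(self):
        """Finish the archive and return the raw TAR bytes."""
        self._tar.close()
        return self._buf.getvalue()
```

The bytes returned by close() are what would then be passed through an
external filter such as bzip2 or gpg, keeping compression and encryption
outside the base format as planned.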