From: Michael Vrable
Date: Wed, 20 Dec 2006 02:55:15 +0000 (-0800)
Subject: Fill in a couple more details about a proposed file format.
X-Git-Url: https://git.vrable.net/?a=commitdiff_plain;h=01a292581690054eef19558f564bbb56e4ca955e;p=cumulus.git

Fill in a couple more details about a proposed file format.
---
diff --git a/format.txt b/format.txt
index f7ce166..7bf0474 100644
--- a/format.txt
+++ b/format.txt
@@ -25,3 +25,37 @@ other objects; a snapshot consists of a tree object which in turn refers to
 other objects containing file data. A new snapshot may be created which
 refers to some of the old objects with file data, if those files have not
 changed.
+
+========================================================================
+
+Object naming:
+  - Each segment is assigned a unique 128-bit identifier (uuid). Each
+    segment is stored as a separate file whose name is based on its
+    uuid.
+  - Objects within a segment are numbered sequentially, with a 32-bit
+    counter.
+Thus, each object may be referred to by a unique 160-bit (128 + 32)
+identifier.
+
+Segment structure:
+There are two main options:
+  - Streaming format: Each object is preceded by a header, and all
+    (header, object) pairs are concatenated. This is inspired by
+    the tar file format. A segment can be written out in one pass and
+    read back in one pass, which suits streaming transformations such
+    as compression.
+  - Indexed format: Each segment contains a table giving the starting
+    position and length of each object. This is somewhat similar to
+    PDF. Data can still be written out in a single pass, but reading
+    will require random access.
+
+File attributes: Metadata for each file is stored in a dictionary.
+Dictionary keys include:
+    type: uint8_t ('p', 's', 'c', 'b', 'l', 'd', '-')
+    mode: uint16_t
+    user: uint32_t
+    group: uint32_t
+    size: int64_t
+    atime: int64_t
+    mtime: int64_t
+    ctime: int64_t
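The "Object naming" notes in this proposal combine a 128-bit segment uuid with a 32-bit per-segment counter. The result is a 160-bit (20-byte) object identifier. The Python sketch below packs and unpacks such an identifier under a few assumptions:

- The counter is appended after the 16 uuid bytes in big-endian order.
- Objects have a printable form "<uuid>/<counter as 8 hex digits>".
- The helper names are hypothetical.

None of these assumptions is fixed by the proposal.

```python
import uuid
from typing import Tuple

# Hypothetical helpers for the proposed naming scheme:
# 128-bit segment uuid + 32-bit object number = 160-bit object identifier.


def make_object_id(segment: uuid.UUID, seq: int) -> bytes:
    """Pack (segment uuid, object number) into a 20-byte identifier."""
    if not 0 <= seq < 2**32:
        raise ValueError("object number must fit in 32 bits")
    # 16 uuid bytes followed by the counter; big-endian order is assumed.
    return segment.bytes + seq.to_bytes(4, "big")


def split_object_id(oid: bytes) -> Tuple[uuid.UUID, int]:
    """Inverse of make_object_id."""
    if len(oid) != 20:
        raise ValueError("object identifiers are 160 bits (20 bytes)")
    return uuid.UUID(bytes=oid[:16]), int.from_bytes(oid[16:], "big")


def object_name(segment: uuid.UUID, seq: int) -> str:
    """Assumed printable form: '<segment uuid>/<counter as 8 hex digits>'."""
    return f"{segment}/{seq:08x}"


if __name__ == "__main__":
    seg = uuid.uuid4()                         # a fresh segment identifier
    oid = make_object_id(seg, 42)
    print(len(oid) * 8)                        # 160
    print(split_object_id(oid) == (seg, 42))   # True
    print(object_name(seg, 42))
```

Keeping the segment uuid as a prefix means every object identifier directly names the segment file to open.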
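The "Streaming format" option in the proposal (a header before each object, pairs concatenated as in tar) can be sketched in a few lines of Python. The proposal does not specify a header layout, so this sketch assumes a 32-bit object number followed by a 32-bit payload length, both big-endian; the function names are hypothetical.

The writer only appends and the reader only moves forward. As a result, the segment can be passed through gzip while it is being written or read, which is the kind of streaming transformation the proposal mentions.

```python
import gzip
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Assumed header: 32-bit object number, 32-bit payload length (big-endian).
HEADER = struct.Struct(">II")


def write_object(out: BinaryIO, seq: int, data: bytes) -> None:
    """Append one (header, object) pair; never seeks, so `out` may be a pipe."""
    out.write(HEADER.pack(seq, len(data)))
    out.write(data)


def read_objects(inp: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (object number, payload) pairs in a single forward pass.

    Assumes a buffered stream whose read(n) returns n bytes unless at EOF.
    """
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of segment
        if len(header) < HEADER.size:
            raise ValueError("truncated header")
        seq, length = HEADER.unpack(header)
        data = inp.read(length)
        if len(data) < length:
            raise ValueError("truncated object")
        yield seq, data


if __name__ == "__main__":
    # Compress while writing and decompress while reading, one pass each.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        write_object(gz, 0, b"hello")
        write_object(gz, 1, b"world")
    buf.seek(0)
    with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
        print(list(read_objects(gz)))  # [(0, b'hello'), (1, b'world')]
```

The cost of this layout is that fetching one object means scanning past every object before it, which is the trade-off the indexed option addresses.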
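For the "Indexed format" option, the proposal says only that each segment holds a table of starting positions and lengths. One way to keep writing to a single pass is the following layout:

- the object data comes first;
- the table of (offset, length) entries follows;
- a fixed-size trailer at the very end records where the table starts.

That layout, the 64-bit field widths, and the function names below are all assumptions made for illustration.

```python
import io
import struct
from typing import BinaryIO, List

# Assumed layout: object data | index entries | trailer.
ENTRY = struct.Struct(">QQ")    # (offset, length) of one object
TRAILER = struct.Struct(">QI")  # (offset of index, number of objects)


def write_indexed_segment(out: BinaryIO, objects: List[bytes]) -> None:
    """Write a whole segment using only sequential writes (one pass).

    Offsets are relative to the start of the segment, so `out` should be
    positioned at the beginning of a fresh file.
    """
    entries = []
    pos = 0
    for data in objects:              # object i gets number i
        out.write(data)
        entries.append((pos, len(data)))
        pos += len(data)
    for offset, length in entries:
        out.write(ENTRY.pack(offset, length))
    out.write(TRAILER.pack(pos, len(entries)))  # index starts where data ends


def read_object(inp: BinaryIO, seq: int) -> bytes:
    """Fetch object `seq` with three seeks; `inp` must be seekable."""
    inp.seek(-TRAILER.size, io.SEEK_END)
    index_offset, count = TRAILER.unpack(inp.read(TRAILER.size))
    if not 0 <= seq < count:
        raise IndexError(f"segment has {count} objects")
    inp.seek(index_offset + seq * ENTRY.size)
    offset, length = ENTRY.unpack(inp.read(ENTRY.size))
    inp.seek(offset)
    return inp.read(length)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_indexed_segment(buf, [b"alpha", b"beta", b"gamma"])
    print(read_object(buf, 2))  # b'gamma'
```

Because the trailer sits at a fixed distance from the end of the file, a reader can locate the table without scanning the data. The price is that the segment can no longer be consumed from a non-seekable stream such as a pipe.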
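The "File attributes" dictionary can be filled in from an `os.lstat()` result, with each value checked against the C type the proposal lists for its key. The proposal lists the type codes without defining them. This sketch assumes they follow the `ls -l` convention:

- p = FIFO
- s = socket
- c = character device
- b = block device
- l = symbolic link
- d = directory
- '-' = regular file

It also assumes times are whole seconds since the epoch. The function names are hypothetical.

```python
import os
import stat
from typing import Dict, Union


def type_code(mode: int) -> str:
    """Map st_mode to a type character (assumed ls -l convention)."""
    if stat.S_ISFIFO(mode):
        return "p"
    if stat.S_ISSOCK(mode):
        return "s"
    if stat.S_ISCHR(mode):
        return "c"
    if stat.S_ISBLK(mode):
        return "b"
    if stat.S_ISLNK(mode):
        return "l"
    if stat.S_ISDIR(mode):
        return "d"
    if stat.S_ISREG(mode):
        return "-"
    raise ValueError(f"unsupported file type in mode {mode:o}")


def file_attributes(st: os.stat_result) -> Dict[str, Union[str, int]]:
    """Build the proposed per-file attribute dictionary from lstat() output."""
    return {
        "type": type_code(st.st_mode),
        "mode": stat.S_IMODE(st.st_mode),     # permission bits only
        "user": st.st_uid,
        "group": st.st_gid,
        "size": st.st_size,
        # Whole seconds since the epoch; floor division also handles pre-1970.
        "atime": st.st_atime_ns // 1_000_000_000,
        "mtime": st.st_mtime_ns // 1_000_000_000,
        "ctime": st.st_ctime_ns // 1_000_000_000,
    }


# Value ranges of the C types listed in the proposal.
U8, U16, U32 = (0, 2**8 - 1), (0, 2**16 - 1), (0, 2**32 - 1)
I64 = (-2**63, 2**63 - 1)
LIMITS = {"mode": U16, "user": U32, "group": U32, "size": I64,
          "atime": I64, "mtime": I64, "ctime": I64}


def check_limits(attrs: Dict[str, Union[str, int]]) -> None:
    """Raise ValueError if any value does not fit its declared type."""
    lo, hi = U8
    if not lo <= ord(attrs["type"]) <= hi:   # type is stored as a uint8_t
        raise ValueError("type code does not fit in uint8_t")
    for key, (lo, hi) in LIMITS.items():
        if not lo <= attrs[key] <= hi:
            raise ValueError(f"{key}={attrs[key]} out of range")


if __name__ == "__main__":
    attrs = file_attributes(os.lstat("."))
    check_limits(attrs)
    print(attrs["type"])  # d
```

Using `lstat()` rather than `stat()` records a symbolic link as type 'l' instead of following it, which seems to be what the 'l' code is for.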