Michael Vrable [Mon, 20 Jul 2009 01:02:38 +0000 (18:02 -0700)]
Update README with information about remote storage.
Michael Vrable [Tue, 30 Jun 2009 18:25:30 +0000 (11:25 -0700)]
Include a missing header file.
This should fix compilation with newer versions of GCC. Problem originally
reported by Robert Rebstock <rebstock@scienceworks.com>.
Michael Vrable [Sun, 31 May 2009 06:21:10 +0000 (23:21 -0700)]
Implement rudimentary garbage collection.
Implement a garbage collection method in cumulus-util which will search for
files not referenced by any current snapshots and delete them. This still
doesn't let snapshots themselves be deleted automatically, but after
manually deleting a snapshot this will quickly delete all other old files.
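
The collection step described above amounts to a set difference. A minimal sketch, assuming each snapshot can be reduced to the set of file names it needs; find_garbage and the sample names are hypothetical and are not taken from cumulus-util:

```python
def find_garbage(stored_files, snapshots):
    """Return stored files that no current snapshot references.

    `snapshots` maps a snapshot name to the set of files it needs
    (a hypothetical representation chosen for this sketch).
    """
    referenced = set()
    for needed in snapshots.values():
        referenced.update(needed)
    return sorted(set(stored_files) - referenced)


stored = ["seg-a", "seg-b", "seg-c"]
current = {"snap-1": {"seg-a"}, "snap-2": {"seg-a", "seg-c"}}
garbage = find_garbage(stored, current)
```

Here seg-b is the only file that no remaining snapshot needs, so it is the only deletion candidate.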
Michael Vrable [Sun, 31 May 2009 06:19:10 +0000 (23:19 -0700)]
Implement metadata caching for S3 backend.
Amazon S3 will return some limited object metadata when a list operation is
performed. This is significantly cheaper than fetching the information for
objects one at a time. In the S3 backend, implement a scan() method that
will list all objects and cache the metadata, then return cached results
when stat() is called.
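
A rough sketch of the scan()/stat() caching pattern described above. The CachingStore class and the lister and fetch_one callables are hypothetical stand-ins for the S3 list and per-object calls; this is not the actual backend code:

```python
class CachingStore:
    """Hypothetical wrapper; `lister` stands in for one S3 list request."""

    def __init__(self, lister, fetch_one):
        self._lister = lister        # returns (name, size) pairs in one call
        self._fetch_one = fetch_one  # per-object lookup, the expensive path
        self._cache = {}

    def scan(self):
        """List every object once and remember its metadata."""
        for name, size in self._lister():
            self._cache[name] = {"size": size}

    def stat(self, name):
        """Serve from the cache when possible, else fetch a single object."""
        if name in self._cache:
            return self._cache[name]
        return self._fetch_one(name)


single_fetches = []

def fake_lister():
    return [("segment-a", 1024), ("segment-b", 2048)]

def fake_fetch(name):
    single_fetches.append(name)
    return {"size": 0}

store = CachingStore(fake_lister, fake_fetch)
store.scan()
cached = store.stat("segment-a")
```

After scan(), the stat() call is answered without any per-object request.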
Michael Vrable [Thu, 26 Mar 2009 21:35:27 +0000 (14:35 -0700)]
lbs-filter-gpg has been renamed to cumulus-filter-gpg.
An instance of this was missed in the code. Caught by Achim J. Latz
<achim.latz@qustodium.net>.
Michael Vrable [Wed, 14 Jan 2009 22:06:31 +0000 (14:06 -0800)]
Include segment add/remove counts in list-snapshot-sizes.
Michael Vrable [Mon, 15 Dec 2008 23:15:19 +0000 (15:15 -0800)]
Detect decompression script needed for segments based on extension.
Michael Vrable [Mon, 15 Dec 2008 22:29:13 +0000 (14:29 -0800)]
cumulus-sync: Create a tool for copying snapshots between locations.
This will automatically find and copy all needed segments that are not
already present, and can handle both local filesystems and remote storage
with Amazon S3.
Michael Vrable [Thu, 20 Nov 2008 23:47:52 +0000 (15:47 -0800)]
Drop the use of exceptions for fatal error handling.
Michael Vrable [Thu, 20 Nov 2008 23:38:39 +0000 (15:38 -0800)]
Improve handling of file-not-found in remote storage layer.
Implement a single NotFoundError which is thrown whenever a file does not
exist in any remote store, instead of instance-specific error handling.
Michael Vrable [Thu, 20 Nov 2008 20:11:41 +0000 (12:11 -0800)]
Delete contrib/cumulus-store-s3.
This has been replaced by cumulus-store, and the main cumulus executable
now uses a different interface so cumulus-store-s3 won't work any longer.
Michael Vrable [Thu, 20 Nov 2008 20:10:06 +0000 (12:10 -0800)]
Re-do cumulus side of upload script interface.
Update the cumulus executable so that the interface for the remote upload
script is compatible with the new cumulus-store script, allowing cumulus to
easily target different storage backends.
Michael Vrable [Thu, 20 Nov 2008 19:55:13 +0000 (11:55 -0800)]
Introduce a script to provide access to remote repositories.
cumulus-store is a Python script that uses the cumulus library code to
access various storage repositories (local file, S3, and extensible to
others). It allows non-Python code to access these storage repositories
through a simple interface over stdin/stdout.
Additionally, make a few extensions and fixes to the cumulus Python
libraries.
Michael Vrable [Thu, 13 Nov 2008 22:38:44 +0000 (14:38 -0800)]
Makefile: Fix clean target.
Michael Vrable [Thu, 6 Nov 2008 18:47:56 +0000 (10:47 -0800)]
cumulus-util: Automatically set Python search path.
Attempt to set the Python library search path so that cumulus-util can find
the cumulus Python modules, without PYTHONPATH having to be set
explicitly.
Modules are looked for in the "python" directory where the cumulus-util
binary resides; this is appropriate for running cumulus-util directly from
the source code directory, but may not be if the tools are installed
somewhere else.
Michael Vrable [Thu, 6 Nov 2008 18:47:18 +0000 (10:47 -0800)]
cumulus-util: In list-snapshot-sizes output, display backup intent values.
Michael Vrable [Wed, 17 Sep 2008 22:37:40 +0000 (15:37 -0700)]
Extend FUSE functionality.
Add caching of metadata for performance, and reading of file data.
Michael Vrable [Tue, 12 Aug 2008 22:14:35 +0000 (15:14 -0700)]
Start a proof-of-concept FUSE interface to old snapshots.
Begin work on a FUSE interface to cumulus, allowing old snapshots to
be displayed as a mounted filesystem. Though only partly implemented, already
it is possible to read the directory structure and stat information for
files. File contents cannot yet be extracted.
To implement this efficiently, random access to cumulus metadata was
implemented through the use of a binary search. Some optimization is still
needed, and some caching should probably still be added.
Michael Vrable [Mon, 11 Aug 2008 21:38:30 +0000 (14:38 -0700)]
Allow a URL to be used in cumulus-util to specify a store location.
Both file:/// and s3:// URLs are supported, reading data from the local
filesystem or Amazon S3, respectively. Local paths (without a file:///
prefix) can still be specified.
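
One way the three forms of location could be told apart, as a sketch. parse_store_location is a hypothetical helper, and splitting an s3:// URL into bucket and prefix is an assumption about the layout, not something stated in this log:

```python
from urllib.parse import urlparse


def parse_store_location(location):
    """Classify a store location as ('s3', bucket, prefix) or ('file', path)."""
    parsed = urlparse(location)
    if parsed.scheme == "s3":
        # Assumption: the host part names the bucket, the rest is a key prefix.
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "file":
        return ("file", parsed.path)
    if parsed.scheme == "":
        # A bare local path, given without the file:/// prefix.
        return ("file", location)
    raise ValueError("unsupported store location: " + location)


from_url = parse_store_location("file:///srv/backups")
from_path = parse_store_location("/srv/backups")
from_s3 = parse_store_location("s3://example-bucket/backups")
```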
Michael Vrable [Wed, 6 Aug 2008 19:11:22 +0000 (12:11 -0700)]
Begin new storage-abstraction layer.
Begin work on new Python code for providing uniform access to both local
filesystem and remote S3 storage. Convert the existing Python module to
use the new interface.
Michael Vrable [Fri, 1 Aug 2008 18:37:35 +0000 (11:37 -0700)]
Makefile cleanup.
Michael Vrable [Fri, 1 Aug 2008 18:30:35 +0000 (11:30 -0700)]
Prepare for 0.8 release.
Michael Vrable [Wed, 30 Jul 2008 20:06:54 +0000 (13:06 -0700)]
Fix typo in restore.pl.
Michael Vrable [Wed, 30 Jul 2008 18:48:45 +0000 (11:48 -0700)]
Update restore.pl for new snapshot format (v0.8).
Michael Vrable [Tue, 15 Jul 2008 23:27:19 +0000 (16:27 -0700)]
Rebuild sub-block signatures when --rebuild-statcache is specified.
Michael Vrable [Tue, 15 Jul 2008 23:26:50 +0000 (16:26 -0700)]
Documentation updates.
Michael Vrable [Mon, 14 Jul 2008 23:19:58 +0000 (16:19 -0700)]
Update help text to include --rebuild-statcache.
Michael Vrable [Mon, 14 Jul 2008 22:59:21 +0000 (15:59 -0700)]
--rebuild-statcache bugfix.
Check that a file is actually unchanged (and not just present in the old
statcache with different metadata) before printing a warning about
differing checksums.
Michael Vrable [Mon, 14 Jul 2008 22:49:18 +0000 (15:49 -0700)]
Add the --rebuild-statcache option.
Add an option which forces cumulus to re-read all files, even if they seem
not to have changed.
This has two main purposes: first, it helps to rebuild the contents of the
statcache from data actually on disk, and as such is useful for adding
object size annotations to files which haven't changed (ordinarily, the
straight re-use of data from the statcache prevents this).
Second, it can be used to detect some simple forms of data corruption; if a
file has not changed (according to stat information) but the checksum
doesn't match the old value in the statcache file, a warning is printed.
Michael Vrable [Mon, 14 Jul 2008 21:17:57 +0000 (14:17 -0700)]
Better cope with null values in the segments_used table.
Treat null utilization values as 0.0 for cleaning purposes. These
shouldn't come up, but may have been generated due to bugs in the SQLite
library, so deal gracefully with them instead of failing with an exception.
Michael Vrable [Thu, 10 Jul 2008 18:05:19 +0000 (11:05 -0700)]
Make use of size assertions in references where possible.
When writing out a reference to what is known to be a complete object, use
the size-assertion form of a reference.
Michael Vrable [Tue, 1 Jul 2008 04:35:13 +0000 (21:35 -0700)]
Eliminate a gcc warning.
Add parentheses around arithmetic near a shift operator, to get rid of a
gcc warning. The code was correct before, and this change causes it to
diverge from the LBFS sources it was derived from, but it is worthwhile to
get rid of the gcc warning.
Michael Vrable [Mon, 30 Jun 2008 21:17:08 +0000 (14:17 -0700)]
Extend object reference syntax with size assertions.
Object references can now include a size assertion, such as [=1024]
which indicates that the referenced object is exactly 1024 bytes in
length. If a metadata log or statcache file is produced using this
reference form where appropriate, then it should be possible to rebuild
much of the object index in the local database (by looking for files
which are unchanged and computing hashes of blocks from that file where
it is known that an entire object was used, not just a fragment of an
object).
This commit merely adds support for parsing the new references; they are
not yet generated by any code.
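
A minimal sketch of recognizing the [=1024] form. Only the bracketed suffix comes from the description above; the rest of the reference is treated as opaque text, and the example reference string is made up:

```python
import re

# Matches a trailing size assertion such as "[=1024]".
_SIZE_ASSERTION = re.compile(r"\[=(\d+)\]$")


def split_size_assertion(ref):
    """Return (reference_without_assertion, size), with size None if absent."""
    match = _SIZE_ASSERTION.search(ref)
    if match is None:
        return ref, None
    return ref[:match.start()], int(match.group(1))


with_size = split_size_assertion("example-object[=1024]")
without_size = split_size_assertion("example-object")
```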
Michael Vrable [Mon, 30 Jun 2008 20:28:12 +0000 (13:28 -0700)]
Add some missing #include statements.
Some standard include files (stdlib.h and string.h) were not being included
where necessary. This worked in the past, but seems to break for some
versions of the compiler/standard library, so be sure to include what we
need.
Michael Vrable [Mon, 23 Jun 2008 23:12:01 +0000 (16:12 -0700)]
README corrections.
Michael Vrable [Mon, 23 Jun 2008 21:08:02 +0000 (14:08 -0700)]
Document the "-v/--verbose" option.
Michael Vrable [Mon, 23 Jun 2008 20:49:22 +0000 (13:49 -0700)]
NEWS updates for v0.7 release.
Michael Vrable [Mon, 16 Jun 2008 22:25:22 +0000 (15:25 -0700)]
Add a verbose option to cumulus.
By default, do not output a listing of all files as they are backed up. If
--verbose or -v is specified, then do so.
Michael Vrable [Fri, 13 Jun 2008 16:25:00 +0000 (09:25 -0700)]
README updates: explain restores in more detail.
Michael Vrable [Thu, 12 Jun 2008 20:53:25 +0000 (13:53 -0700)]
Compute checksum of checksums file while it still exists.
When data is being stored remotely, the checksums file (containing hashes
for all the segments needed by a snapshot) may only be stored locally for a
short period of time. We can't wait to compute its checksum (for inclusion
into the root descriptor) until the time when the root descriptor is
written out, since the checksum file may be gone by then.
Compute the checksum earlier (before sending the checksums file to remote
storage), and save the checksum value until it is written out later.
Michael Vrable [Thu, 12 Jun 2008 20:49:50 +0000 (13:49 -0700)]
Fix an example command in the README.
Michael Vrable [Mon, 9 Jun 2008 18:01:25 +0000 (11:01 -0700)]
Updates to documentation and contributed scripts for name change.
Michael Vrable [Mon, 9 Jun 2008 17:25:51 +0000 (10:25 -0700)]
Do not store subfile signatures for very short blocks.
Don't bother to store subfile signatures for very short files, since it is
probably not worth the effort: we're probably best off just storing a new
copy of the data if it changes anyway.
For now, we use a fixed and non-scientific threshold of 16 kB as the
minimum size before we'll save subfile signatures. However, short blocks
that are created to store any new chunks needed for a subfile incremental
are always indexed (since we are likely to want to use the same chunks
again in the next backup).
Michael Vrable [Wed, 4 Jun 2008 00:00:34 +0000 (17:00 -0700)]
.gitignore update.
Michael Vrable [Tue, 3 Jun 2008 21:50:11 +0000 (14:50 -0700)]
Update name of lbs-util program.
Michael Vrable [Tue, 3 Jun 2008 21:44:32 +0000 (14:44 -0700)]
Change name of project to Cumulus.
Start changing some references to the LBS name to Cumulus instead. So far,
changes are in a few of the more user-visible places (name of the
executable, name in the version string). Many internal references are not
changed, and likely will not be changed immediately (since some of the
changes would change format compatibility).
Michael Vrable [Mon, 2 Jun 2008 21:11:53 +0000 (14:11 -0700)]
Hyphens are allowed in key names in RFC822-style data.
Update the parsers for the RFC822-style key-value lists. Allow a hyphen in
key names. Previously, the "Backup-Intent" field was being ignored because
its name was considered invalid; this change should fix that.
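
A sketch of the kind of pattern change involved. The regular expression and parse_stanza are illustrative rather than the project's parser, continuation lines are not handled, and the sample values are made up:

```python
import re

# Key names may contain word characters and hyphens; a pattern such as
# r"^(\w+):" would reject "Backup-Intent" and the field would be skipped.
_FIELD = re.compile(r"^([-\w]+):\s*(.*)$")


def parse_stanza(text):
    """Parse simple 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


stanza = parse_stanza("Format: example\nBackup-Intent: 7")
```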
Michael Vrable [Mon, 2 Jun 2008 20:48:36 +0000 (13:48 -0700)]
Store unspecified scheme names in database as empty string, not null.
Avoid the use of nulls to represent an unspecified backup scheme in the
local database. This should fix the bug where database cleaning would not
touch backups without a scheme name.
Michael Vrable [Sat, 31 May 2008 05:31:13 +0000 (22:31 -0700)]
Delete obsolete sub-block signatures when garbage collecting.
Michael Vrable [Sat, 31 May 2008 00:29:13 +0000 (17:29 -0700)]
Update NEWS file for the 0.7 release.
Note the addition of the sub-file incrementals feature, and include a
script for upgrading the local database to the new format.
Michael Vrable [Sat, 31 May 2008 00:22:08 +0000 (17:22 -0700)]
Fix a bug in signature loading for sub-file incrementals.
Signatures for sub-file chunks were being written properly, but were not
being loaded properly (signature data and the algorithm string were
being swapped), so no sub-file incrementals were being generated.
Michael Vrable [Fri, 30 May 2008 22:13:56 +0000 (15:13 -0700)]
Track quantity of data referenced in old segments more precisely.
When calculating used space in segments, treat a reference to just a subset
of data in an object (for example, with subfile incrementals) as using just
that data, instead of the entire object. This should provide better
information to the segment cleaner.
Michael Vrable [Fri, 30 May 2008 22:12:12 +0000 (15:12 -0700)]
Initial support for efficient sub-file incrementals.
This is a cleaned-up version of the code written for the OSDI'08
submission to implement sub-file incremental deltas. Large files are
broken into small chunks (in a content-sensitive manner with Rabin
fingerprints, as in LBFS). Hash values are computed for the chunks and
stored in a new database table in the local database. On subsequent
backups, when a file has changed, search for chunks that are identical,
so that portions of old blocks may be re-used even if the entire blocks
cannot be.
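
To illustrate content-sensitive chunking, here is a simplified sketch. It uses a plain rolling byte sum over a sliding window in place of Rabin fingerprints, and every parameter value is an arbitrary assumption; it shows the idea only and is not the algorithm used by the project or by LBFS:

```python
def split_chunks(data, window=16, mask=0x3F, min_size=32):
    """Cut `data` where the rolling sum of the last `window` bytes has its
    low bits clear, so boundaries depend on nearby content, not on offsets."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i + 1 - start >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks


sample = bytes(range(256)) * 8
pieces = split_chunks(sample)
```

Because a cut depends only on the bytes in the window (plus the minimum size), an edit early in a file tends to leave later boundaries where they were, which is what lets unchanged chunks be matched by hash.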
Michael Vrable [Fri, 30 May 2008 21:55:29 +0000 (14:55 -0700)]
Remove some debugging output so backup runs are less verbose.
Michael Vrable [Tue, 13 May 2008 16:41:41 +0000 (09:41 -0700)]
Put updated copyright statements in all source files.
These now reflect the fact that all code should be distributable under the
GPLv2.
Michael Vrable [Mon, 12 May 2008 19:57:49 +0000 (12:57 -0700)]
Report compressed size of data written in a backup as well as uncompressed.
When exiting, a summary of the size of all segments (grouped by type: data,
metadata, ...) is printed. Extend this so that both the uncompressed and
compressed sizes of the segments are printed, and to do so now also keep
track of the compressed size of data.
Michael Vrable [Fri, 11 Apr 2008 01:17:48 +0000 (18:17 -0700)]
Update copyright dates in source files.
Michael Vrable [Fri, 11 Apr 2008 01:10:03 +0000 (18:10 -0700)]
Squeeze extra blank lines when dumping metadata logs.
In lbs-util, squeeze out extra blank lines in the output of the
read-metadata command. Extra blank lines may appear in the input,
particularly when delta-encoding metadata logs, but to produce
uniform-looking outputs, delete these extra blank lines.
Michael Vrable [Wed, 9 Apr 2008 21:54:06 +0000 (14:54 -0700)]
NEWS updates.
Michael Vrable [Wed, 9 Apr 2008 21:04:44 +0000 (14:04 -0700)]
Update arguments passed to upload script.
Fix the remote upload script implementation in lbs so that it matches the
conventions expected by the S3 sample script.
Michael Vrable [Wed, 9 Apr 2008 18:26:03 +0000 (11:26 -0700)]
Implement a simple backend script to store data to Amazon S3.
This currently doesn't quite use the interface expected by lbs. The
interfaces will be matched soon.
Michael Vrable [Thu, 3 Apr 2008 20:07:11 +0000 (13:07 -0700)]
Preliminary support for external file upload scripts.
This adds initial support for calling out to an external script to transfer
files to a backup server. Storage requirements on the client using this
are minimal: space for the local database and for spooling several files
for upload. Local temporary files are deleted as they are uploaded, and
the backup rate is throttled to the upload rate.
Michael Vrable [Wed, 2 Apr 2008 03:58:27 +0000 (20:58 -0700)]
Initial framework for direct transfer of backups to remote storage.
Add a layer of indirection in the writing of files to the backup store, and
create a background thread to handle the processing of files to be stored.
Right now this secondary thread does not do much, but will easily be able
to launch a helper script for transferring data to a remote server.
Files are processed by the background thread one at a time. Multiple files
can be queued up for processing, but the size of the queue is limited so
that the production of backup data will be throttled to the speed at which
the data can be transferred (to bound the temporary space needed for
storing files).
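
The throttling behaviour described above can be sketched with a bounded queue. The names and the queue limit of 2 are assumptions made for this illustration, which is in Python rather than the project's own implementation:

```python
import queue
import threading


def run_pipeline(files, transfer, max_pending=2):
    """Hand files to a background thread one at a time; the producer blocks
    whenever `max_pending` files are already waiting."""
    pending = queue.Queue(maxsize=max_pending)
    finished = []

    def worker():
        while True:
            item = pending.get()
            if item is None:        # sentinel: no more files
                break
            transfer(item)
            finished.append(item)

    thread = threading.Thread(target=worker)
    thread.start()
    for name in files:
        pending.put(name)           # blocks while the queue is full
    pending.put(None)
    thread.join()
    return finished


sent = run_pipeline(["seg-1", "seg-2", "seg-3"], lambda name: None)
```

The blocking put() is what ties the rate of backup production to the transfer rate and bounds the temporary space in use.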
Michael Vrable [Sat, 1 Mar 2008 00:08:35 +0000 (16:08 -0800)]
Make restoring from snapshots more efficient.
When restoring a snapshot, restore files in order roughly determined by how
they are stored in segments, instead of in pure lexicographic order. This
should ensure that, for the most part, each segment only has to be unpacked
once, instead of perhaps many times as could happen previously, and so
should make restoring more efficient.
This implementation loads all metadata into memory to determine the
ordering, and so restores are now much more memory-intensive than before.
It would be good to work on memory requirements later--either offer an
option to use the old behavior, or perhaps load some of the data into a
temporary database.
Michael Vrable [Thu, 28 Feb 2008 00:20:31 +0000 (16:20 -0800)]
Allow restores of just selected files/directories.
Previously, only a complete snapshot could be restored. This change to
lbs-util will allow just selected data to be restored.
Michael Vrable [Tue, 19 Feb 2008 22:14:57 +0000 (14:14 -0800)]
Documentation updates.
Michael Vrable [Tue, 19 Feb 2008 18:51:37 +0000 (10:51 -0800)]
Add GPLv2 license conditions.
Since some code is derived from GPL-covered software, allow the entire
program to be distributed under the terms of the GPL, version 2.
Michael Vrable [Thu, 14 Feb 2008 22:48:04 +0000 (14:48 -0800)]
Minor documentation updates.
Michael Vrable [Wed, 13 Feb 2008 22:27:52 +0000 (14:27 -0800)]
Do not attempt to clean the same segment multiple times.
Michael Vrable [Wed, 13 Feb 2008 01:05:33 +0000 (17:05 -0800)]
Slight tweaks to the local database to improve cleaning procedures.
In addition to marking objects in cleaned segments, mark the segment itself
as cleaned.
Michael Vrable [Thu, 17 Jan 2008 04:19:14 +0000 (20:19 -0800)]
Include snapshot intent value in the backup descriptor.
It wasn't included earlier, but could be useful to have when actually going
back to clean out old snapshots at a later point in time.
Michael Vrable [Tue, 15 Jan 2008 21:57:55 +0000 (13:57 -0800)]
Documentation updates.
Michael Vrable [Tue, 15 Jan 2008 18:48:30 +0000 (10:48 -0800)]
Fix to segment age calculation in local database.
It seems that in SQLite, max(x, NULL) yields NULL, not x. This was being
used to set the mtime of a segment to the maximum mtime of any object in
it, starting with an mtime of NULL. Fix the computation so it does the
right thing.
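
The behaviour can be reproduced with Python's sqlite3 module. The table layout below is hypothetical, and the coalesce() form is one possible fix, not necessarily the one made in this commit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real local database schema is not shown here.
conn.execute("CREATE TABLE segments (name TEXT, mtime REAL)")
conn.execute("INSERT INTO segments VALUES ('seg1', NULL)")


def read_mtime():
    row = conn.execute("SELECT mtime FROM segments WHERE name = 'seg1'")
    return row.fetchone()[0]


# Naive update: max(NULL, 100.0) is NULL, so mtime never leaves NULL.
conn.execute("UPDATE segments SET mtime = max(mtime, ?) WHERE name = 'seg1'",
             (100.0,))
naive_mtime = read_mtime()

# Possible fix: substitute the new value when the stored one is NULL.
update = ("UPDATE segments SET mtime = max(coalesce(mtime, ?), ?) "
          "WHERE name = 'seg1'")
conn.execute(update, (100.0, 100.0))
fixed_mtime = read_mtime()

# An older object must not lower the stored maximum.
conn.execute(update, (50.0, 50.0))
later_mtime = read_mtime()
```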
Michael Vrable [Wed, 9 Jan 2008 22:26:07 +0000 (14:26 -0800)]
Extend tracking of used segments to cover metadata segments.
In the segments_used table in the local database, include segments that
contain metadata in addition to data segments. Additionally, slightly
extend the segment tracking code so that the modification time of segments
is written out.
The exact utilization of the metadata segments is not yet computed; for now
the utilization is listed as 1.0 even if it is actually less.
Michael Vrable [Wed, 9 Jan 2008 21:20:50 +0000 (13:20 -0800)]
Minor fix to segment cleaning.
Previously, all objects were marked to be rewritten, instead of merely
those in segments marked for cleaning. Properly handle this.
Michael Vrable [Tue, 8 Jan 2008 19:50:49 +0000 (11:50 -0800)]
Add a flag to force a full rewrite of the metadata log in a snapshot.
When --full-metadata is given, no pointers to old metadata will be written
out. This could be used periodically in backups (say, weekly) to prevent
long dependencies in the metadata logs, at least until better cleaning is
implemented.
Michael Vrable [Tue, 25 Dec 2007 04:00:42 +0000 (20:00 -0800)]
Add intent-based cleaning to lbs-util.
Allow the level of segment cleaning performed to be adjusted by specifying
the next type of backup to be performed. If the next backup is to be
longer-lived, then clean more aggressively.
Michael Vrable [Fri, 14 Dec 2007 19:51:41 +0000 (11:51 -0800)]
Fix a bug in computing the size of a segment that led to utilization > 1.0.
The old code for computing the size of a segment (to be stored in the
segments table) could leave off the last object to be written to the
segment. This could cause the computed segment utilization to be greater
than 1.0, which should be impossible. Fix the size calculation so that it
should always include all data written to the segment. As a bonus, this
also correctly computes the size of metadata-log segments, even though the
metadata objects don't appear in the block_index table (which was
previously used for computing the segment size, but is no longer).
Michael Vrable [Wed, 12 Dec 2007 19:36:50 +0000 (11:36 -0800)]
Fix a bug that caused blocks not to be properly re-used on checksum match.
Michael Vrable [Wed, 12 Dec 2007 18:42:37 +0000 (10:42 -0800)]
Fix uninitialized variable warning in sample restore program.
Michael Vrable [Wed, 12 Dec 2007 18:34:46 +0000 (10:34 -0800)]
Include sizes in references to blocks in each file's data list.
This optimization is aimed at large files that are composed of many
blocks--including the size of each block allows a restore program to
determine the offset at which each block begins in the output file (by
adding up the sizes of the previous blocks). This may allow for more
efficient restores, in which file data is filled in as blocks are
encountered, instead of having to find the blocks in the order they appear
in the data list.
A future change might be to only include the sizes when necessary--files
which are composed of a single object do not need a size, nor does the last
block of a large file. But for now, simply include the size on all
objects.
This is part of a recommended format change, but one that is both forward-
and backward-compatible.
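
The offset computation mentioned above is a running sum over the block sizes. A small sketch; block_offsets is a hypothetical helper, not part of any restore tool:

```python
def block_offsets(sizes):
    """Return the output-file offset at which each block begins."""
    offsets = []
    position = 0
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets


starts = block_offsets([4096, 4096, 100])
```

With the offsets known up front, a restore program could write each block at its position as soon as the block is encountered.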
Michael Vrable [Wed, 12 Dec 2007 18:16:39 +0000 (10:16 -0800)]
Snapshot format change: extend the slice syntax with a length-only form.
Change slice format so that in addition to <start>+<length>, it is possible
to specify just <length>. This isn't needed (0+<length> could be used
instead, but looks more pleasing if lengths are specified more frequently
on objects. Also update the various tools to correctly parse the new
syntax.
This is part of the new v0.6 format.
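
A sketch of accepting both slice forms. parse_slice is a hypothetical helper that takes only the text of the slice itself and does not validate anything beyond what int() rejects:

```python
def parse_slice(text):
    """Parse '<start>+<length>' or the length-only form '<length>'.

    Returns (start, length); the length-only form implies a start of 0.
    """
    if "+" in text:
        start, length = text.split("+", 1)
        return int(start), int(length)
    return 0, int(text)


explicit = parse_slice("512+1024")
length_only = parse_slice("1024")
```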
Michael Vrable [Wed, 12 Dec 2007 05:49:23 +0000 (21:49 -0800)]
When verifying a snapshot, check that the segment list is accurate.
This should help find bugs such as the one fixed in commit 1b39ce3ff11a.
Michael Vrable [Wed, 12 Dec 2007 01:49:44 +0000 (17:49 -0800)]
Add "intent" field to a snapshot.
This field is intended to indicate how long the backup might be kept or
what backup schedule the given snapshot is part of--for example 1 for a
daily backup, 7 for a weekly backup.
This might be used when performing segment cleaning or deleting old
snapshots, but for now the information is just stored in the local
database.
Michael Vrable [Fri, 7 Dec 2007 21:33:25 +0000 (13:33 -0800)]
Ensure that segments with reused metadata are listed in root descriptor.
Michael Vrable [Fri, 7 Dec 2007 21:01:30 +0000 (13:01 -0800)]
Add format support for efficient sparse file handling.
While making other format changes, also add in support for explicitly
representing regions of a file that are entirely zero, as can happen with
sparse files. These are represented with an object reference of the form
"zero[0+<length>]". Update the lbs tool to generate and parse these
references, and the utility code to also handle it.
The restore tools do not seek over zero regions when writing out a file, so
the file is not restored as a sparse file, but that support can easily be
added later with no change needed to the format.
Michael Vrable [Fri, 7 Dec 2007 03:16:57 +0000 (19:16 -0800)]
Upgrades to utility code for new formats, and a few more database tweaks.
Update the sample restore script and the Python code to support the new
snapshot format and the new local database schema. While updating segment
cleaning, also slightly rearrange the database schema to better support it.
Michael Vrable [Thu, 6 Dec 2007 05:36:47 +0000 (21:36 -0800)]
Update the NEWS file with some information about format changes.
Michael Vrable [Thu, 6 Dec 2007 05:27:37 +0000 (21:27 -0800)]
Drop the obsolete snapshot_contents table from the local database.
Michael Vrable [Thu, 6 Dec 2007 05:25:39 +0000 (21:25 -0800)]
Provide a script for converting the local database to the v0.6 format.
Michael Vrable [Thu, 6 Dec 2007 04:14:08 +0000 (20:14 -0800)]
Modifications to the local database: create a summary segments_used table.
Make the local database more compact by only storing, for each snapshot, a
listing of the segments it uses and the fraction of each which is used,
instead of listing all objects referenced individually.
This commit only adds the new table; it doesn't yet delete the old table
(snapshot_contents).
Michael Vrable [Mon, 3 Dec 2007 19:10:48 +0000 (11:10 -0800)]
Flag "volatile" files when creating a snapshot.
If a file has changed very near to the time it was backed up (right now 30
seconds, though this could probably be decreased to only a few seconds),
mark the file as "volatile" and do not use the stat information to skip
that file on the next backup. This is to avoid a race condition where a
file's stat information is saved, the file is dumped, and then the file is
modified again. If this happens within the same second as the earlier
modifications, then mtime and ctime will not be updated (since they already
refer to the current second), and on a subsequent backup the file would not
be stored since it appears to be unchanged. However, if the file's mtime
and ctime are in the past, then this can't happen, so use this as a test
for when it is safe to skip apparently unchanged files.
The volatile flag only needs to go in the statcache, not the main metadata
log, but for the moment it is going in both.
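
The test described above can be sketched as a comparison against the backup time. The function name and argument layout are assumptions; only the 30-second window comes from the text:

```python
VOLATILE_WINDOW = 30  # seconds, the threshold mentioned above


def is_volatile(mtime, ctime, backup_time, window=VOLATILE_WINDOW):
    """Return True if the file changed too recently for its stat
    information to be trusted on the next backup."""
    return backup_time - max(mtime, ctime) < window


recent = is_volatile(mtime=1000, ctime=1000, backup_time=1010)
settled = is_volatile(mtime=1000, ctime=1000, backup_time=1100)
```

A file flagged this way would be re-read on the next backup even when its stat information appears unchanged.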
Michael Vrable [Thu, 29 Nov 2007 21:04:11 +0000 (13:04 -0800)]
Assorted minor code cleanups.
Michael Vrable [Wed, 28 Nov 2007 23:12:59 +0000 (15:12 -0800)]
Ensure the "name:" key shows up first in metadata output.
This isn't necessary, but is nice for readability.
Michael Vrable [Wed, 28 Nov 2007 22:22:39 +0000 (14:22 -0800)]
Partially revert metadata format changes in 6c94114148c4.
On second thought, the renaming of the "name:" field to "path:" isn't worth
the trouble of making the change since there isn't much benefit and
updating tools to deal with either format will be more complex. The other
changes can be left since they are smaller and easier to support.
Revert this now, before any releases are made with the change in effect.
Michael Vrable [Wed, 28 Nov 2007 22:18:29 +0000 (14:18 -0800)]
Drop the old statcache implementation.
The statcache is now replaced with the unified local metadata log, which is
used to aid in reusing unchanged parts of the metadata log in snapshots,
but additionally contains all the information needed to determine if a file
is unchanged.
Michael Vrable [Wed, 21 Nov 2007 23:02:54 +0000 (15:02 -0800)]
Initial implementation of metadata log sharing.
Allow metadata written to segments to be reused between snapshots. Keep
track of what metadata was written out on the client, and when identical
metadata would be written on a subsequent backup, instead emit a reference
to the old metadata.
This needs more testing and verification. There also needs to be a
mechanism for performing the equivalent of segment cleaning for metadata,
so that the metadata log does not become excessively fragmented over time.
Michael Vrable [Tue, 20 Nov 2007 03:01:52 +0000 (19:01 -0800)]
Bugfix for splitting the metadata log in the new metadata code.
Michael Vrable [Tue, 20 Nov 2007 00:27:51 +0000 (16:27 -0800)]
Write out new-style statcache data.
Write out statcache-style data from the metadata logging module of lbs.
This will eventually replace the old statcache implementation, but is not
complete. This new statcache data is not yet read in or used elsewhere.
In the new format, the data in the statcache file has the same format as
the data in the metadata log itself. Each stanza with file information is
prefixed with a @@reference line that gives a reference to the location of
the metadata. If the metadata has not changed, this will allow metadata
log data to be re-used between snapshots.
Michael Vrable [Mon, 19 Nov 2007 17:57:42 +0000 (09:57 -0800)]
Drop the use of indirect blocks for storing pointers to data.
Now store the entire list of blocks that contain each file's contents
inline in the metadata log, even when that list is large. Previously, the
list was split out into a separate object when it contained more than 8
entries. These indirect blocks may still be useful, but they also
complicate the metadata/statcache rewrite, so for the moment disable them.
They may be reintroduced later.