cumulus.git
Michael Vrable [Wed, 4 Jul 2007 17:04:13 +0000 (10:04 -0700)]
Rewrite object reference parser.

This new version should handle all options, including checksums and object
ranges, though it hasn't been fully tested.

Michael Vrable [Thu, 28 Jun 2007 20:30:23 +0000 (13:30 -0700)]
Only trust the results of a stat if two separate backups agree.

This should eliminate races where stat information might not change if
there are two changes to a file within the same second--as long as two
backups are not taken less than a second apart.

Michael Vrable [Thu, 28 Jun 2007 00:10:11 +0000 (17:10 -0700)]
Sort-of-working statcache implementation.

This will use stat information to determine when a file doesn't need to
be read again.  There are still some bits left to be implemented; in
particular, parsing of references needs to be fixed, since at the moment
checksums and ranges aren't supported, and any files using them will
have block lists corrupted by the stat cache.  (Fortunately, nothing
uses those reference forms yet for file contents.)
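
A minimal sketch of the idea, not the actual statcache code: a file's cached block list is reused only when its current stat data matches what was recorded earlier. The set of compared fields (size, mtime, ctime, inode) and the names `StatKey`, `stat_key`, and `cached_blocks` are assumptions made for illustration.

```python
import os
from typing import NamedTuple, Optional


class StatKey(NamedTuple):
    """Stat fields compared to decide whether a file is unchanged (assumed set)."""
    size: int
    mtime: int
    ctime: int
    inode: int


def stat_key(path: str) -> StatKey:
    """Collect the comparison fields for a path without following symlinks."""
    st = os.lstat(path)
    return StatKey(st.st_size, int(st.st_mtime), int(st.st_ctime), st.st_ino)


def cached_blocks(cache: dict, path: str) -> Optional[list]:
    """Return the cached block list if the stat data still matches, else None."""
    entry = cache.get(path)
    if entry is None:
        return None
    key, blocks = entry
    return blocks if key == stat_key(path) else None
```

A caller would read the file and rebuild its block list whenever `cached_blocks` returns None.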

Michael Vrable [Wed, 27 Jun 2007 20:20:39 +0000 (13:20 -0700)]
Write the backup descriptor as the very last step in a backup.

For consistency, we should make sure that all segments and other data
needed to reconstruct a backup are written before we write out the
backup descriptor file itself.

Michael Vrable [Thu, 21 Jun 2007 18:02:09 +0000 (11:02 -0700)]
Try an alternate segment cleaning ordering.

This one gives more weight to expiring nearly-empty segments.

Michael Vrable [Tue, 19 Jun 2007 17:46:20 +0000 (10:46 -0700)]
Add a simple script for garbage collecting old segments.

Michael Vrable [Tue, 19 Jun 2007 02:10:57 +0000 (19:10 -0700)]
Preserve the "timestamp" database field when expiring segments.

When segments are repacked, we would like to keep the original timestamp
for the old objects which are written into new segments.  These changes
should now propagate that timestamp value.

Michael Vrable [Tue, 19 Jun 2007 00:45:55 +0000 (17:45 -0700)]
Minor formatting fix when outputting the statcache file.

Michael Vrable [Mon, 18 Jun 2007 20:15:05 +0000 (13:15 -0700)]
Partial commit of statcache support.

This will cache the results of stat calls from previous backups, so that
future backups do not need to read the files in their entirety if no data
has changed.

Michael Vrable [Sat, 16 Jun 2007 02:31:51 +0000 (19:31 -0700)]
Rename descriptor files.

Prefix them with "snapshot-", so they will sort after all segment files.
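
A small illustration of why the prefix works, assuming segment files are named by UUID (other entries mention a segment UUID and names of about 36 characters). UUID strings contain only hex digits and hyphens, all of which sort before the letter "s". The file names and the `.tar` suffix below are hypothetical.

```python
def listing_order(filenames):
    """Return names in plain lexicographic order, as a sorted listing would."""
    return sorted(filenames)


files = [
    "snapshot-20070616T023151",                  # hypothetical descriptor name
    "f47ac10b-58cc-4372-a567-0e02b2c3d479.tar",  # hypothetical segment names
    "0a3e1c52-9d7b-4f6e-8b1a-2c4d5e6f7a8b.tar",
]

# The descriptor always lands after every segment file.
assert listing_order(files)[-1].startswith("snapshot-")
```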

Michael Vrable [Sat, 16 Jun 2007 02:29:54 +0000 (19:29 -0700)]
Wait to write the backup description until a backup is finished.

Michael Vrable [Fri, 15 Jun 2007 04:29:41 +0000 (21:29 -0700)]
Update database schema with views for choosing segments to clean.

Create a couple of database views, one for gathering summary statistics
about segments in use, and one for ranking segments to be cleaned and
rewritten, based on utilization and the age of the data contained in them.
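
A sketch of what two such views could look like, using Python's sqlite3 module. The table layout, the view names, and the scoring formula are all hypothetical; the log does not give the real schema or ranking function. The only detail taken from the log is that a NULL `expired` value marks an object still in use.

```python
import sqlite3

SCHEMA = """
create table block_index (
    segment   text,     -- segment holding the object
    size      integer,  -- object size in bytes
    timestamp integer,  -- Unix time the data was first stored
    expired   integer   -- NULL while the object is still in use
);

-- Summary statistics for each segment.
create view segment_info as
select segment,
       sum(size) as total_bytes,
       sum(case when expired is null then size else 0 end) as live_bytes,
       max(timestamp) as newest
from block_index
group by segment;

-- Ranking: poorly utilized segments holding old data score highest.
create view cleaning_order as
select segment,
       1.0 * live_bytes / total_bytes as utilization,
       (1.0 - 1.0 * live_bytes / total_bytes)
           * (cast(strftime('%s', 'now') as integer) - newest) as score
from segment_info;
"""


def cleaning_candidates(db):
    """Return segment names, best cleaning candidate first."""
    rows = db.execute("select segment from cleaning_order order by score desc")
    return [row[0] for row in rows]
```

Keeping the ranking in a view means the cleaning policy can be changed in SQL without touching the program that reads it.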

Michael Vrable [Wed, 13 Jun 2007 16:32:49 +0000 (09:32 -0700)]
Fix --localdb= option.

Michael Vrable [Wed, 13 Jun 2007 02:58:02 +0000 (19:58 -0700)]
Factor code to prepare SQLite statements into a separate function.

This cuts down on a little code duplication and also makes it easy to
include SQL statements in inline strings when they are prepared.

Also, switch to the older sqlite3_prepare interface (not _prepare_v2) since
the new interface is still a bit too new to be commonly available.

Michael Vrable [Tue, 12 Jun 2007 17:49:13 +0000 (10:49 -0700)]
Fix typo.

Michael Vrable [Mon, 11 Jun 2007 21:45:12 +0000 (14:45 -0700)]
Make segment compression/encryption filter command-line-selectable.

Michael Vrable [Mon, 4 Jun 2007 17:40:33 +0000 (10:40 -0700)]
Design note: grouping also allows better compression.

Michael Vrable [Mon, 4 Jun 2007 17:29:17 +0000 (10:29 -0700)]
Create an SQL script for cleaning out the local object database.

Michael Vrable [Mon, 4 Jun 2007 17:28:06 +0000 (10:28 -0700)]
Make parser in restore.pl more tolerant, and reorder descriptor fields.

Michael Vrable [Sun, 3 Jun 2007 00:14:13 +0000 (17:14 -0700)]
Add version information to the backup descriptor files.

This adds a "Format:" line at the start of a backup descriptor, with the
intent that different formats can be recognized if there is a need to make
changes in the future.  This can also be used as a magic number to identify
LBS snapshot files.

Also, update the restore.pl script to handle the new format--make the
parser more flexible so that fields can appear in any order, and
unrecognized lines are ignored.
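
A rough sketch of such a tolerant parser, written in Python rather than the Perl of restore.pl. It assumes simple "Key: value" lines; the field names other than "Format" and the function name `parse_descriptor` are hypothetical, and multi-line fields are not handled.

```python
KNOWN_FIELDS = {"Format", "Root", "Date"}  # hypothetical set of recognized keys


def parse_descriptor(text):
    """Parse 'Key: value' lines in any order, skipping unrecognized lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in KNOWN_FIELDS:
            continue  # tolerate unknown or malformed lines
        fields[key] = value.strip()
    if "Format" not in fields:
        # The Format line doubles as a magic number for snapshot files.
        raise ValueError("not a snapshot descriptor: missing Format line")
    return fields
```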

Michael Vrable [Thu, 31 May 2007 05:47:10 +0000 (22:47 -0700)]
Do not bother to split indirect block lists into a separate segment.

For now, just group them with the rest of the metadata.  It's not worth
splitting them up now; this could be reverted later.

Michael Vrable [Thu, 31 May 2007 05:44:11 +0000 (22:44 -0700)]
Differentiate between never-before-seen objects and seen-but-expired.

We group seen-but-expired objects into different segments, since the fact
that the content has been seen before is an indicator that the data is
long-lived, and grouping by (expected future) age should help increase
segment utilization.

Michael Vrable [Wed, 30 May 2007 15:51:50 +0000 (08:51 -0700)]
Check for errors when files are opened for dumping.

Before, we could fail to open a file when O_NOATIME was specified but the
file was owned by another user, and we didn't notice this.  Now we do, and
we also retry the open without O_NOATIME.
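
The retry can be sketched as follows (in Python, not the project's own code). On Linux, open() with O_NOATIME fails with EPERM when the caller neither owns the file nor is privileged, which Python reports as PermissionError. The function name `open_for_dump` is made up for this example.

```python
import os

# O_NOATIME is Linux-specific; fall back to 0 (no extra flag) elsewhere.
O_NOATIME = getattr(os, "O_NOATIME", 0)


def open_for_dump(path):
    """Open a file read-only, preferring O_NOATIME but retrying without it."""
    try:
        return os.open(path, os.O_RDONLY | O_NOATIME)
    except PermissionError:
        # Either O_NOATIME was refused or the file is unreadable; the
        # second attempt distinguishes the two by raising again if it
        # still cannot be opened.
        return os.open(path, os.O_RDONLY)
```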

Michael Vrable [Thu, 24 May 2007 19:53:26 +0000 (12:53 -0700)]
Output filename to metadata log after fully processing file.

Previously, if there was an error processing the file, the metadata
dictionary was not output, but the filename was, producing an incorrect
metadata file.  This is now fixed.

Michael Vrable [Wed, 23 May 2007 18:12:29 +0000 (11:12 -0700)]
Slightly expand the set of characters which are escaped in filenames.

Michael Vrable [Wed, 23 May 2007 18:12:12 +0000 (11:12 -0700)]
Use NULL in the local database to indicate that blocks are not expired.

Michael Vrable [Tue, 22 May 2007 04:28:46 +0000 (21:28 -0700)]
Extend local database once more.

This should hopefully add most of the features needed for the moment.
Improvements made:
  - Normalize segment names by putting them in a separate table and
    referring to them by ID everywhere else.  This should help quite a bit,
    since segment names are ~36 characters long.  Add conversion functions
    to the C++ code, which aren't optimized yet (we should cache results in
    the C++ code and not re-query the database each time).
  - Similarly, normalize snapshot names.
  - Add an expired field to the object index, so that we can stop using
    objects in future backups but still remember how old the data is.
Some more work is still needed in the C++ code, but the hope is that the
database schema itself will be more stable now.
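
The name normalization, together with the result caching this message says is still missing, might look roughly like this. It is a Python sketch with a hypothetical table layout and class name; the real conversion functions live in the C++ code.

```python
import sqlite3


class SegmentIds:
    """Map long segment names to small integer IDs, caching lookups."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # name -> id, avoids re-querying the database
        db.execute(
            "create table if not exists segments ("
            "segmentid integer primary key, segment text unique not null)"
        )

    def to_id(self, name):
        if name not in self.cache:
            # Insert the name if it is new, then read back its ID.
            self.db.execute(
                "insert or ignore into segments (segment) values (?)", (name,)
            )
            row = self.db.execute(
                "select segmentid from segments where segment = ?", (name,)
            ).fetchone()
            self.cache[name] = row[0]
        return self.cache[name]
```

Other tables can then store the integer ID instead of repeating the full segment name in every row.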

Michael Vrable [Sat, 19 May 2007 05:07:04 +0000 (22:07 -0700)]
Add (probably long-overdue) .gitignore file.

Michael Vrable [Fri, 18 May 2007 22:47:36 +0000 (15:47 -0700)]
Add rudimentary command-line parsing and support for file exclusions.

Michael Vrable [Thu, 17 May 2007 05:41:43 +0000 (22:41 -0700)]
Bugfix in size estimates for filtered tarfile outputs.

Michael Vrable [Thu, 17 May 2007 05:27:31 +0000 (22:27 -0700)]
Bugfix for restore.pl.

Indirect references to data blocks were not being properly handled, so
large files (those using an indirect reference to the data) were not
restored properly.

This was at least caught by the checksums, so no data was ever silently
corrupted.

Michael Vrable [Wed, 16 May 2007 21:50:01 +0000 (14:50 -0700)]
Add timestamps to blocks when they are inserted into the local database.

Michael Vrable [Tue, 15 May 2007 22:57:24 +0000 (15:57 -0700)]
Handle new-style user/group entries in the restore script.

The new style is
    uid (username)
and was previously causing a warning since the entire string is not
numeric, just the first part.  Now explicitly split it apart.  (For now, we
ignore the username string, but it could possibly be used in the future.)
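
Splitting the field can be sketched like this, in Python rather than the Perl of the restore script. The function name `parse_owner` is hypothetical, and the sketch also accepts the older bare-number form.

```python
import re


def parse_owner(field):
    """Split 'uid (username)' into (uid, username); username may be None."""
    match = re.fullmatch(r"(\d+)(?:\s*\((.*)\))?", field.strip())
    if match is None:
        raise ValueError("unrecognized user/group entry: %r" % field)
    return int(match.group(1)), match.group(2)
```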

Michael Vrable [Tue, 15 May 2007 22:48:37 +0000 (15:48 -0700)]
Initial support for filtering TAR files through an external program.

At the moment this is hard-coded to pipe data through "bzip2 -c", but
this should be made flexible later.
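
A sketch of the filtering step with the command made configurable. It uses Python's subprocess module and buffers the whole output in memory, which a real segment writer would presumably avoid by streaming to the output file. The function name `filter_bytes` is an assumption; the default command is the one named in this message.

```python
import subprocess


def filter_bytes(data, command=("bzip2", "-c")):
    """Pipe data through an external filter program and return its output."""
    result = subprocess.run(
        list(command),
        input=data,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the filter exits non-zero
    )
    return result.stdout
```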

Michael Vrable [Tue, 15 May 2007 19:58:01 +0000 (12:58 -0700)]
Track which objects are used in which snapshots in the local database.

Michael Vrable [Tue, 15 May 2007 05:30:37 +0000 (22:30 -0700)]
Debugging message cleanup.

Michael Vrable [Tue, 15 May 2007 05:30:05 +0000 (22:30 -0700)]
Initial cut at re-using objects from old segments when contents match.

Michael Vrable [Tue, 15 May 2007 04:57:18 +0000 (21:57 -0700)]
Keep an index of old stored blocks, using sqlite3.

Link sqlite3 in with the snapshot program, and start to write a wrapper
around a "local database" which tracks previously-backed-up data to make
incremental backups possible.

At the moment, blocks are indexed as they are stored, but we never read
from the index, so blocks are not yet reused.
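
A minimal sketch of such an index using Python's sqlite3 module. The table layout, the function names, and the use of SHA-1 as the lookup key are assumptions (SHA-1 is the checksum other entries in this log mention). `find_block` shows the lookup side, which this commit does not yet perform.

```python
import hashlib
import sqlite3
import time


def open_index(path=":memory:"):
    """Open (or create) the local block index."""
    db = sqlite3.connect(path)
    db.execute(
        "create table if not exists block_index ("
        "checksum text, segment text, object text, "
        "size integer, timestamp real)"
    )
    return db


def store_block(db, data, segment, obj):
    """Record where a block was written, keyed by its checksum."""
    checksum = hashlib.sha1(data).hexdigest()
    db.execute(
        "insert into block_index values (?, ?, ?, ?, ?)",
        (checksum, segment, obj, len(data), time.time()),
    )
    return checksum


def find_block(db, data):
    """Return (segment, object) for previously stored identical data, or None."""
    return db.execute(
        "select segment, object from block_index "
        "where checksum = ? and size = ?",
        (hashlib.sha1(data).hexdigest(), len(data)),
    ).fetchone()
```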

Michael Vrable [Mon, 14 May 2007 20:32:55 +0000 (13:32 -0700)]
Start writing up some design notes for LBS.

Try to explain the rationale for the chosen design, and explain some of the
other designs that were considered and rejected.

Michael Vrable [Mon, 14 May 2007 20:30:54 +0000 (13:30 -0700)]
Ensure filesize written to metadata log matches number of bytes dumped.

Michael Vrable [Mon, 14 May 2007 20:15:40 +0000 (13:15 -0700)]
Various minor tweaks to the metadata format.

Michael Vrable [Sat, 12 May 2007 19:43:06 +0000 (12:43 -0700)]
Use a timestamp in generating the descriptor filename.

Also include the timestamp (in a more readable format) in the descriptor
file itself.

Michael Vrable [Sat, 12 May 2007 18:43:19 +0000 (11:43 -0700)]
Disable debugging output for the reference restore program.

Most of the debugging can be turned on by setting $VERBOSE = 1;

Michael Vrable [Sat, 12 May 2007 18:17:33 +0000 (11:17 -0700)]
Actually recreate files in the snapshot.

With this change, the prototype restore tool is essentially functional.
Special files are still not properly handled, and there are a few other
limitations, but for the most part it all works.

Michael Vrable [Sat, 12 May 2007 17:21:23 +0000 (10:21 -0700)]
Add in decoding (URI-style, %xx) of filenames to reference decoder.
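
Decoding %xx sequences can be sketched as below, in Python rather than the decoder's own language. The function name `unescape_filename` is hypothetical. The result is returned as bytes because filenames are not guaranteed to be valid text in any particular encoding.

```python
import re


def unescape_filename(escaped):
    """Replace each %xx sequence with the byte it encodes."""
    return re.sub(
        rb"%([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        escaped.encode("ascii"),
    )
```

Each match is replaced in a single left-to-right pass, so an escaped percent sign (%25) is never decoded a second time.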

Michael Vrable [Sat, 12 May 2007 06:18:43 +0000 (23:18 -0700)]
Continue work on reference decoder.

  - Add code to parse metadata sections (including following indirect
    pointers).
  - Add code to parse data block listings (including indirect pointers).
  - Verify checksums on all files.

Primarily left: actually write restored file contents out to the
filesystem.

Michael Vrable [Sat, 12 May 2007 04:55:36 +0000 (21:55 -0700)]
Begin work on a reference decoder for backups.

The intent is that the reference decoder will eventually be a tool for
recovery, if needed, before a better tool is written.  It should also help to
verify the format specification and backup tool.

The reference decoder can currently parse a single object reference and
extract the data for it.

Michael Vrable [Sat, 12 May 2007 03:54:39 +0000 (20:54 -0700)]
Write backup descriptor to a file, not stdout.

The backup descriptor names the object at the root of the snapshot, and
lists all the segments needed.  The filename used is currently fixed,
though, and should later be based on the current time.

Michael Vrable [Fri, 11 May 2007 21:49:50 +0000 (14:49 -0700)]
Clean up error reporting.

Michael Vrable [Fri, 11 May 2007 21:46:53 +0000 (14:46 -0700)]
Use larger metadata blocks, and don't output type field twice.

Michael Vrable [Fri, 11 May 2007 21:39:23 +0000 (14:39 -0700)]
Debugging output cleanup.

Michael Vrable [Fri, 11 May 2007 19:21:50 +0000 (12:21 -0700)]
Allow metadata to be written incrementally.

Michael Vrable [Fri, 11 May 2007 17:37:29 +0000 (10:37 -0700)]
Remove checksums and reference tracking; I think they are not needed.

Michael Vrable [Fri, 11 May 2007 17:32:00 +0000 (10:32 -0700)]
A few minor adjustments to the ObjectReference interface.

Michael Vrable [Fri, 11 May 2007 05:10:26 +0000 (22:10 -0700)]
Add a new object-oriented wrapper for building object references.

Michael Vrable [Fri, 11 May 2007 00:43:00 +0000 (17:43 -0700)]
Miscellaneous changes.

  - Disable profiling output for now.  It can be turned on again later.
  - Allow destination directory for backup files to be specified on the
    command-line (but we should add proper command-line parsing later).
  - More work to track inter-object references, but not yet finished.

Michael Vrable [Thu, 3 May 2007 22:56:04 +0000 (15:56 -0700)]
Start work on a more object-oriented interface for creating objects.

Michael Vrable [Thu, 3 May 2007 22:13:21 +0000 (15:13 -0700)]
Rename tarstore -> store, since it is the only implementation now.

Michael Vrable [Thu, 3 May 2007 22:03:19 +0000 (15:03 -0700)]
Remove old store implementation.

Michael Vrable [Thu, 3 May 2007 21:23:21 +0000 (14:23 -0700)]
Partial support for tracking inter-object references.

Michael Vrable [Wed, 25 Apr 2007 05:46:42 +0000 (22:46 -0700)]
Add indirect objects for listing contents of large files.

Michael Vrable [Mon, 23 Apr 2007 04:41:25 +0000 (21:41 -0700)]
Remove code using old segment writer.

All writes should now happen to TAR-format segments, and data should be
serialized as text.

Michael Vrable [Mon, 23 Apr 2007 04:09:11 +0000 (21:09 -0700)]
Automatically split segments to meet size targets.

Michael Vrable [Sat, 21 Apr 2007 18:05:28 +0000 (11:05 -0700)]
Add URI-style escaping of filename characters.
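
A sketch of URI-style escaping in Python. The log does not say which characters are escaped, so the set used here (control characters, space, "%", and bytes outside printable ASCII) is an assumption, as is the function name `escape_filename`.

```python
def escape_filename(name):
    """Escape unsafe bytes of a filename as %xx (lowercase hex)."""
    out = []
    for byte in name:
        # Assumed unsafe set: controls and space, DEL and above, and '%'.
        if byte <= 0x20 or byte >= 0x7F or byte == ord("%"):
            out.append("%%%02x" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)
```

Escaping "%" itself keeps the encoding unambiguous, so a decoder can always recover the original bytes.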

Michael Vrable [Mon, 16 Apr 2007 04:56:33 +0000 (21:56 -0700)]
Remove early test code for the tarfile storage backend.

Michael Vrable [Mon, 16 Apr 2007 04:51:41 +0000 (21:51 -0700)]
Reduce debugging output.

Michael Vrable [Mon, 16 Apr 2007 04:34:07 +0000 (21:34 -0700)]
Add checksums to tarstore, and partially migrate to using it for storage.

Each tarfile segment gets a file called "checksums" now which contains
SHA-1 hashes of all objects contained in it.  There is currently no
checksum computed for the checksum file itself.
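
The layout described here can be sketched with Python's tarfile and hashlib modules (the project itself uses libtar). The line format inside the "checksums" member and the function names are hypothetical.

```python
import hashlib
import io
import tarfile


def add_member(tar, name, data):
    """Append an in-memory byte string to the archive as a regular file."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def write_segment(fileobj, objects):
    """Write objects (name -> bytes) plus a trailing 'checksums' member."""
    lines = []
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for name, data in objects.items():
            add_member(tar, name, data)
            digest = hashlib.sha1(data).hexdigest()
            lines.append("%s sha1=%s\n" % (name, digest))  # assumed format
        # As in the commit, the checksums file itself is not checksummed.
        add_member(tar, "checksums", "".join(lines).encode("ascii"))
```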

Begin migrating scandir to using the tar-based storage for backups.
Backups in the old format are still written, and there is still some
other cleanup to be done, but most of the raw data is currently handled.

Michael Vrable [Sun, 15 Apr 2007 20:48:09 +0000 (13:48 -0700)]
Begin work on an alternate object store mechanism using the TAR format.

Each segment is placed in a separate file, and each object is a file stored
within that TAR archive.  libtar is used to write (and perhaps later read)
tar files.

Michael Vrable [Wed, 7 Feb 2007 21:45:30 +0000 (13:45 -0800)]
Tag objects with a 4-byte type code.

Michael Vrable [Thu, 11 Jan 2007 22:47:41 +0000 (14:47 -0800)]
Return the names for allocated objects, and link file metadata to data.

Extend the interface for the new_object methods so that they can return the
segment UUID and object index for created objects.  Then, use this to write
out links from the file metadata to the data blocks making up the file, via
an indirect block.

Michael Vrable [Sat, 30 Dec 2006 03:24:48 +0000 (19:24 -0800)]
Support for spreading objects across segments.

New segments are created automatically whenever an old segment becomes too
large (current limit is set at something over 1 MB, but this could be
adjusted).
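
The rollover rule can be sketched as an in-memory model, in Python rather than the project's C++. The class name, the exact limit, and the decision to check the size before each write are assumptions; the returned pair mirrors the segment UUID and object index mentioned elsewhere in this log.

```python
import uuid


class SegmentWriter:
    """Store objects in segments, starting a new one when the old is too large."""

    def __init__(self, limit=1 << 20):
        self.limit = limit    # size threshold in bytes (assumed value)
        self.segments = {}    # segment name -> list of object payloads
        self._open_new()

    def _open_new(self):
        self.current = str(uuid.uuid4())
        self.segments[self.current] = []
        self.size = 0

    def new_object(self, data):
        """Store data and return (segment name, object index)."""
        if self.size >= self.limit:
            self._open_new()  # previous segment has grown past the limit
        objects = self.segments[self.current]
        objects.append(data)
        self.size += len(data)
        return self.current, len(objects) - 1
```

Because the check happens before a write rather than after, a segment can exceed the limit by up to one object, which matches the loose "something over 1 MB" wording.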

Michael Vrable [Fri, 29 Dec 2006 19:49:20 +0000 (11:49 -0800)]
Append checksums to segments to allow some verification.

Michael Vrable [Mon, 25 Dec 2006 06:28:59 +0000 (22:28 -0800)]
Add new interface for creating new segments.

Michael Vrable [Sun, 24 Dec 2006 04:15:31 +0000 (20:15 -0800)]
Initial support for objects encapsulated into segments.

Michael Vrable [Sun, 24 Dec 2006 01:21:10 +0000 (17:21 -0800)]
Some Makefile improvements:
  - (Mostly) automatic header dependency tracking.
  - Use pkg-config to pull in dependencies for libuuid.

Michael Vrable [Sun, 24 Dec 2006 01:19:58 +0000 (17:19 -0800)]
Improve comments, and track number of bytes written to a stream.

Michael Vrable [Sat, 23 Dec 2006 20:46:19 +0000 (12:46 -0800)]
Compute SHA-1 checksums of regular files to be stored with index data.

Michael Vrable [Sat, 23 Dec 2006 19:58:18 +0000 (11:58 -0800)]
Clean up output while scanning directories.

Michael Vrable [Sat, 23 Dec 2006 06:53:06 +0000 (22:53 -0800)]
Extend basic support for serializing simple data types.

Add support for serializing integers, strings, and dictionaries.  Then,
extend the directory scanner to output a summary of stat information for
each file/directory.

Michael Vrable [Wed, 20 Dec 2006 02:55:15 +0000 (18:55 -0800)]
Fill in a couple more details about a proposed file format.

Michael Vrable [Sat, 16 Dec 2006 07:32:15 +0000 (23:32 -0800)]
Begin work to document a proposed file format for backups.

Michael Vrable [Sat, 16 Dec 2006 05:24:37 +0000 (21:24 -0800)]
Add support for reading symlinks.

Michael Vrable [Wed, 13 Dec 2006 16:44:21 +0000 (08:44 -0800)]
Read contents of all regular files processed.

Michael Vrable [Tue, 12 Dec 2006 05:46:33 +0000 (21:46 -0800)]
More work on descending directory structure.

Improvements:
  - Directory listings are processed in sorted order.
  - Open regular files, but be very paranoid in doing so (try to avoid
    danger from race conditions).
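
One common way to open files "paranoidly" is sketched below in Python. The commit does not list its actual checks, so everything here is an assumption: stat the path without following symlinks, open it with O_NOFOLLOW, then confirm the opened descriptor refers to the same regular file.

```python
import os
import stat


def open_regular(path):
    """Open path read-only only if it is, and stays, the same regular file."""
    before = os.lstat(path)
    if not stat.S_ISREG(before.st_mode):
        raise ValueError("not a regular file: %s" % path)
    # O_NOFOLLOW refuses a symlink swapped in after the lstat call.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0))
    after = os.fstat(fd)
    same_file = (after.st_dev, after.st_ino) == (before.st_dev, before.st_ino)
    if not same_file or not stat.S_ISREG(after.st_mode):
        os.close(fd)
        raise OSError("file changed between lstat and open: %s" % path)
    return fd
```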

Michael Vrable [Sun, 10 Dec 2006 19:45:38 +0000 (11:45 -0800)]
Build with support for large files enabled.

Michael Vrable [Thu, 23 Nov 2006 06:33:10 +0000 (22:33 -0800)]
Initial commit