Michael Vrable [Thu, 2 May 2013 03:25:03 +0000 (20:25 -0700)]
Factor out time formatting functions.
Michael Vrable [Tue, 30 Apr 2013 15:32:57 +0000 (08:32 -0700)]
Switch to generic hash algorithms for subfile incrementals.
Michael Vrable [Mon, 29 Apr 2013 23:45:43 +0000 (16:45 -0700)]
Clean up database, and timestamp handling in particular.
Michael Vrable [Fri, 26 Apr 2013 19:50:30 +0000 (12:50 -0700)]
Improve database rebuilding.
- Compute estimated sizes (lower bounds) for objects.
- Do not rebuild subfile incrementals for very small objects.
Michael Vrable [Thu, 25 Apr 2013 19:49:26 +0000 (12:49 -0700)]
Clean up used-data tracking in localdb.
Michael Vrable [Thu, 25 Apr 2013 03:52:15 +0000 (20:52 -0700)]
Add code to rebuild_database.py to recompute segment metadata.
This requires that the segments be available.
Michael Vrable [Wed, 24 Apr 2013 19:57:32 +0000 (12:57 -0700)]
Database rebuilder fixes.
Michael Vrable [Wed, 24 Apr 2013 19:04:48 +0000 (12:04 -0700)]
.gitignore update
Michael Vrable [Wed, 24 Apr 2013 15:58:14 +0000 (08:58 -0700)]
Convert to sqlite3 module, from pysqlite2.
Michael Vrable [Mon, 18 Mar 2013 17:21:14 +0000 (10:21 -0700)]
Reworking cleanup.
Michael Vrable [Wed, 27 Feb 2013 18:12:36 +0000 (10:12 -0800)]
Drop intent field from the database.
Michael Vrable [Sat, 2 Feb 2013 16:40:36 +0000 (08:40 -0800)]
Update Python code to work with the new backup format.
Michael Vrable [Fri, 14 Dec 2012 04:34:21 +0000 (20:34 -0800)]
Changes to the Cumulus backup format and tools.
- Switch to a hierarchical file layout.
- Remove old references to the "LBS" name.
Michael Vrable [Wed, 16 Jan 2013 20:15:22 +0000 (12:15 -0800)]
Start on a tool to rebuild the local database if it is lost.
This can also be used to reconstruct the local database in a new format, or
upgrade hash algorithms.
Michael Vrable [Sat, 1 Dec 2012 05:12:53 +0000 (21:12 -0800)]
Start work on tests for Cumulus.
Michael Vrable [Fri, 21 Sep 2012 22:50:14 +0000 (15:50 -0700)]
Improve tracking of segments and segment utilization.
Update the local database and cumulus binary to keep better track of
segment utilization data, with an eventual goal of improving cleaning
algorithms.
Michael Vrable [Thu, 27 Sep 2012 04:24:49 +0000 (21:24 -0700)]
NEWS and TODO file updates.
Michael Vrable [Wed, 26 Sep 2012 15:23:57 +0000 (08:23 -0700)]
Reword copyright notices on code in third_party for clarity.
Michael Vrable [Tue, 25 Sep 2012 23:09:25 +0000 (16:09 -0700)]
Update copyright notices to use a central AUTHORS file.
Michael Vrable [Wed, 26 Sep 2012 14:54:25 +0000 (07:54 -0700)]
Rework hash implementations to provide additional algorithms.
Provide a generic hash interface, rework SHA-1 to use the interface, and
also add code for SHA-224/SHA-256.
Michael Vrable [Thu, 27 Sep 2012 04:24:58 +0000 (21:24 -0700)]
Add sha224/sha256 as supported hash algorithms.
Document new hash algorithms in the format description, and include all
supported algorithms in the Python code. The cumulus binary does not yet
support the new algorithms.
Michael Vrable [Thu, 30 Aug 2012 04:28:16 +0000 (21:28 -0700)]
Create a third_party directory for files copied from other projects.
Michael Vrable [Mon, 29 Oct 2012 16:28:29 +0000 (09:28 -0700)]
Collected bugfixes, improvements, and cleanups.
- Do not leak file descriptors when merging new include/exclude rules.
- Improved error message when unable to open a directory.
- Delete obsolete clean-segments.pl script (long since replaced by
functionality in cumulus-util).
Michael Vrable [Tue, 25 Sep 2012 21:03:43 +0000 (14:03 -0700)]
Update usage text for new include/exclude filtering mechanism.
Michael Vrable [Fri, 21 Sep 2012 00:41:02 +0000 (17:41 -0700)]
First step towards a new, improved cumulus front-end.
This commit adds several things:
- Rules for selecting sets of backups for expiration, for managing old
snapshots.
- A configuration file format and parser for listing settings such as
backup expiration policies.
- Code in a small utility library that can expire old snapshots
according to configuration settings.
Eventually this code should be part of a new cumulus front-end in Python
that can integrate snapshot and database management with backup runs.
Michael Vrable [Thu, 20 Sep 2012 20:43:22 +0000 (13:43 -0700)]
Refactor cumulus-util into a library plus small command front-end.
There is still more cleanup work to be done here, but this first step
makes commands in cumulus-util importable as a Python module for re-use
elsewhere.
Michael Vrable [Fri, 31 Aug 2012 03:15:27 +0000 (20:15 -0700)]
Switch to hashlib (Python >= 2.5) for hash algorithms.
The old sha module is deprecated, and hashlib gives access to newer hash
algorithms. Use it to add SHA-256 support as well.
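This Python sketch illustrates the generic-interface idea: it selects the hash algorithm by name using `hashlib.new`. Two details are assumptions made for this example and are not taken from the Cumulus format description: the `"<algorithm>=<hex digest>"` checksum string and the `SUPPORTED` set.

```python
import hashlib

# Hypothetical set of accepted algorithms; hashlib itself supports more.
SUPPORTED = ("sha1", "sha224", "sha256")


def checksum(data: bytes, algorithm: str = "sha1") -> str:
    """Return a hypothetical "<algorithm>=<hex digest>" string for data."""
    if algorithm not in SUPPORTED:
        raise ValueError("unsupported hash algorithm: %s" % algorithm)
    h = hashlib.new(algorithm)  # generic constructor, selected by name
    h.update(data)
    return "%s=%s" % (algorithm, h.hexdigest())


def verify(data: bytes, expected: str) -> bool:
    """Check data against a stored "<algorithm>=<hex digest>" string."""
    algorithm, _, _ = expected.partition("=")
    return checksum(data, algorithm) == expected
```

Because the algorithm name is stored next to the digest, old SHA-1 checksums can still be verified after the default changes.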
Michael Vrable [Wed, 12 Sep 2012 03:30:32 +0000 (20:30 -0700)]
Reimplement the file include/exclude filtering mechanism.
Implement something more similar to what rsync does. Accept some number of
starting points for the backup, and an ordered set of include/exclude
patterns. Also permit rules to be merged in during the backup process, by
per-subtree rule files.
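Below is a minimal sketch of rsync-style ordered filtering, in which the first matching rule decides whether a path is included. Several pieces are assumptions for illustration only:

- the `("+", pattern)`/`("-", pattern)` tuples;
- the helper names;
- the use of `fnmatch`.

Note that `fnmatch`'s `*` also matches `/`, which differs from rsync's patterns.

```python
from fnmatch import fnmatchcase


def is_included(path, rules):
    """Return True if path should be backed up under the ordered rules.

    rules is a list of (action, pattern) pairs, where action is "+" (include)
    or "-" (exclude). Rules are checked in order; the first match decides.
    """
    for action, pattern in rules:
        if fnmatchcase(path, pattern):
            return action == "+"
    return True  # no rule matched: include by default


def merge_rules(rules, subtree, extra_rules):
    """Merge per-subtree rules, e.g. read from a rule file in that directory.

    The new patterns are anchored at the subtree and placed first, so they
    take precedence over the inherited rules.
    """
    prefix = subtree.rstrip("/") + "/"
    return [(action, prefix + pattern) for action, pattern in extra_rules] + list(rules)
```

For example, with `[("+", "home/*/.ssh"), ("-", "home/*/.*")]`, the path `home/alice/.ssh` is kept while `home/alice/.cache` is excluded.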
Michael Vrable [Sat, 16 Jun 2012 06:28:32 +0000 (23:28 -0700)]
Add a cache around getpwuid/getgrgid to avoid repeated calls.
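The same memoization idea can be sketched in Python with `functools.lru_cache`; the actual change is in the C++ binary. Two caveats apply to this sketch. First, falling back to the numeric id when no name exists is an assumption. Second, `pwd` and `grp` are Unix-only modules.

```python
import functools
import grp
import pwd


@functools.lru_cache(maxsize=None)
def user_name(uid: int) -> str:
    """Map a numeric uid to a user name, caching hits and misses alike."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # no such user: fall back to the numeric id


@functools.lru_cache(maxsize=None)
def group_name(gid: int) -> str:
    """Map a numeric gid to a group name, caching hits and misses alike."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)  # no such group: fall back to the numeric id
```

Caching misses as well as hits means that a file owned by a nonexistent user or group does not trigger a fresh lookup every time it is seen.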
Michael Vrable [Thu, 31 May 2012 03:18:27 +0000 (20:18 -0700)]
Put linker flags at end of compiler command-line.
Put $(LDFLAGS) after the sources, since otherwise the linker may be unable
to find all symbols.
Michael Vrable [Wed, 30 May 2012 04:47:50 +0000 (21:47 -0700)]
Update to support new snapshot version in Python tools.
Also, remove some older references to "lbs".
Michael Vrable [Wed, 30 May 2012 04:39:49 +0000 (21:39 -0700)]
Some updates to the backup format:
- The name of the descriptor file is now based on the timestamp in UTC,
not local time. This should give better behavior in the case of
frequent snapshots and daylight saving time changes.
- Update the backup format to v0.11 (other changes may yet come before
release); with this also include a name change from "LBS" to "Cumulus"
in the format name.
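The following sketch shows why UTC helps. During a fall-back DST transition, a local wall-clock time occurs twice, so names built from local time can collide or run backwards, while UTC timestamps only increase. The `snapshot-<scheme>-<timestamp>.cumulus` pattern below is a hypothetical name format.

```python
from datetime import datetime, timezone


def descriptor_name(scheme: str, when: datetime) -> str:
    """Build a hypothetical descriptor file name from a timestamp in UTC.

    Any timezone-aware datetime is accepted and normalized to UTC first.
    """
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = when.astimezone(timezone.utc)
    return "snapshot-%s-%s.cumulus" % (scheme, utc.strftime("%Y%m%dT%H%M%S"))
```

Because the input is normalized, a local-time and a UTC representation of the same instant produce the same name.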
Michael Vrable [Wed, 30 May 2012 04:15:55 +0000 (21:15 -0700)]
Rename scandir.cc to main.cc.
Michael Vrable [Tue, 29 May 2012 15:07:21 +0000 (08:07 -0700)]
Prepare 0.10 release.
Michael Vrable [Thu, 23 Jun 2011 18:51:01 +0000 (11:51 -0700)]
Enable additional compiler warnings
Michael Vrable [Wed, 30 Jun 2010 19:29:49 +0000 (12:29 -0700)]
Apply fixes to s3:// URL parsing under Python 2.6.
The urlparse module starting in Python 2.6 appears to be compliant with RFC
3986, while previous versions were not. This causes a change in the
behavior of parsing s3:// URLs, however, resulting in breakage with Python
2.6.
Try to fix this by adjusting our code so that it works with either the old
or the new behavior.
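One way to tolerate both parser behaviors is sketched below with Python 3's `urllib.parse.urlparse`, which fills in `netloc` for `s3://` URLs. The fallback branch handles the older case, in which the bucket ended up in `path` instead. The function name is hypothetical.

```python
from urllib.parse import urlparse


def parse_s3_url(url: str):
    """Split an s3://bucket/prefix URL into (bucket, prefix).

    Works whether the URL parser puts the bucket in netloc (RFC 3986 style)
    or leaves everything in path as "//bucket/prefix" (older behavior).
    """
    parts = urlparse(url)
    if parts.scheme != "s3":
        raise ValueError("not an s3:// URL: %s" % url)
    if parts.netloc:
        bucket, prefix = parts.netloc, parts.path
    else:
        bucket, _, prefix = parts.path.lstrip("/").partition("/")
    return bucket, prefix.strip("/")
```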
Albert Dengg [Wed, 31 Mar 2010 20:33:03 +0000 (22:33 +0200)]
add some further documentation on the sftp backend
* document that password auth and password-protected keys are not supported
* document the configuration via ~/.ssh/config
* document that the host key already has to be in ~/.ssh/known_hosts
Albert Dengg [Wed, 31 Mar 2010 20:33:02 +0000 (22:33 +0200)]
fix usage of wrong variable
* fix some leftover reference to the wrong variable (self.path instead
of self.netloc)
Michael Vrable [Tue, 30 Mar 2010 22:03:56 +0000 (15:03 -0700)]
Formatting/spelling fix.
Michael Vrable [Tue, 30 Mar 2010 21:59:38 +0000 (14:59 -0700)]
Some assorted fixes for the SFTP backend.
- Automatically call the close() method of a storage backend when the
object is garbage collected.
- Calling stat in the SFTP backend when the file doesn't exist will raise
a Cumulus NotFoundError instead of a generic IOError.
- Avoid the use of keyword arguments when calling SFTPClient methods (in
my testing the first argument might be called 'path' instead of
'filename', but avoid the problem altogether by just using positional
arguments).
Albert Dengg [Fri, 26 Mar 2010 13:04:14 +0000 (14:04 +0100)]
code cleanup & regression fix
Albert Dengg [Fri, 26 Mar 2010 13:04:13 +0000 (14:04 +0100)]
work for python 2.5
* import "with" statement from __future__ to be able to use it in 2.5
Albert Dengg [Fri, 26 Mar 2010 13:04:12 +0000 (14:04 +0100)]
* document requirements for sftp storage
Albert Dengg [Fri, 26 Mar 2010 13:04:11 +0000 (14:04 +0100)]
implement a basic sftp storage backend
* add an option to cumulus-store storage backends to explicitly close the
connection, needed because of the design of the ssh/sftp lib
* add a sftp storage backend based on paramiko
Michael Vrable [Wed, 30 Sep 2009 18:47:47 +0000 (11:47 -0700)]
Have FTP backend reconnect on timeout.
Additionally, ensure that transfers are done in binary mode.
Patch provided by Ralf Schlatterbeck <rsc@runtux.com>.
Michael Vrable [Sat, 26 Sep 2009 19:09:35 +0000 (12:09 -0700)]
The map::at method does not always exist, so instead use map::find.
Thanks to Achim J. Latz <achim.latz@qustodium.net>.
Michael Vrable [Wed, 23 Sep 2009 18:47:43 +0000 (11:47 -0700)]
Ensure printf format specifiers and types match (fixes compiler warning).
Ralf Schlatterbeck [Thu, 10 Sep 2009 21:42:20 +0000 (14:42 -0700)]
Implement FTP backend and other code cleanups.
Details:
- Fix a race condition in setup of RemoteStore: The backup_script was
set *after* a thread was already started. The thread tested the
backup script twice: during the first test the variable wasn't yet
defined, while in the second test it was. This sometimes led to a
segfault due to an uninitialized output file.
I've moved the backup_script to the constructor as an optional
parameter.
cumulus-util:
- put all command-documentation into docstrings of respective function
- add the docstrings to the usage text, so that we know which commands
exist
- Fix hard-coded extension '.tar.gpg' for cmd_list_snapshot_sizes
- framework for automatically computing the right method to call for a
given command
python/cumulus/store/file.py:
- fix constructor so that we can directly call it when it's not called
via the factory with a URL as parameter
- for NotFoundError give type and filename
python/cumulus/store/ftp.py:
- new FTP backend
Michael Vrable [Wed, 26 Aug 2009 18:45:19 +0000 (11:45 -0700)]
Add a few items to the TODO list.
Michael Vrable [Sat, 8 Aug 2009 02:28:31 +0000 (19:28 -0700)]
Fix a segfault-causing bug when converting a numeric group to a name fails.
The original code had a copy-and-paste bug when converting a numeric group
id into a symbolic group name: rather than checking that getgrgid returned
a valid result, it checked the result of getpwuid. If any files in the
backup snapshot belonged to a non-existent group, this resulted in a
segfault.
Problem found and patch provided by Chris Wilson <chris@aptivate.org>.
Michael Vrable [Sat, 8 Aug 2009 02:22:42 +0000 (19:22 -0700)]
Add --exclude-name option.
The --exclude-name will exclude from a snapshot files or directories with
the specified name. This contrasts with --exclude, which requires that the
full path be specified.
Initial patch by Chris Wilson <chris@aptivate.org>. Committed with minor
modifications.
Michael Vrable [Tue, 28 Jul 2009 23:46:22 +0000 (16:46 -0700)]
Update for 0.9 release.
Michael Vrable [Mon, 20 Jul 2009 01:02:38 +0000 (18:02 -0700)]
Update README with information about remote storage.
Michael Vrable [Tue, 30 Jun 2009 18:25:30 +0000 (11:25 -0700)]
Include a missing header file.
This should fix compilation with newer versions of GCC. Problem originally
reported by Robert Rebstock <rebstock@scienceworks.com>.
Michael Vrable [Sun, 31 May 2009 06:21:10 +0000 (23:21 -0700)]
Implement rudimentary garbage collection.
Implement a garbage collection method in cumulus-util which will search for
files not referenced by any current snapshots and delete them. This still
doesn't let snapshots themselves be deleted automatically, but after
manually deleting a snapshot this will quickly delete all other old files.
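At its core, a collector like this computes a set difference between the files the store holds and the files that live snapshots reference. This sketch assumes each snapshot's needed files are already known. It only reports candidates for deletion rather than deleting them. `find_garbage` and the file names used with it are illustrative.

```python
def find_garbage(stored_files, snapshots):
    """Return stored files not referenced by any current snapshot.

    stored_files: iterable of file names present in the store.
    snapshots: mapping of snapshot name -> iterable of file names it needs
    (for example, its descriptor plus the segments it references).
    """
    referenced = set()
    for needed in snapshots.values():
        referenced.update(needed)
    return sorted(set(stored_files) - referenced)
```

Once a snapshot descriptor is deleted by hand, that snapshot drops out of `snapshots`. Any segment only it referenced then shows up as garbage.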
Michael Vrable [Sun, 31 May 2009 06:19:10 +0000 (23:19 -0700)]
Implement metadata caching for S3 backend.
Amazon S3 will return some limited object metadata when a list operation is
performed. This is significantly cheaper than fetching the information for
objects one at a time. In the S3 backend, implement a scan() method that
will list all objects and cache the metadata, then return cached results
when stat() is called.
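Below is a minimal sketch of the scan-then-stat pattern. `CachingStore`, the backend's `list()`/`stat()` methods, and `MemoryBackend` are hypothetical stand-ins, not the S3 backend's real interface. Note that the cache in this sketch is never invalidated.

```python
class CachingStore:
    """Answer stat() from a prior bulk listing when possible."""

    def __init__(self, backend):
        self.backend = backend
        self._cache = {}  # name -> metadata dict

    def scan(self):
        # One list request returns metadata for many objects at once.
        for name, size in self.backend.list():
            self._cache[name] = {"size": size}

    def stat(self, name):
        cached = self._cache.get(name)
        if cached is not None:
            return cached
        return self.backend.stat(name)  # fall back to a per-object request


class MemoryBackend:
    """Toy backend for illustration; counts per-object stat() requests."""

    def __init__(self, objects):
        self.objects = dict(objects)  # name -> bytes
        self.stat_calls = 0

    def list(self):
        return [(name, len(data)) for name, data in self.objects.items()]

    def stat(self, name):
        self.stat_calls += 1
        return {"size": len(self.objects[name])}
```

Once `scan()` has run, a `stat()` call for any listed object is answered locally, without a request to the backend.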
Michael Vrable [Thu, 26 Mar 2009 21:35:27 +0000 (14:35 -0700)]
lbs-filter-gpg has been renamed to cumulus-filter-gpg.
An instance of this was missed in the code. Caught by Achim J. Latz
<achim.latz@qustodium.net>.
Michael Vrable [Wed, 14 Jan 2009 22:06:31 +0000 (14:06 -0800)]
Include segment add/remove counts in list-snapshot-sizes.
Michael Vrable [Mon, 15 Dec 2008 23:15:19 +0000 (15:15 -0800)]
Detect decompression script needed for segments based on extension.
Michael Vrable [Mon, 15 Dec 2008 22:29:13 +0000 (14:29 -0800)]
cumulus-sync: Create a tool for copying snapshots between locations.
This will automatically find and copy all needed segments that are not
already present, and can handle both local filesystems and remote storage
with Amazon S3.
Michael Vrable [Thu, 20 Nov 2008 23:47:52 +0000 (15:47 -0800)]
Drop the use of exceptions for fatal error handling.
Michael Vrable [Thu, 20 Nov 2008 23:38:39 +0000 (15:38 -0800)]
Improve handling of file-not-found in remote storage layer.
Implement a single NotFoundError which is thrown whenever a file does not
exist in any remote store, instead of instance-specific error handling.
Michael Vrable [Thu, 20 Nov 2008 20:11:41 +0000 (12:11 -0800)]
Delete contrib/cumulus-store-s3.
This has been replaced by cumulus-store, and the main cumulus executable
now uses a different interface so cumulus-store-s3 won't work any longer.
Michael Vrable [Thu, 20 Nov 2008 20:10:06 +0000 (12:10 -0800)]
Re-do cumulus side of upload script interface.
Update the cumulus executable so that the interface for the remote upload
script is compatible with the new cumulus-store script, allowing cumulus to
easily target different storage backends.
Michael Vrable [Thu, 20 Nov 2008 19:55:13 +0000 (11:55 -0800)]
Introduce a script to provide access to remote repositories.
cumulus-store is a Python script that uses the cumulus library code to
access various storage repositories (local file, S3, and extensible to
others). It allows non-Python code to access these storage repositories
through a simple interface over stdin/stdout.
Additionally, make a few extensions and fixes to the cumulus Python
libraries.
Michael Vrable [Thu, 13 Nov 2008 22:38:44 +0000 (14:38 -0800)]
Makefile: Fix clean target.
Michael Vrable [Thu, 6 Nov 2008 18:47:56 +0000 (10:47 -0800)]
cumulus-util: Automatically set Python search path.
Attempt to set the Python library search path so that cumulus-util can find
the cumulus Python modules, without PYTHONPATH having to be set
explicitly.
Modules are looked for in the "python" directory where the cumulus-util
binary resides; this is appropriate for running cumulus-util directly from
the source code directory, but may not be if the tools are installed
somewhere else.
Michael Vrable [Thu, 6 Nov 2008 18:47:18 +0000 (10:47 -0800)]
cumulus-util: In list-snapshot-sizes output, display backup intent values.
Michael Vrable [Wed, 17 Sep 2008 22:37:40 +0000 (15:37 -0700)]
Extend FUSE functionality.
Add caching of metadata for performance, and reading of file data.
Michael Vrable [Tue, 12 Aug 2008 22:14:35 +0000 (15:14 -0700)]
Start a proof-of-concept FUSE interface to old snapshots.
Begin work on a FUSE interface to cumulus, allowing old snapshots to
be displayed as a mounted filesystem. Though only partly implemented, already
it is possible to read the directory structure and stat information for
files. File contents cannot yet be extracted.
To implement this efficiently, random access to cumulus metadata was
implemented through the use of a binary search. Some optimization is still
needed, and some caching should probably still be added.
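The sketch below shows binary-search random access over metadata sorted by path, using Python's `bisect` module. It assumes plain lexicographic ordering, under which all entries under `dir/` are contiguous. Both the real metadata ordering and the meaning of the stored values (offsets, say) are assumptions here.

```python
import bisect


class MetadataIndex:
    """Random access into metadata entries sorted by path."""

    def __init__(self, entries):
        # entries: list of (path, value) pairs, already sorted by path.
        self.paths = [path for path, _ in entries]
        self.values = [value for _, value in entries]

    def lookup(self, path):
        """Find one entry in O(log n) comparisons."""
        i = bisect.bisect_left(self.paths, path)
        if i < len(self.paths) and self.paths[i] == path:
            return self.values[i]
        raise KeyError(path)

    def children(self, directory):
        """List direct children: binary search to the first entry under the
        directory, then scan that directory's contiguous descendants."""
        name = directory.strip("/")
        prefix = name + "/" if name else ""
        i = bisect.bisect_left(self.paths, prefix)
        result = []
        while i < len(self.paths) and self.paths[i].startswith(prefix):
            rest = self.paths[i][len(prefix):]
            if "/" not in rest:
                result.append(rest)
            i += 1
        return result
```

Given a sorted list of `(path, offset)` pairs, `lookup("etc/ssh")` returns the stored value for that path. `children("etc")` lists only the immediate entries under `etc/`.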
Michael Vrable [Mon, 11 Aug 2008 21:38:30 +0000 (14:38 -0700)]
Allow a URL to be used in cumulus-util to specify a store location.
Both file:/// and s3:// URLs are supported, reading data from the local
filesystem or Amazon S3, respectively. Local paths (without a file:///
prefix) can still be specified.
Michael Vrable [Wed, 6 Aug 2008 19:11:22 +0000 (12:11 -0700)]
Begin new storage-abstraction layer.
Begin work on new Python code for providing uniform access to both local
filesystem and remote S3 storage. Convert the existing Python module to
use the new interface.
Michael Vrable [Fri, 1 Aug 2008 18:37:35 +0000 (11:37 -0700)]
Makefile cleanup.
Michael Vrable [Fri, 1 Aug 2008 18:30:35 +0000 (11:30 -0700)]
Prepare for 0.8 release.
Michael Vrable [Wed, 30 Jul 2008 20:06:54 +0000 (13:06 -0700)]
Fix typo in restore.pl.
Michael Vrable [Wed, 30 Jul 2008 18:48:45 +0000 (11:48 -0700)]
Update restore.pl for new snapshot format (v0.8).
Michael Vrable [Tue, 15 Jul 2008 23:27:19 +0000 (16:27 -0700)]
Rebuild sub-block signatures when --rebuild-statcache is specified.
Michael Vrable [Tue, 15 Jul 2008 23:26:50 +0000 (16:26 -0700)]
Documentation updates.
Michael Vrable [Mon, 14 Jul 2008 23:19:58 +0000 (16:19 -0700)]
Update help text to include --rebuild-statcache.
Michael Vrable [Mon, 14 Jul 2008 22:59:21 +0000 (15:59 -0700)]
--rebuild-statcache bugfix.
Check that a file is actually unchanged (and not just present in the old
statcache with different metadata) before printing a warning about
differing checksums.
Michael Vrable [Mon, 14 Jul 2008 22:49:18 +0000 (15:49 -0700)]
Add the --rebuild-statcache option.
Add an option which forces cumulus to re-read all files, even if they seem
not to have changed.
This has two main purposes: first, it helps to rebuild the contents of the
statcache from data actually on disk, and as such is useful for adding
object size annotations to files which haven't changed (ordinarily, the
straight re-use of data from the statcache prevents this).
Second, it can be used to detect some simple forms of data corruption; if a
file has not changed (according to stat information) but the checksum
doesn't match the old value in the statcache file, a warning is printed.
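Here is a sketch of the check, including the follow-up fix above: a warning is printed only when the stat metadata says the file is unchanged. The statcache entry layout (`mtime`, `size`, `inode`, `sha1` keys) and the function name are hypothetical.

```python
import hashlib
import os


def check_unchanged(path, old_entry):
    """Re-read path and compare it with a hypothetical statcache entry.

    Returns (new_sha1_hex, suspect). suspect is True only when the stat
    metadata matches the old entry but the contents hash differently.
    """
    st = os.stat(path)
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    digest = h.hexdigest()
    unchanged = (
        old_entry is not None
        and int(st.st_mtime) == old_entry["mtime"]
        and st.st_size == old_entry["size"]
        and st.st_ino == old_entry["inode"]
    )
    suspect = unchanged and digest != old_entry["sha1"]
    if suspect:
        print("warning: %s: checksum differs but file appears unchanged" % path)
    return digest, suspect
```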
Michael Vrable [Mon, 14 Jul 2008 21:17:57 +0000 (14:17 -0700)]
Better cope with null values in the segments_used table.
Treat null utilization values as 0.0 for cleaning purposes. These
shouldn't come up, but may have been generated due to bugs in the SQLite
library, so deal gracefully with them instead of failing with an exception.
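In SQL, this amounts to wrapping the column in `COALESCE`. The sketch below uses Python's `sqlite3`. The `segmentid` and `utilization` column names and the threshold query are assumptions about the `segments_used` table.

```python
import sqlite3


def cleaning_candidates(conn, threshold):
    """Return (segmentid, utilization) pairs below threshold, least used first.

    NULL utilization values are treated as 0.0 instead of raising an error.
    """
    rows = conn.execute(
        "SELECT segmentid, COALESCE(utilization, 0.0) AS u "
        "FROM segments_used WHERE COALESCE(utilization, 0.0) < ? "
        "ORDER BY u, segmentid",
        (threshold,),
    )
    return rows.fetchall()
```

A row whose utilization is NULL therefore sorts first as a cleaning candidate, rather than breaking arithmetic in the caller.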
Michael Vrable [Thu, 10 Jul 2008 18:05:19 +0000 (11:05 -0700)]
Make use of size assertions in references where possible.
When writing out a reference to what is known to be a complete object, use
the size-assertion form of a reference.
Michael Vrable [Tue, 1 Jul 2008 04:35:13 +0000 (21:35 -0700)]
Eliminate a gcc warning.
Add parentheses around arithmetic near a shift operator, to get rid of a
gcc warning. The code was correct before, and this change causes it to
diverge from the LBFS sources it was derived from, but it is worthwhile to
get rid of the gcc warning.
Michael Vrable [Mon, 30 Jun 2008 21:17:08 +0000 (14:17 -0700)]
Extend object reference syntax with size assertions.
Object references can now include a size assertion, such as [=1024]
which indicates that the referenced object is exactly 1024 bytes in
length. If a metadata log or statcache file is produced using this
reference form where appropriate, then it should be possible to rebuild
much of the object index in the local database (by looking for files
which are unchanged and computing hashes of blocks from that file where
it is known that an entire object was used, not just a fragment of an
object).
This commit merely adds support for parsing the new references; they are
not yet generated by any code.
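The sketch below parses only the size-assertion suffix. Its regular expression assumes a reference is `<segment>/<object>`, optionally followed by `[=<size>]`. Any other range syntax the real format allows is deliberately rejected. The helper names are hypothetical.

```python
import re

# Assumed shape: "<segment>/<object>" plus an optional "[=<size>]" suffix.
_REF = re.compile(r"(?P<base>[^\[\]]+?)(?:\[=(?P<size>\d+)\])?")


def parse_reference(ref: str):
    """Split a reference into (base, asserted_size or None)."""
    m = _REF.fullmatch(ref)
    if m is None:
        raise ValueError("unrecognized reference: %r" % ref)
    size = m.group("size")
    return m.group("base"), (int(size) if size is not None else None)


def check_size(ref: str, data: bytes) -> bool:
    """True if data satisfies the reference's size assertion, if it has one."""
    _, size = parse_reference(ref)
    return size is None or len(data) == size
```

A reader can use `check_size` to confirm that a fetched object is complete before relying on it, for example when rebuilding index entries from unchanged files.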
Michael Vrable [Mon, 30 Jun 2008 20:28:12 +0000 (13:28 -0700)]
Add some missing #include statements.
Some standard include files (stdlib.h and string.h) were not being included
where necessary. This worked in the past, but seems to break for some
versions of the compiler/standard library, so be sure to include what we
need.
Michael Vrable [Mon, 23 Jun 2008 23:12:01 +0000 (16:12 -0700)]
README corrections.
Michael Vrable [Mon, 23 Jun 2008 21:08:02 +0000 (14:08 -0700)]
Document the "-v/--verbose" option.
Michael Vrable [Mon, 23 Jun 2008 20:49:22 +0000 (13:49 -0700)]
NEWS updates for v0.7 release.
Michael Vrable [Mon, 16 Jun 2008 22:25:22 +0000 (15:25 -0700)]
Add a verbose option to cumulus.
By default, do not output a listing of all files as they are backed up. If
--verbose or -v is specified, then do so.
Michael Vrable [Fri, 13 Jun 2008 16:25:00 +0000 (09:25 -0700)]
README updates: explain restores in more detail.
Michael Vrable [Thu, 12 Jun 2008 20:53:25 +0000 (13:53 -0700)]
Compute checksum of checksums file while it still exists.
When data is being stored remotely, the checksums file (containing hashes
for all the segments needed by a snapshot) may only be stored locally for a
short period of time. We can't wait to compute its checksum (for inclusion
into the root descriptor) until the time when the root descriptor is
written out, since the checksum file may be gone by then.
Compute the checksum earlier (before sending the checksums file to remote
storage), and save the checksum value until it is written out later.
Michael Vrable [Thu, 12 Jun 2008 20:49:50 +0000 (13:49 -0700)]
Fix an example command in the README.
Michael Vrable [Mon, 9 Jun 2008 18:01:25 +0000 (11:01 -0700)]
Updates to documentation and contributed scripts for name change.
Michael Vrable [Mon, 9 Jun 2008 17:25:51 +0000 (10:25 -0700)]
Do not store subfile signatures for very short blocks.
Don't bother to store subfile signatures for very short files, since it is
probably not worth the effort: we're probably best off just storing a new
copy of the data if it changes anyway.
For now, we use a fixed, non-scientific threshold of 16 kB as the
minimum size before we'll save subfile signatures. However, short blocks
that are created to store any new chunks needed for a subfile incremental
are always indexed (since we are likely to want to use the same chunks
again in the next backup).
Michael Vrable [Wed, 4 Jun 2008 00:00:34 +0000 (17:00 -0700)]
.gitignore update.
Michael Vrable [Tue, 3 Jun 2008 21:50:11 +0000 (14:50 -0700)]
Update name of lbs-util program.
Michael Vrable [Tue, 3 Jun 2008 21:44:32 +0000 (14:44 -0700)]
Change name of project to Cumulus.
Start changing some references to the LBS name to Cumulus instead. So far,
changes are in a few of the more user-visible places (name of the
executable, name in the version string). Many internal references are not
changed, and likely will not be changed immediately (since some of the
changes would change format compatibility).
Michael Vrable [Mon, 2 Jun 2008 21:11:53 +0000 (14:11 -0700)]
Hyphens are allowed in key names in RFC822-style data.
Update the parsers for the RFC822-style key-value lists. Allow a hyphen in
key names. Previously, the "Backup-Intent" field was being ignored because
its name was considered invalid; this change should fix that.
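The sketch below shows the relaxed key pattern. With `[A-Za-z0-9]+` alone, the `Backup-Intent` line would fail to match and would be skipped silently. This sketch ignores continuation lines and other details of the real format, and the function name is hypothetical.

```python
import re

# Key names may contain letters, digits, and hyphens (e.g. "Backup-Intent").
_FIELD = re.compile(r"([A-Za-z0-9-]+):\s*(.*)")


def parse_fields(text: str) -> dict:
    """Parse simple RFC822-style "Key: value" lines into a dict."""
    fields = {}
    for line in text.splitlines():
        m = _FIELD.fullmatch(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields
```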
Michael Vrable [Mon, 2 Jun 2008 20:48:36 +0000 (13:48 -0700)]
Store unspecified scheme names in database as empty string, not null.
Avoid the use of nulls to represent an unspecified backup scheme in the
local database. This should fix the bug where database cleaning would not
touch backups without a scheme name.
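The underlying pitfall can be shown with Python's `sqlite3`. In SQL, `NULL = NULL` is not true, so an equality test never matches rows whose scheme is NULL, while an empty string compares normally. The `snapshots` table and its columns are hypothetical.

```python
import sqlite3


def count_scheme(conn, scheme):
    """Count snapshots whose scheme equals the given value via plain "="."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM snapshots WHERE scheme = ?", (scheme,)
    ).fetchone()
    return n
```

Passing `None` binds SQL NULL, which matches nothing, whereas passing `""` finds the rows that store an empty scheme name.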