[Gnu-arch-users] Arch Cache & cached archives
From: Aaron Bentley
Subject: [Gnu-arch-users] Arch Cache & cached archives
Date: Tue, 14 Sep 2004 12:56:48 -0400
User-agent: Mozilla Thunderbird 0.5 (X11/20040309)
This is a description of the preliminary Arch Cache implementation with
notes on future directions. Please consider it a draft; comments and
critiques are welcome.
Although I'm using the term "cache", this is really about memoizing: the
data can be time-consuming to produce, but once produced it is always
equivalent, so it never needs to be recomputed.
Layer 1: The Arch Cache
=======================
The Arch Cache abstraction connects "query paths" with streams, or
things that streams can represent. Query paths look suspiciously like
POSIX pathnames. Convenience functions are available for use with strings.
There is a test for whether the cache is enabled:
extern int
arch_cache_active (void)
Attempts to use the cache when it is not enabled will cause panics.
extern int
arch_cache_put (t_uchar **tmp_name, t_uchar *rel_query_path)
To add something to the cache, we use arch_cache_put. This returns a
file descriptor that we'll have to close, and a tmp_name that we'll
ultimately need to free.
extern void
arch_cache_commit (t_uchar *tmp_name, t_uchar *rel_query_path)
After we have written the answer to the file descriptor, we must commit
it before the answer becomes active. This step is not needed when using
the string wrappers.
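To make the put/commit protocol concrete, here is a minimal sketch of how
it could work on a local filesystem. It is not the actual implementation:
the sketch_* names, the fixed sketch_root prefix, the ".tmp.XXXXXX"
naming scheme, the int return from commit and the assumption that the
query path's parent directories already exist are all hypothetical. The
idea is that put writes to a private temporary file and commit renames it
into place, so readers never see a half-written answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical prefix; the real one comes from =arch-cache.
   Parent directories of a query path are assumed to exist. */
static const char *sketch_root = "/tmp";

/* Allocate "<sketch_root>/<rel><suffix>". Caller frees. */
static char *sketch_path (const char *rel, const char *suffix)
{
  size_t n = strlen (sketch_root) + strlen (rel) + strlen (suffix) + 2;
  char *p = malloc (n);
  if (p)
    snprintf (p, n, "%s/%s%s", sketch_root, rel, suffix);
  return p;
}

/* Like arch_cache_put: create a uniquely named temp file next to the
   final location; return its descriptor and (via tmp_name) its name. */
static int sketch_cache_put (char **tmp_name, const char *rel)
{
  int fd;
  *tmp_name = sketch_path (rel, ".tmp.XXXXXX");
  if (!*tmp_name)
    return -1;
  fd = mkstemp (*tmp_name);
  if (fd < 0)
    {
      free (*tmp_name);
      *tmp_name = NULL;
    }
  return fd;
}

/* Like arch_cache_commit: move the finished answer into place with
   rename(). The caller still owns (and must free) tmp_name. */
static int sketch_cache_commit (const char *tmp_name, const char *rel)
{
  char *final = sketch_path (rel, "");
  int rc = final ? rename (tmp_name, final) : -1;
  free (final);
  return rc;
}
```

Creating the temporary file in the same directory as the final answer is
deliberate: rename() is only atomic within a single filesystem.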
extern int
arch_cache_has_answer (t_uchar * rel_query_path)
We can use arch_cache_has_answer to find out whether the cache has an
answer for a particular query.
extern int
arch_cache_get (t_uchar * rel_query_path)
We can use arch_cache_get to retrieve the answer for a query. It will
panic if no answer is available for that query. This is where the smart
caching functionality that Tom has mentioned could hook in. One possible
implementation would be to register a set of query handlers and invoke
them in sequence until one of them produces an answer.
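A minimal sketch of that handler-chain idea, assuming a hypothetical
handler signature (query_handler) and two illustrative handlers. None of
these names exist in tla; the point is only the "try each handler in
order, first answer wins" control flow.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler signature: write an answer into OUT and return 0
   if this handler can answer QUERY, otherwise return -1. */
typedef int (*query_handler) (const char *query, char *out, size_t outlen);

/* Invoke the registered handlers in order; the first answer wins. */
static int answer_query (const query_handler *handlers, size_t n,
                         const char *query, char *out, size_t outlen)
{
  size_t i;
  for (i = 0; i < n; ++i)
    if (handlers[i] (query, out, outlen) == 0)
      return 0;
  return -1;                    /* no handler could answer */
}

/* Illustrative handler: knows exactly one stored answer. */
static int stored_answers (const char *query, char *out, size_t outlen)
{
  if (strcmp (query, "/archives/a@b--2004/cat--br--1.0--patch-1/log") != 0)
    return -1;
  snprintf (out, outlen, "stored log");
  return 0;
}

/* Illustrative fallback: pretends it can compute any answer. */
static int compute_answer (const char *query, char *out, size_t outlen)
{
  snprintf (out, outlen, "computed %s", query);
  return 0;
}
```

Ordering the chain from cheapest to most expensive handler keeps the
common case fast.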
extern int
arch_cache_maybe_get (t_uchar * rel_query_path)
If we don't want to panic when the answer isn't there, we can use
arch_cache_maybe_get, which returns -1 if the answer is unavailable.
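On a local filesystem, the three lookup functions could map onto open()
and access() roughly as follows. This is a hypothetical sketch: the
sketch_* names and the fixed prefix are assumptions, and the real
functions take a t_uchar * query path.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical cache prefix (normally read from =arch-cache). */
static const char *sketch_root = "/tmp";

/* maybe_get: open the answer read-only; -1 if it cannot be opened,
   which in practice usually means no answer has been committed. */
static int sketch_cache_maybe_get (const char *rel)
{
  char path[4096];
  snprintf (path, sizeof path, "%s/%s", sketch_root, rel);
  return open (path, O_RDONLY);
}

/* has_answer: non-zero if a committed answer exists and is readable. */
static int sketch_cache_has_answer (const char *rel)
{
  char path[4096];
  snprintf (path, sizeof path, "%s/%s", sketch_root, rel);
  return access (path, R_OK) == 0;
}

/* get: a missing answer is fatal, mirroring the "panic" behaviour. */
static int sketch_cache_get (const char *rel)
{
  int fd = sketch_cache_maybe_get (rel);
  if (fd < 0)
    {
      fprintf (stderr, "arch cache: no answer for %s\n", rel);
      abort ();
    }
  return fd;
}
```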
The convenience functions are arch_cache_put_str and
arch_cache_get_str. Since these copy the string verbatim, I expect to add
arch_cache_put_line and arch_cache_get_line, which will ensure that the
files have a terminating '\n' but the strings do not.
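The newline convention intended for the proposed *_line functions might
look like this. The helpers below are hypothetical and operate on a
FILE * rather than a query path, to show only the convention: exactly one
'\n' is added on write and stripped again on read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write STR followed by one '\n' (STR is assumed to contain no '\n'). */
static int sketch_put_line (FILE *f, const char *str)
{
  return fprintf (f, "%s\n", str) < 0 ? -1 : 0;
}

/* Read the first line into a malloc'd string without its '\n'.
   Returns NULL on empty input or allocation failure; caller frees. */
static char *sketch_get_line (FILE *f)
{
  size_t cap = 64, len = 0;
  char *buf = malloc (cap);
  int c;
  if (!buf)
    return NULL;
  while ((c = fgetc (f)) != EOF && c != '\n')
    {
      if (len + 1 == cap)       /* keep room for the terminating NUL */
        {
          char *bigger = realloc (buf, cap *= 2);
          if (!bigger)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
        }
      buf[len++] = (char) c;
    }
  if (c == EOF && len == 0)
    {
      free (buf);
      return NULL;
    }
  buf[len] = '\0';
  return buf;
}
```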
Things currently unhandled include:
1. statistics
2. listing answers available to certain kinds of queries, e.g. listing
full-trees available in a version.
3. erasing answers
Implementation notes:
Yes, this is implemented using the local filesystem, exactly as you'd
expect. (It could also be implemented on top of a pseudo-filesystem.)
The $HOME/.arch-params/=arch-cache file contains the prefix of the cache
hierarchy.
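A sketch of how a query path could be mapped onto the filesystem,
assuming =arch-cache holds the prefix as a single line. The names
read_cache_prefix and query_to_file are hypothetical, and treating a
missing file as "cache not active" is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Read the cache prefix from the first line of
   $HOME/.arch-params/=arch-cache. Returns 0 on success, -1 if the
   file is missing or empty. */
static int read_cache_prefix (const char *home, char *out, size_t n)
{
  char path[4096];
  FILE *f;
  snprintf (path, sizeof path, "%s/.arch-params/=arch-cache", home);
  f = fopen (path, "r");
  if (!f)
    return -1;
  if (!fgets (out, (int) n, f))
    {
      fclose (f);
      return -1;
    }
  fclose (f);
  out[strcspn (out, "\n")] = '\0';      /* drop the trailing newline */
  return 0;
}

/* Join prefix and query path, e.g. "/archives/a@b/log". */
static void query_to_file (const char *prefix, const char *query,
                           char *out, size_t n)
{
  snprintf (out, n, "%s%s%s", prefix, query[0] == '/' ? "" : "/", query);
}
```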
Layer 2: Namespace
==================
The current namespace looks like this:
/archives : data for archives, but not for specific locations
/archives/$ARCHIVE: data for a particular archive
/archives/$ARCHIVE/$REVISION: data for a particular archive revision.
I'm not sure I want to keep it this way. For scalability reasons, this
might be better: /archives/$ARCHIVE/$VERSION/$DATATYPE/$PATCHLEVEL.
That way, listing cached data for a version would scale with the number
of patchlevels in that version that have cached answers, rather than
with versions * patchlevels.
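To illustrate the difference between the two layouts, here is a sketch
that builds both kinds of path. It assumes the usual Arch revision
naming, where the patchlevel follows the last "--" of the revision name;
the function names and the example archive name are made up.

```c
#include <stdio.h>
#include <string.h>

/* Current layout: /archives/$ARCHIVE/$REVISION/$DATATYPE */
static void current_path (char *out, size_t n, const char *archive,
                          const char *revision, const char *datatype)
{
  snprintf (out, n, "/archives/%s/%s/%s", archive, revision, datatype);
}

/* Proposed layout: /archives/$ARCHIVE/$VERSION/$DATATYPE/$PATCHLEVEL.
   The version is everything before the last "--" in the revision,
   the patchlevel everything after it. Returns -1 if there is no "--". */
static int proposed_path (char *out, size_t n, const char *archive,
                          const char *revision, const char *datatype)
{
  const char *sep = NULL, *p = revision;
  while ((p = strstr (p, "--")) != NULL)
    sep = p++;                  /* remember the last match seen */
  if (!sep)
    return -1;
  snprintf (out, n, "/archives/%s/%.*s/%s/%s", archive,
            (int) (sep - revision), revision, datatype, sep + 2);
  return 0;
}
```

With the proposed layout, listing every cached log for a version means
reading a single directory, /archives/$ARCHIVE/$VERSION/log.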
/archives/$ARCHIVE/$REVISION/full-tree.tar.gz: The full tree (same
contents as a cacherev or import) for the revision
/archives/$ARCHIVE/$REVISION/log: The patchlog for the revision
/archives/$ARCHIVE/$REVISION/delta.tar.gz: The changeset between the
revision and its direct ancestor
/archives/$ARCHIVE/$REVISION1/delta-from-REVISION2.tar.gz: (not
implemented) The changeset that transforms $REVISION2 into $REVISION1
/archives/$ARCHIVE/$REVISION/ancestor: (not implemented) The direct
ancestor of the revision
/archives/$ARCHIVE/$REVISION/type: (not implemented) The type of the
revision ("import", "simple" or "continuation")
/locations/$MANGLED_URL/NAME: (not implemented) The official name
associated with an archive location. Required for disconnected
operation or lazy initialization, but may occasionally change.
Cached Archives
===============
Cached archives are the first clients of the Arch Cache. They are a new
archive type that implements the archive.h interface. Any location
prefixed with cache: is created as a cached archive.
When a cached archive is initialized, it sets up a pointer to the real
archive by removing the "cached:" prefix from its location.
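A sketch of that prefix stripping. The text spells the prefix both
"cache:" and "cached:"; this sketch assumes "cached:", and real_location
is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Assumed spelling of the cached-archive prefix. */
#define CACHED_PREFIX "cached:"

/* Return the wrapped archive's location, or NULL if LOC is not a
   cached-archive location. The result points into LOC (no copy). */
static const char *real_location (const char *loc)
{
  size_t n = strlen (CACHED_PREFIX);
  return strncmp (loc, CACHED_PREFIX, n) == 0 ? loc + n : NULL;
}
```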
Most of its functions are exact wrappers around the real archive's
functions. The functions that use the cache are:
cache_archive_log
cache_get_patch
cache_get_cached
cache_get_import
These functions check whether the cache already has an answer. If not,
they retrieve the answer from the wrapped archive and put it in the
cache. Then they unconditionally get the answer from the cache.
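That check/fetch/put/get sequence can be sketched as below. To keep it
self-contained, a small in-memory table stands in for the Arch Cache and
a counting stub stands in for the wrapped archive; all names are
hypothetical, and only the control flow mirrors cache_archive_log.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ANSWERS 16
#define MAX_LEN 256

/* Stand-in for the Arch Cache: an in-memory query -> answer table. */
static char keys[MAX_ANSWERS][MAX_LEN], vals[MAX_ANSWERS][MAX_LEN];
static int n_answers;

static const char *get_answer (const char *q)
{
  int i;
  for (i = 0; i < n_answers; ++i)
    if (strcmp (keys[i], q) == 0)
      return vals[i];
  return NULL;
}

static int has_answer (const char *q)
{
  return get_answer (q) != NULL;
}

/* Silently drops the answer if the table is full. */
static void put_answer (const char *q, const char *a)
{
  if (n_answers < MAX_ANSWERS)
    {
      snprintf (keys[n_answers], MAX_LEN, "%s", q);
      snprintf (vals[n_answers], MAX_LEN, "%s", a);
      ++n_answers;
    }
}

/* Stand-in for the wrapped archive; counts how often it is hit. */
static int remote_fetches;
static void remote_archive_log (const char *rev, char *out, size_t n)
{
  ++remote_fetches;
  snprintf (out, n, "Summary: log for %s", rev);
}

/* Fill the cache on a miss, then always answer from the cache. */
static const char *cached_archive_log (const char *archive, const char *rev)
{
  char query[MAX_LEN], answer[MAX_LEN];
  snprintf (query, sizeof query, "/archives/%s/%s/log", archive, rev);
  if (!has_answer (query))
    {
      remote_archive_log (rev, answer, sizeof answer);
      put_answer (query, answer);
    }
  return get_answer (query);
}
```

Always reading back from the cache, even right after a miss, means there
is only one code path that hands answers to the caller.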
It would be nice to cache at commit time, but that would need to be done
at a higher level.
Comparison with local mirrors
==============================
- The user never downloads anything they don't need
- Commits are possible
- Never out of date
- Disconnected operation is not yet supported
Comparison with sparse, greedy revlibs
======================================
- Stores intermediate downloads, not just the target revisions
- Typically more space-efficient
- Not suitable as a reference tree
Comparison with proxy caches
============================
- No "stale data" problems
- Available for SFTP
- Permanent by default
- Adaptable for disconnected use
- Higher level: Because its datatypes are Arch datatypes (not just
files), it knows what kinds of data can be stored permanently, and keeps
them by default.
- Since data is grouped by archive, accessing the same data through
different transports will not cause it to be duplicated in two caches
- Visible to Arch; higher-level functions like build_revision can use it.
- Potentially visible to tla-wrapping utilities
Aaron
--
Aaron Bentley
Director of Technology
Panometrics, Inc.