[Gnu-arch-users] Arch Cache & cached archives

From:	Aaron Bentley
Subject:	[Gnu-arch-users] Arch Cache & cached archives
Date:	Tue, 14 Sep 2004 12:56:48 -0400
User-agent:	Mozilla Thunderbird 0.5 (X11/20040309)

This is a description of the preliminary Arch Cache implementation with notes on future directions. Please consider it a draft; comments and critiques are welcome.

Although I'm using the term "cache", it's really about memoizing data that can be time-consuming to produce, but will always be equivalent once produced.


Layer 1: The Arch Cache
=======================

The Arch Cache abstraction connects "query paths" with streams, or things that streams can represent. Query paths look suspiciously like POSIX pathnames. Convenience functions are available for use with strings.


There is a test for whether the cache is enabled:

extern int
arch_cache_active (void)

Attempts to use the cache when it is not enabled will cause panics.


extern int
arch_cache_put (t_uchar **tmp_name, t_uchar *rel_query_path)

To add something to the cache, we use arch_cache_put. This returns a file descriptor that we'll have to close, and a tmp_name that we'll ultimately need to free.


extern void
arch_cache_commit (t_uchar *tmp_name, t_uchar *rel_query_path)

After we have written the answer to the file descriptor, we must commit it before the answer can become active. This step is not required for the string wrappers.
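
As a rough illustration of the put / write / close / commit / free sequence, here is a self-contained sketch. The sketch_* functions are hypothetical stand-ins, not the real arch_cache_* code. They assume the cache is a flat directory, that "put" creates a temporary file in it, and that "commit" renames that file into place; none of this is stated above beyond the two signatures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for arch_cache_put: creates a temporary file in
   cache_dir, stores its malloc'd name in *tmp_name and returns a
   writable descriptor, or -1 on failure. */
static int
sketch_cache_put (char **tmp_name, const char *cache_dir)
{
  size_t len;
  char *name;
  int fd;

  assert (tmp_name != NULL && cache_dir != NULL);
  len = strlen (cache_dir) + sizeof ("/,,tmp.XXXXXX");
  name = malloc (len);
  if (!name)
    return -1;
  snprintf (name, len, "%s/,,tmp.XXXXXX", cache_dir);
  fd = mkstemp (name);
  if (fd < 0)
    {
      free (name);
      return -1;
    }
  *tmp_name = name;
  return fd;
}

/* Hypothetical stand-in for arch_cache_commit: the answer becomes
   visible as cache_dir/query_name in one step (assumption: rename). */
static int
sketch_cache_commit (const char *tmp_name, const char *cache_dir,
                     const char *query_name)
{
  size_t len = strlen (cache_dir) + 1 + strlen (query_name) + 1;
  char *final_name = malloc (len);
  int rc;

  if (!final_name)
    return -1;
  snprintf (final_name, len, "%s/%s", cache_dir, query_name);
  rc = rename (tmp_name, final_name);
  free (final_name);
  return rc;
}

/* The caller's side of the protocol: put, write, close, commit, free. */
static int
sketch_store_answer (const char *cache_dir, const char *query_name,
                     const char *answer)
{
  char *tmp_name = NULL;
  size_t len = strlen (answer);
  int fd = sketch_cache_put (&tmp_name, cache_dir);
  int ok;

  if (fd < 0)
    return -1;
  ok = write (fd, answer, len) == (ssize_t) len;
  ok = (close (fd) == 0) && ok;        /* we must close the descriptor */
  if (!ok || sketch_cache_commit (tmp_name, cache_dir, query_name) != 0)
    {
      unlink (tmp_name);               /* never publish a partial answer */
      ok = 0;
    }
  free (tmp_name);                     /* and ultimately free tmp_name */
  return ok ? 0 : -1;
}
```

Under these assumptions, a reader of the cache either sees a complete answer or no answer at all, which is the reason for keeping the commit step separate from the write.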


extern int
arch_cache_has_answer (t_uchar * rel_query_path)

We can use arch_cache_has_answer to find out whether the cache has an answer for a particular query.


extern int
arch_cache_get (t_uchar * rel_query_path)

We can use arch_cache_get to retrieve the answer for a query. It will panic if no answer is available for that query. This is where the smart caching functionality Tom has mentioned could hook in. One possible implementation would be to register a set of query handlers, and invoke them in sequence until one of them produces an answer.
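
The query-handler idea could look roughly like the sketch below. Everything in it is hypothetical (the handler type, the fixed-size table, the toy handlers and the pretend descriptor value 100); it only shows the "try each handler in order until one answers" control flow, not a proposed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: returns a descriptor for the answer, or -1
   if this handler cannot answer the query. */
typedef int (*sketch_query_handler) (const char *rel_query_path);

#define SKETCH_MAX_HANDLERS 8
static sketch_query_handler sketch_handlers[SKETCH_MAX_HANDLERS];
static size_t sketch_n_handlers = 0;

/* Register a handler; handlers are tried in registration order. */
static int
sketch_register_handler (sketch_query_handler handler)
{
  if (!handler || sketch_n_handlers == SKETCH_MAX_HANDLERS)
    return -1;
  sketch_handlers[sketch_n_handlers++] = handler;
  return 0;
}

/* Invoke the handlers in sequence until one produces an answer. */
static int
sketch_run_handlers (const char *rel_query_path)
{
  size_t i;

  assert (rel_query_path != NULL);
  for (i = 0; i < sketch_n_handlers; ++i)
    {
      int fd = sketch_handlers[i] (rel_query_path);
      if (fd >= 0)
        return fd;
    }
  return -1;                    /* nobody could answer */
}

/* Toy handler: only "answers" queries ending in "/log".  The value 100
   is a pretend descriptor, not a real open file. */
static int
sketch_log_handler (const char *rel_query_path)
{
  size_t n = strlen (rel_query_path);
  return (n >= 4 && strcmp (rel_query_path + n - 4, "/log") == 0) ? 100 : -1;
}

/* Toy handler that never has an answer. */
static int
sketch_never_handler (const char *rel_query_path)
{
  (void) rel_query_path;
  return -1;
}
```

With a chain like this, arch_cache_get would only need to panic once every registered handler had declined the query.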


extern int
arch_cache_maybe_get (t_uchar * rel_query_path)

If we don't want to panic when the answer isn't there, we can use arch_cache_maybe_get. This will return -1 if the answer is unavailable.


The convenience functions are:

arch_cache_put_str and arch_cache_get_str. Since these copy the string verbatim, I expect to add arch_cache_put_line and arch_cache_get_line, which will make sure the files have a terminating '\n', but the strings do not.
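
The newline handling planned for the _line variants might amount to something like this. The two helper names are hypothetical and the helpers work on plain C strings rather than on the cache; they only illustrate "file has the trailing '\n', string does not".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for a put_line: returns a malloc'd copy of str
   that ends in '\n', adding one only if it is missing. */
static char *
sketch_line_for_file (const char *str)
{
  size_t len;
  int need_newline;
  char *out;

  assert (str != NULL);
  len = strlen (str);
  need_newline = (len == 0 || str[len - 1] != '\n');
  out = malloc (len + (size_t) need_newline + 1);
  if (!out)
    return NULL;
  memcpy (out, str, len);
  if (need_newline)
    out[len++] = '\n';
  out[len] = '\0';
  return out;
}

/* Hypothetical helper for a get_line: returns a malloc'd copy of the
   file contents with one trailing '\n' removed, if present. */
static char *
sketch_line_from_file (const char *contents)
{
  size_t len;
  char *out;

  assert (contents != NULL);
  len = strlen (contents);
  if (len > 0 && contents[len - 1] == '\n')
    --len;
  out = malloc (len + 1);
  if (!out)
    return NULL;
  memcpy (out, contents, len);
  out[len] = '\0';
  return out;
}
```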


Things currently unhandled include:
1. statistics

2. listing answers available to certain kinds of queries, e.g. listing full-trees available in a version.

3. erasing answers

Implementation notes:

Yes, this is implemented using the local filesystem, exactly as you'd expect. (It could also be implemented on top of a pseudo-filesystem.) The $HOME/.arch-params/=arch-cache file contains the prefix of the cache hierarchy.


Layer 2: Namespace
==================
The current namespace looks like this:

/archives : data for archives, but not for specific locations

/archives/$ARCHIVE: data for a particular archive

/archives/$ARCHIVE/$REVISION: data for a particular archive revision. I'm not sure I want to keep it this way. For scalability reasons, this might be better: /archives/$ARCHIVE/$VERSION/$DATATYPE/$PATCHLEVEL. That way, listing data would scale with the number of patchlevels (which have cached queries) in the version, not version*patchlevels.

/archives/$ARCHIVE/$REVISION/full-tree.tar.gz: The full tree (same contents as a cacherev or import) for the revision

/archives/$ARCHIVE/$REVISION/log: The patchlog for the revision

/archives/$ARCHIVE/$REVISION/delta.tar.gz: The changeset between the revision and its direct ancestor

/archives/$ARCHIVE/$REVISION1/delta-from-REVISION2.tar.gz: (not implemented) The changeset that transforms $REVISION2 into $REVISION1

/archives/$ARCHIVE/$REVISION/ancestor: (not implemented) The direct ancestor of the revision

/archives/$ARCHIVE/$REVISION/type: (not implemented) The type of the revision ("import", "simple" or "continuation")

/locations/$MANGLED_URL/NAME: (not implemented) The official name associated with an archive location. Required for disconnected operation or lazy initialization, but may occasionally change.
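
To make the two per-revision layouts concrete, here is a small sketch that builds a query path each way. The function names and the example archive name are hypothetical, and the second function assumes the patchlevel is whatever follows the last "--" in the revision name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Current layout: /archives/$ARCHIVE/$REVISION/$DATATYPE.
   Returns 0 on success, -1 if buf is too small. */
static int
sketch_current_path (char *buf, size_t size, const char *archive,
                     const char *revision, const char *datatype)
{
  int n;

  assert (buf && archive && revision && datatype);
  n = snprintf (buf, size, "/archives/%s/%s/%s", archive, revision, datatype);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Possible alternative: /archives/$ARCHIVE/$VERSION/$DATATYPE/$PATCHLEVEL.
   Assumption: the revision is "$VERSION--$PATCHLEVEL", split at the
   last "--".  Returns -1 if there is no "--" or buf is too small. */
static int
sketch_proposed_path (char *buf, size_t size, const char *archive,
                      const char *revision, const char *datatype)
{
  const char *sep = NULL;
  const char *p = revision;
  int n;

  assert (buf && archive && revision && datatype);
  while ((p = strstr (p, "--")) != NULL)   /* find the last "--" */
    {
      sep = p;
      p += 2;
    }
  if (!sep)
    return -1;
  n = snprintf (buf, size, "/archives/%s/%.*s/%s/%s", archive,
                (int) (sep - revision), revision, datatype, sep + 2);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

In the second layout, all cached logs for a version sit in one directory, so listing them does not require visiting a directory per revision.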


Cached Archives
===============

Cached archives are the first clients of the Arch Cache. They are a new archive type that implements the archive.h interface. Any location prefixed with cache: is created as a cached archive.

When they are initialized, they initialize a pointer to the real archive by removing the "cached:" prefix.
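
Finding the wrapped location is a simple prefix strip. In the sketch below the function name is hypothetical, and the prefix is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if location starts with prefix, return a pointer
   to the rest of the string (the wrapped archive's location); otherwise
   return NULL.  No memory is allocated. */
static const char *
sketch_wrapped_location (const char *location, const char *prefix)
{
  size_t n;

  assert (location != NULL && prefix != NULL);
  n = strlen (prefix);
  return strncmp (location, prefix, n) == 0 ? location + n : NULL;
}
```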

Most implementations are exact wrappers. The functions that use the cache are:

cache_archive_log
cache_get_patch
cache_get_cached
cache_get_import

These functions check whether the cache has an answer already. If not, they retrieve the answer from the wrapped archive, and put it in the cache. Then, they unconditionally get from the cache.
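
The check / fill / get pattern those functions share can be sketched as follows. This is not the cached-archive code: the cache here is a hypothetical in-memory table of strings, and sketch_slow_fetch stands in for a request to the wrapped archive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the Arch Cache. */
#define SKETCH_SLOTS 16
struct sketch_entry { char *query; char *answer; };
static struct sketch_entry sketch_table[SKETCH_SLOTS];
static size_t sketch_used = 0;

static char *
sketch_copy (const char *s)
{
  char *out = malloc (strlen (s) + 1);
  if (out)
    strcpy (out, s);
  return out;
}

/* Returns the cached answer, or NULL if there is none. */
static const char *
sketch_cache_maybe_get (const char *query)
{
  size_t i;
  for (i = 0; i < sketch_used; ++i)
    if (strcmp (sketch_table[i].query, query) == 0)
      return sketch_table[i].answer;
  return NULL;
}

/* Stores a copy of the answer; returns 0 on success, -1 on failure. */
static int
sketch_cache_put (const char *query, const char *answer)
{
  char *q, *a;

  if (sketch_used == SKETCH_SLOTS)
    return -1;
  q = sketch_copy (query);
  a = sketch_copy (answer);
  if (!q || !a)
    {
      free (q);
      free (a);
      return -1;
    }
  sketch_table[sketch_used].query = q;
  sketch_table[sketch_used].answer = a;
  ++sketch_used;
  return 0;
}

typedef const char *(*sketch_fetch_fn) (const char *query);

/* If the cache has no answer, fetch it from the wrapped source and put
   it in the cache; then always answer from the cache. */
static const char *
sketch_cached_fetch (const char *query, sketch_fetch_fn fetch)
{
  assert (query != NULL && fetch != NULL);
  if (!sketch_cache_maybe_get (query))
    {
      const char *fresh = fetch (query);
      if (!fresh || sketch_cache_put (query, fresh) != 0)
        return NULL;
    }
  return sketch_cache_maybe_get (query);
}

/* Pretend wrapped archive; counts how often it is actually asked. */
static int sketch_fetch_calls = 0;
static const char *
sketch_slow_fetch (const char *query)
{
  (void) query;
  ++sketch_fetch_calls;
  return "Summary: example log\n";
}
```

Answering from the cache even on a miss keeps a single read path, so callers get data in the same form whether or not it was just fetched.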

It would be nice to cache at commit time, but that would need to be done at a higher level.


Comparison with local mirrors
==============================
- The user never downloads anything they don't need
- Commits are possible
- Never out of date
- Disconnected operation is not yet supported

Comparison with sparse, greedy revlibs
======================================
- Stores intermediate downloads, not just the target revisions
- Typically more space-efficient
- Not suitable as a reference tree

Comparison with proxy caches
============================
- No "stale data" problems
- Available for SFTP
- Permanent by default
- Adaptable for disconnected use

- Higher level: Because its datatypes are Arch datatypes (not just files), it knows what kinds of data can be stored permanently, and keeps them by default.
- Since data is grouped by archive, accessing the same data through different transports will not cause it to be duplicated in two caches

- Visible to Arch; higher-level functions like build_revision can use it.
- Potentially visible to tla-wrapping utilities

Aaron
--
Aaron Bentley
Director of Technology
Panometrics, Inc.
