Re: casefile.c revision
From: John Darrington
Subject: Re: casefile.c revision
Date: Sat, 4 Jun 2005 08:43:49 +0800
User-agent: Mutt/1.5.4i
On Thu, Jun 02, 2005 at 09:59:16PM -0700, Ben Pfaff wrote:
Ah. I see. To my mind, this is a little different from random
access. It's more like a "bookmark", in effect "Here, keep my
place for me while I peek ahead a little bit".
I can see a bunch of ways this might be implemented. First, we
could literally implement something like a bookmark. The
following two ways are equivalent, I think, but they are
conceptually a bit different:
A. Add a casereader_clone() that makes a new copy of a
casereader, so that we can read ahead in one
casereader and then make another pass across that same
data in another one.
B. Create a new "class" called a casefile_position (or
maybe call it a "casemark"?). Then add a
casereader_tell() to save a position and
casereader_rewind() to go back to a position. We'd
also want a casefile_position_destroy() (see below).
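To make option A concrete, here is a minimal sketch in C. It does not use the real casefile code: toy_casefile, toy_reader and all of the toy_* functions are hypothetical stand-ins invented for illustration, and the "casefile" is assumed to be a sorted, read-only array of doubles. It only shows the idea of cloning a reader, reading ahead in the clone, and leaving the original reader's place untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a casefile: a sorted, read-only array. */
struct toy_casefile {
    const double *values;
    size_t n;
};

/* Hypothetical stand-in for a casereader: a casefile plus a position. */
struct toy_reader {
    const struct toy_casefile *cf;
    size_t pos;
};

static struct toy_reader toy_reader_create(const struct toy_casefile *cf)
{
    struct toy_reader r = { cf, 0 };
    return r;
}

/* Option A: the clone starts at the same position as the original,
   and the two readers then advance independently. */
static struct toy_reader toy_reader_clone(const struct toy_reader *r)
{
    return *r;
}

/* Reads the next value into *out.  Returns 1 on success, 0 at end. */
static int toy_reader_read(struct toy_reader *r, double *out)
{
    assert(r != NULL && out != NULL);
    if (r->pos >= r->cf->n)
        return 0;
    *out = r->cf->values[r->pos++];
    return 1;
}

/* Counts how many consecutive cases, starting at r's position, share
   the same value.  Only the clone moves; r keeps its place. */
static size_t toy_count_ties(const struct toy_reader *r)
{
    struct toy_reader ahead = toy_reader_clone(r);
    double first, v;
    size_t n;

    if (!toy_reader_read(&ahead, &first))
        return 0;
    n = 1;
    while (toy_reader_read(&ahead, &v) && v == first)
        n++;
    return n;
}
```

Option B would look much the same from the caller's side in this toy model: saving a position amounts to remembering pos, and rewinding amounts to restoring it.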
Second, we could use an intermediate casefile:
C. Copy each case into the intermediate casefile as we
go. When we know what the rank of a set of cases will
be, copy the intermediate casefile into the final
casefile, changing the ranks as we go. Typically the
intermediate casefile would only have 1 case in it (as
for 1, 2, and 4 in your example above) but it could
end up with 100,000,000 cases (as for 3 in your
example above).
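Option C can be sketched in the same toy style. The names below (toy_case, toy_flush, toy_rank) are hypothetical, the scratch array plays the role of the intermediate casefile, and the rule that tied cases share the mean of the ranks they span is an assumption made for the example, not something taken from the rank code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case: a value plus the rank assigned on output. */
struct toy_case {
    double value;
    double rank;
};

/* Copies the pending cases to out[], giving each one the mean of the
   ranks first_rank .. first_rank + n_pending - 1 (assumed tie rule). */
static void toy_flush(const struct toy_case *pending, size_t n_pending,
                      size_t first_rank, struct toy_case *out)
{
    assert(n_pending > 0);
    double mean = (double) first_rank + (double) (n_pending - 1) / 2.0;
    for (size_t i = 0; i < n_pending; i++) {
        out[i] = pending[i];
        out[i].rank = mean;
    }
}

/* Ranks the n cases in in[], which must be sorted by value.
   scratch[] (the "intermediate casefile") and out[] (the "final
   casefile") must each have room for n cases. */
static void toy_rank(const struct toy_case *in, size_t n,
                     struct toy_case *scratch, struct toy_case *out)
{
    size_t n_pending = 0;   /* cases waiting in scratch[] */
    size_t n_out = 0;       /* cases already written to out[] */

    for (size_t i = 0; i < n; i++) {
        /* A new value means the rank of the pending group is now known. */
        if (n_pending > 0 && in[i].value != scratch[0].value) {
            toy_flush(scratch, n_pending, n_out + 1, out + n_out);
            n_out += n_pending;
            n_pending = 0;
        }
        scratch[n_pending++] = in[i];
    }
    if (n_pending > 0)
        toy_flush(scratch, n_pending, n_out + 1, out + n_out);
}
```

As in the description above, scratch[] usually holds a single case, but in the worst case (every value tied) it holds all of them.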
I like A and C the best. I don't think C would even need any
change to the casefile code, although some optimization might be
helpful.
From the programmer's perspective (i.e. the person writing rank-like
commands) I think that A is the best. The only thing is, one has to
bear in mind that casereader_clone()/casereader_destroy() will be
called at least once per case, so some optimisation would be in order
here too; perhaps a memory pool dedicated to each casefile would be
a good idea. Also, I suppose it'd not make sense to clone a
destructive reader?
So... why the heck do I think that this is better than just a
"random seek" operation? Well, mostly because I like to think of
casefiles as something that you usually stream from one place to
another. That is, I like to think of them as analogous to pipes,
not to files.
In particular, I want casefiles to be able to support
"destructive readers", which are readers where once you've read a
case, it's gone--deleted, destroyed, etc. A destructive reader
can be useful because, when the casefile data is in-memory, the
reader doesn't have to make a copy of data if it wants to modify
it; instead it can just modify the copy that the casefile had.
The copy-on-write case implementation in case.[ch] supports this
out-of-the-box.
Unfortunately, supporting random access means that this useful
optimization isn't possible, because at any time we could seek
backward to the first case (and expect to find it in its
unmodified form). On the other hand, if we have to indicate how
many records back we can go (as in A or B) we can still discard
anything that lies before any marker. (This is why we'd want a
casefile_position_destroy() in B: so that we know when markers
are gone and can thus discard anything before them.)
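The discard rule described here can be sketched as follows. This is only an illustration under stated assumptions: toy_stream, the fixed-size marker table, and the toy_mark_* and toy_discard_up_to functions are all hypothetical, and "discarding" is modelled as advancing a counter rather than freeing real cases.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MARKS 8

/* Hypothetical stream state: live markers and the discard horizon. */
struct toy_stream {
    size_t mark_pos[TOY_MAX_MARKS];   /* position held by each marker */
    int mark_live[TOY_MAX_MARKS];     /* nonzero if the slot is in use */
    size_t n_discarded;               /* cases [0, n_discarded) are gone */
};

/* Registers a marker at pos.  Returns its id, or -1 if the table is full. */
static int toy_mark_create(struct toy_stream *s, size_t pos)
{
    assert(pos >= s->n_discarded);    /* cannot mark a discarded case */
    for (int i = 0; i < TOY_MAX_MARKS; i++)
        if (!s->mark_live[i]) {
            s->mark_live[i] = 1;
            s->mark_pos[i] = pos;
            return i;
        }
    return -1;
}

/* Destroying a marker is what lets the stream discard past it. */
static void toy_mark_destroy(struct toy_stream *s, int id)
{
    assert(id >= 0 && id < TOY_MAX_MARKS && s->mark_live[id]);
    s->mark_live[id] = 0;
}

/* Discards as much as possible given that the reader has consumed
   read_pos cases: everything before the earliest live marker, or
   before read_pos if no marker is earlier.  Returns the new horizon. */
static size_t toy_discard_up_to(struct toy_stream *s, size_t read_pos)
{
    size_t limit = read_pos;
    for (int i = 0; i < TOY_MAX_MARKS; i++)
        if (s->mark_live[i] && s->mark_pos[i] < limit)
            limit = s->mark_pos[i];
    if (limit > s->n_discarded)
        s->n_discarded = limit;
    return s->n_discarded;
}
```

A marker that is never destroyed pins everything after it in place, which is the reason for wanting an explicit destroy call.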
Am I making sense?
Yes. I'm beginning to understand the casefile stuff better now.
Perhaps they should have been called "casestream".
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.