gnu-arch-users

Re: [Gnu-arch-users] Idea for a feature


From: James Blackwell
Subject: Re: [Gnu-arch-users] Idea for a feature
Date: Fri, 26 Dec 2003 22:51:46 -0500
User-agent: Mutt/1.5.4i

On Fri, Dec 26, 2003 at 06:53:51PM -0800, Tom Lord wrote:
> 
> 
>     > From: James Blackwell <address@hidden>
> 
>     > > My first concern would be race conditions. All writers would need to
>     > > update the same file. Secondly, if you have different security on
>     > > different branches, you'd need to allow everyone with any write access
>     > > to alter meta-data, which is IMO not desirable.
> 
>     > 
>     > I think it would probably be ok. No matter what happens, we always end up
>     > with the most recent date. If either finishes first, the latter one wins
>     > and gives us the date/time we want. If they both finish at the same time
>     > (SMP?), we end up writing the same date/time twice and we still end up
>     > with what we want. Way off in the distance, the only thing mirrors care
>     > about is "Is that date there newer than my date here?"
> 
> Your idea solves an old puzzle.   Some time ago, we pondered adding
> recursive .listing files to archives in order to speed up mirroring.
> That turns out to be an unworkable idea.   Your timestamp idea is much
> closer, though it needs some minor tweaks to actually work.
> 
> You are thinking, I think, about timestamps in inodes of files.
> 
> Those aren't accessible (for files in archives) to arch.
> 

Maybe that's what I meant, but I don't think so. I believe I was
proposing a =meta-info/last-changed file.

> It won't do to have clients write a timestamp in the contents of a
> file -- that would require clock synchronization among all clients.
>
> The best we could do is to have clients generate a probabilistically
> unique id to serve as the contents of the file.   That would, however,
> be sufficient for the task at hand.
> 

Grin. You would think so, but you don't need it. You only have to worry
about time synchronization when you're worried about data replication. In
slightly expanded terms, the only time we would have to worry about the
meaning of any given date is if we were to contact two separate archive
servers and then pick between them to decide which one was newer.

But we don't have to worry about that here, because the master/slave
relationship is well established. tla doesn't really think of the date as
anything other than a sequential number that the server hands out.

'Well then,' you're probably saying, 'what are the dates for?'.

Every time we generate a date from the clock, it is guaranteed to be
higher than the one before. Whether it's a human-parsable date string or
the number of seconds since 1970 doesn't matter. It always gets bigger.

tla grabs the =meta-info/last-mirrored file and compares its contents to
the value it saved on the previous run. If the value hasn't changed, then
we're already in sync. If it's bigger, we're out of date.
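A minimal sketch of that comparison in Python. It assumes the file holds a
single integer serial; read_serial() and mirror_status() are hypothetical
names, not anything tla provides:

```python
from pathlib import Path


def read_serial(path: Path) -> int:
    """Read the integer serial stored in a last-changed style file (assumed format)."""
    return int(path.read_text().strip())


def mirror_status(saved: int, current: int) -> str:
    """Compare the serial saved on the previous run with the master's current one."""
    if current == saved:
        return "in sync"
    if current > saved:
        return "out of date"
    # Serials are supposed to only grow, so a smaller value means something is wrong.
    raise ValueError(f"serial went backwards: {current} < {saved}")
```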


> This solution could be applied recursively leading to a fine-grained
> optimization.  It guarantees that after changing the archive, and
> precluding the event of premature client, server, or connection death
> (which can be recovered from by archive-fixup), the relevant
> "changestamp files" always changes contents -- which is enough for an
> archive-mirror process to prune its probes of an archive accurately.

I really like this idea. I could see this:


   address@hidden/=meta-info/last_changed = 12345
      mfla/=meta-info/last_changed = 12345
         mfla--devo/=meta-info/last_changed  = 12345
            mfla--devo--1.1/=meta-info/last_changed = 10234 
            mfla--devo--1.1/=meta-info/last_changed = 12345
         mfla--mainline/=meta-info/last_changed = 9066
            mfla--mainline--0.8/=meta-info/last_changed = 8655
            mfla--mainline--1.0/=meta-info/last_changed = 9066
 

When I check the archive, I have the sequence ID that I saved from the
last time I checked it. If it equals the top one, I can stop here. If
it's lower, I scan the next level of directories down, and so on.
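Here is a rough sketch of that pruned walk. Node is a hypothetical
in-memory stand-in for the archive directories, and directories with no
children are treated as versions that need syncing. Because the serials
only grow, the mirror needs just the one number it saved last time: any
directory whose last_changed is not above it can be skipped, along with
everything below it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an archive directory and its last_changed value."""
    last_changed: int
    children: dict[str, "Node"] = field(default_factory=dict)


def stale_versions(node: Node, path: str, saved: int) -> list[str]:
    """Return the version paths that changed after the serial `saved`."""
    if node.last_changed <= saved:
        return []  # nothing at or below this directory is newer: prune it
    if not node.children:
        return [path]  # a changed version: this is what we actually mirror
    stale: list[str] = []
    for name, child in node.children.items():
        stale.extend(stale_versions(child, f"{path}/{name}", saved))
    return stale
```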

This does cost us one thing, though. Whenever we update the archive, we
have to walk back up every level and set the new highest number. I.e., if
we change mfla--mainline--1.0's last_changed to 13456, I have to change
mfla--mainline, mfla, and address@hidden as well.
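The walk back up might look like this, with a plain dict (hypothetical)
mapping each "/"-separated directory path to its last_changed value:

```python
def record_change(last_changed: dict[str, int], version_path: str, serial: int) -> None:
    """Store serial on a version directory and on every ancestor up to the archive root.

    last_changed is a hypothetical in-memory map from directory paths to the
    value of their =meta-info/last_changed file.
    """
    parts = version_path.split("/")
    # depth == len(parts) is the version itself; depth == 1 is the archive root.
    for depth in range(len(parts), 0, -1):
        last_changed["/".join(parts[:depth])] = serial
```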

Now, one of the things I worried about initially was clock skew: what if
the user's clock has gone back in time since the last time he updated?
That is something we can deal with. If our new serial number is less than
the last_changed for address@hidden, then quit with an error saying that
clock skew was detected.
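A sketch of that skew check folded into the commit step, again using a
hypothetical path-to-serial dict (commit_serial() and ClockSkewError are
made-up names). Since every commit propagates up to the root, the root
holds the largest serial in the archive, so refusing anything below it
also guarantees we never overwrite a higher number with a lower one
further down. This sketch does nothing about concurrent writers.

```python
class ClockSkewError(Exception):
    """Raised when a new serial is older than the newest one already in the archive."""


def commit_serial(last_changed: dict[str, int], version_path: str, serial: int) -> None:
    """Refuse serials below the archive root's value, then propagate up to the root."""
    parts = version_path.split("/")
    root = parts[0]
    newest = last_changed.get(root, 0)
    if serial < newest:
        raise ClockSkewError(f"clock skew detected: {serial} < {newest} at {root}")
    # Equal is allowed; only going backwards counts as skew.
    for depth in range(len(parts), 0, -1):
        last_changed["/".join(parts[:depth])] = serial
```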

This is where the dates come into play. We can ignore the whole race
condition by using a coarser granularity than any potential race. I.e.,
out of "2004-12-25-04:00:45", we only consider the "2004-12-25-04:00" part.

Granted, we might occasionally end up synchronizing a branch that we
don't need to, but only in the worst case.
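Roughly, in Python (the exact format string is my assumption, chosen to
match the example above). A fixed-width, zero-padded format also means
plain string comparison orders the stamps chronologically:

```python
from datetime import datetime, timezone

# Minute granularity: seconds are dropped so writers racing within the
# same minute produce the same stamp.
STAMP_FORMAT = "%Y-%m-%d-%H:%M"


def coarse_stamp(when: datetime) -> str:
    """Format a time at minute granularity, e.g. 2004-12-25-04:00."""
    return when.strftime(STAMP_FORMAT)


def current_stamp() -> str:
    """Coarse stamp for the current UTC time."""
    return coarse_stamp(datetime.now(timezone.utc))
```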


> 
> 
>     > Your second objection is certainly more serious. Though I can imagine that
>     > situation existing, I don't think it would be a common one. I would
>     > imagine that in most cases, rather than working in an archive in which
>     > permissions are restricted (except for a certain branch), people would
>     > create their own archives.
> 
> Let's suppose that any directory in an archive (above the revision
> directory level) can (optionally) have a ".change-id" file.   Let's
> call archives that use this feature "change-stamped archives".

One thing about this: we always have to walk all the way back up to the
root to make sure we don't miss any .change-id files. Using your
definition, if a branch didn't have a .change-id, we might forget to check
whether the category or archive had one.

(scanning down...) Lol. Your last page sounds like my first page. I must
have understood your post better than I thought before I went to the
store.

Yeah. We're on the same page. 

One thing we do have to watch out for that you didn't mention: if we're
talking about sequential IDs of *some* sort, we have to make certain that
we never, ever replace a higher number with a lower one. I think that if
we ever did, the changes in that part of the tree could be hidden from
mirrors until the next time it's updated and its serial changes again. It
would be unpredictable code that would usually work.


> Normally, whenever a client adds a revision, it should update the
> change-stamp of each parent directory of the revision directory, up to
> the archive root.   In particular, it should store a newly generated
> p.u.i.d. there.
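A sketch of the client side, with a dict standing in for the archive:
keys are the directory paths that have a .change-id file, values are its
contents. uuid4 serves as the probabilistically unique id here; that
choice and the stamp_parents() name are mine. The loop checks every
ancestor rather than stopping at the first directory without a file,
which is the problem I raised above.

```python
import uuid


def stamp_parents(change_ids: dict[str, str], revision_dir: str) -> str:
    """Write a fresh unique id into every parent directory that has a .change-id."""
    new_id = uuid.uuid4().hex
    parts = revision_dir.split("/")
    # Start at the immediate parent (version) and go up to the archive root.
    for depth in range(len(parts) - 1, 0, -1):
        parent = "/".join(parts[:depth])
        if parent in change_ids:  # skip directories without a file, but keep climbing
            change_ids[parent] = new_id
    return new_id
```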
> 
> An archive-mirror call taking a change-stamped archive as its source
> would then be required to read the top-level change-stamp file.  If
> that has changed, it must read the change-stamps for each category,
> and so on recursively.
> 
> Now, let's make a special case:  if a change-stamp file for a given
> directory contains just the string "not shared", then an archive
> mirror process must recurse below that directory unconditionally, and
> clients must not attempt to overwrite that file.   (That solves Rob's
> second concern.)
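And the mirror side, including the "not shared" rule, as a rough sketch.
Dir is a hypothetical stand-in for an archive directory. Treating a
directory with no .change-id file at all the same as "not shared" (always
recurse) is my assumption, since there is nothing to compare against.
seen holds the change-ids the mirror recorded after its previous
successful run.

```python
from dataclasses import dataclass, field
from typing import Optional

NOT_SHARED = "not shared"


@dataclass
class Dir:
    change_id: Optional[str]  # contents of .change-id, or None if there is no such file
    children: dict[str, "Dir"] = field(default_factory=dict)


def is_shared_stamp(change_id: Optional[str]) -> bool:
    """Clients may only overwrite, and mirrors may only prune on, these stamps."""
    return change_id is not None and change_id != NOT_SHARED


def dirs_to_probe(d: Dir, path: str, seen: dict[str, str]) -> list[str]:
    """Return the leaf directories an archive-mirror still has to probe."""
    # Only a real, shared stamp that matches what we saw last time lets us skip a subtree.
    if is_shared_stamp(d.change_id) and seen.get(path) == d.change_id:
        return []
    if not d.children:
        return [path]
    probes: list[str] = []
    for name, child in sorted(d.children.items()):
        probes.extend(dirs_to_probe(child, f"{path}/{name}", seen))
    return probes
```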
> 
> Very nice.
> 
> -t
> 

-- 
James Blackwell        Using I.T. to bring more                570-407-0488
Owner, Inframix        business to your business        http://inframix.com

   GnuPG (ID 06357400) AAE4 8C76 58DA 5902 761D  247A 8A55 DA73 0635 7400



