
Re: [Monotone-devel] Re: results of mercurial user survey


From: Markus Schiltknecht
Subject: Re: [Monotone-devel] Re: results of mercurial user survey
Date: Fri, 28 Apr 2006 15:50:05 +0200

Hi,

(sorry, this email got a little longish... and has brainstorming
characteristics...)

To me (and a lot of others) it seems that monotone is error checking at
the wrong place. AFAICT netsync already guarantees to deliver the data
exactly as it was sent by the other peer, thus consistency checking just
after syncing seems unnecessary. Of course consistency checking in
general is absolutely necessary, just not exactly after syncing and even
less before allowing the user to use (or at least read) the data.

If a peer S is serving a repository and a peer C is pulling it, monotone
currently puts the burden of consistency checking on the shoulders of C.
IMHO this is generally wrong. I think every peer should check itself for
correct operation. And in case of filesystem corruption or the like on S,
the admin of S needs to know, not the user on C. Probably S should
even stop serving its repository, as it is corrupt.

On Fri, 2006-04-28 at 12:41 +0200, Richard Levitte - VMS Whacker wrote:
> The thing with a distributed system is exactly that, that it's
> distributed.  If an error occurs somewhere and remains undetected, it
> will be transmitted to everyone involved.  In monotone's case, it would
> be everyone that does a pull.  At that point, it will be very
> difficult to change it, since it might be pushed around, so any
> correction that you do locally might be "corrected" back to the
> erroneous state.

That's a good argument. And it's one way of guaranteeing that corruptions
can be detected and eliminated. But surely not the only way.

Let's see. Failures we want to prevent can occur at any time, not just
during syncing. A regular db check on every peer would ensure that
corruptions can be detected, but it would not prevent distributing the
erroneous revision.

Why not simply keep a black-list of suspect revisions? Every node can
recheck the suspected revisions and decide for itself whether each one
really is erroneous.
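
To make the recheck idea a bit more concrete, here is a rough Python
sketch. It is not monotone code: `recheck_suspects`, `local_revs` and
`is_consistent` are names I made up, and the actual per-revision check is
simply passed in as a callback. Each node takes the list it received,
looks only at the revisions it actually holds, and decides for itself:

```python
from typing import Callable, Iterable, Set, Tuple


def recheck_suspects(
    received: Iterable[str],
    local_revs: Set[str],
    is_consistent: Callable[[str], bool],
) -> Tuple[Set[str], Set[str]]:
    """Split received suspect ids into (confirmed_bad, cleared).

    Hypothetical helper: is_consistent stands in for whatever local
    integrity check a node would run on a single revision.
    """
    confirmed_bad: Set[str] = set()
    cleared: Set[str] = set()
    for rev in received:
        if rev not in local_revs:
            # We never received this revision, so there is nothing to judge.
            continue
        if is_consistent(rev):
            cleared.add(rev)
        else:
            confirmed_bad.add(rev)
    return confirmed_bad, cleared
```

A node would keep only the confirmed set on its own black-list; a
revision that checks out fine locally does not need to stay suspected
there.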

This still leaves us with a problem: there is some time between the
failure and the detection (including spreading of the suspection list).
During this period, a user might commit revisions which depend on
erroneous ones. Those would be kept as they appear to be correct, but
would suddenly be disconnected because their ancestor(s) have been
deleted.

One might simply delete revisions with erroneous ancestors, but that
leads to data loss. And depending on the time frame between failure and
detection, this could be quite a lot of data. But even more important:
the user of the node with a corrupt filesystem will hopefully understand
possible data losses (and correctly blame the filesystem, not monotone).
But due to the distributed nature of monotone, other users, with
perfectly consistent (file)systems, could experience data loss, too.
Those users surely won't understand, and they will blame monotone.


Thus, I propose an option to allow deferring the consistency check. On
commit the suspection list has to be consulted and all ancestors which
are still suspected need to be checked before the commit.
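
Roughly, the commit-time check could look like the following Python
sketch. All names here (`ancestors`, `check_before_commit`, the `parents`
map, the `is_consistent` callback) are hypothetical and only illustrate
the idea; this is not how monotone stores or walks its revision graph.

```python
from typing import Callable, Dict, List, Set


def ancestors(rev: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of rev (rev itself excluded)."""
    seen: Set[str] = set()
    stack = list(parents.get(rev, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def check_before_commit(
    base: str,
    parents: Dict[str, List[str]],
    suspected: Set[str],
    is_consistent: Callable[[str], bool],
) -> List[str]:
    """Check base and its ancestors that are still suspected.

    Revisions that pass are removed from the suspected set in place.
    Returns the sorted list of revisions that failed; the commit should
    only go ahead if that list is empty.
    """
    to_check = (ancestors(base, parents) | {base}) & suspected
    failed: List[str] = []
    for rev in sorted(to_check):
        if is_consistent(rev):
            suspected.discard(rev)
        else:
            failed.append(rev)
    return failed
```

Note that only the ancestry of the revision being committed on is
touched, so suspected revisions on unrelated branches stay deferred.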

This would allow a user to pull a repository and have a look at it. And
since a lot of users only want an up to date read-only copy (i.e. they
don't commit anything) that's a huge gain, IMHO.


A pull (or a sync) should inform the user about how many revisions are
currently suspected. The user could then run 'mtn db check_suspected' or
so whenever he wants. He could already check out revisions and work with
them. Of course we should still force a dependency check before commit.
I suspect a user better understands the need for a check at that point.

What do you risk with that approach? A user might pull an erroneous
revision and check that out. If he is working on it, he will only notice
the error when he tries to commit (if he does commit at all).

This leads to a potential data loss for failure-unaffected users who
make changes to erroneous revisions. But IMHO that risk is negligible:
If he has been working with that erroneous revision, the errors must not
have been obvious, otherwise he'd have noticed himself. Most probably he
can somehow recover and commit upon another, valid revision.

However, this does not solve the problem of C having to check the contents
of S (and inform S upon failure). I'd probably vote for a regular
background db consistency check on servers - to alarm admins who don't read
their logs we could simply stop 'mtn serve' upon failure.
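
As a very rough illustration of such a server-side loop (Python, with
made-up names: `check_db` and `stop_serving` are callbacks standing in for
a full db consistency check and for shutting down the server process, and
the one-hour interval is just a placeholder):

```python
import time
from typing import Callable


def run_periodic_check(
    check_db: Callable[[], bool],
    stop_serving: Callable[[], None],
    rounds: int,
    interval_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run check_db up to `rounds` times, pausing interval_s in between.

    On the first failed check, stop_serving is called and False is
    returned. Returns True if every check passed.
    """
    for _ in range(rounds):
        if not check_db():
            # A corrupt server should not keep handing out its data.
            stop_serving()
            return False
        sleep(interval_s)
    return True
```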


What do you think? Is it feasible to implement such a suspection list?

Regards

Markus





