[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gnu-arch-users] semantics of restricted mkpatch
From: |
Tom Lord |
Subject: |
[Gnu-arch-users] semantics of restricted mkpatch |
Date: |
Tue, 7 Oct 2003 09:34:50 -0700 (PDT) |
Short answr:
Generalizing limits is hard.
Minimally, some math needs to be done before I can merge your
changes. Ideally, some more math needs to be done to define more
precisely what limits should mean and to prove that this algorithm
actually does that.
Critically: any algorithm for LIMITS must be accompanied by a
mathematical proof that it will always either:
1) refuse to produce a changeset
2) produce a changeset which is guaranteed to
apply exactly to the ORIG tree
That property is critical because, without it, archive integrity can
be lost (e.g. you can `commit' a revision that you cannot `get').
And ideally: it would be nice to begin to have a mathematical model of
how we think limits should work -- a model of what the user
documentation would say; a model that describes how users are supposed
to think about LIMITS. Then, it would be nice to have a proof that
any proposed algorithm actually implements that model.
I'm concerned about this particular code because I don't have any a
priori reason to believe it satisfies the critical property above,
and also because, while I believe it can handle simple cases
reasonably, I suspect it does something surprising and unwelcome in
some tricky cases.
Given that the ideal sets the bar very high, a practical approach
might be to continue down the path established with the existing file
limits: it's better to have a restricted-capability LIMITS mechanism
that is known to be good than to have a general capability that is
poorly understood and possibly incorrect.
Long answer:
Selected {commit/undo/etc} all rely on a change to mkpatch. The
change adds a new parameter, the LIMITS specification. It compares
the ORIG and MOD trees but produces a changeset that (in some useful
sense) describes only the changes named by LIMITS.
Imprecisely, we expect that LIMITS is a list of file and directory
names. Changes to those files or within those directories are
included in the changeset; changes elsewhere in the tree are not.
The hard part to all this is saying precisely what LIMITS is and what
it is supposed to mean. The other hard part is ensuring that what we
think it is supposed to mean is actually a logically consistent idea.
Following are some examples of hard cases: cases where (at least my)
intuitive expectations of what LIMITS should do gets a bit tricky.
In these examples, I'm explaining the behavior of LIMITS in terms of
three trees: ORIG, MOD, and LOGICAL-MOD. The idea is that:
mkpatch ORIG MOD LIMITS
should produce exactly the same changeset as:
mkpatch ORIG LOGICAL-MOD
One way to view the problem of limits is to see it as a need to
define `make_logical_mod' where:
LOGICAL-MOD == make_logical_mod (ORIG,MOD,LIMITS)
with the property that:
LOGICAL-MOD == mkpatch(ORIG,MOD,LIMITS)[ORIG]
* case 1 -- ignoring renames
Let's suppose that we have
ORIG TREE: tag modified
./dir-A T1
./dir-A/file-A T2
./dir-A/file-b T3
MOD TREE:
./dir-B T1
./dir-B/file-A T2 yes
./dir-B/file-b T3 yes
and ask to compute a changeset with "LIMITS == { dir-B/file-A }"
The directory with tag T1 is _outside_ the scope of LIMITS and so
we'd expect the rename of that directory to be ignored. The file
with tag T3 is outside of the limits -- changes to that file should
be ignored.
The changeset generated should be the same as if we had replaced
MOD with:
LOGICAL-MOD TREE
./dir-A T1
./dir-A/file-A T2 yes
./dir-A/file-b T3
and computed a changeset without any LIMITS.
If I follow your code correctly, it will correctly record
changes to the T2 file -- but it will also include the
rename of directory T3. In general that seems like a bad
idea. Consider this variation:
ORIG TREE: tag modified
./dir-A T1
./dir-A/file-A T2
./dir-A/file-b T3
./dir-B T4
./dir-B/file-c T5
MOD TREE:
./dir-B T1
./dir-B/file-A T2 yes
./dir-B/file-b T3 yes
./dir-C T6
./dir-C/file-c T5
If your algorithm produces a changeset that records
the rename of directory T1, then it must also record
the addition of T6 and rename of T5 -- this seems like
a very strange result for "LIMITS = { dir-A/file-A }".
* case 2 -- detecting deletions
Suppose we had:
ORIG TREE: tag modified
./dir-A T1
./dir-A/dir-B T2
./dir-A/dir-B/file-A T3
MOD TREE:
./dir-C T1
./dir-C/dir-B T2
and we compute a changeset with "LIMITS == { dir-C/dir-B }"
Once again, the rename of the T1 directory should be ignored:
LOGICAL-MOD TREE:
./dir-A T1
./dir-A/dir-B T2
so the changeset records the deletion of file T3.
What's slightly tricky about that is that the limits spec
refers to a path that exists only in MOD, but a directory
that exists in both. There is no path for T3 that matches
the limit spec -- the code needs to do some work to figure
out which deletion is described here.
* case 3 -- impossible limits?
A slight variation on the previous example is:
ORIG TREE: tag modified
./dir-A T1
./dir-A/dir-B T2
./dir-A/dir-B/file-A T3
MOD TREE:
./dir-C T1
./dir-C/dir-B T2
./dir-D T4
./dir-D/file-A T3 yes
with "LIMITS == { dir-C/dir-B }"
I'm not sure that there exists any reasonable LOGICAL-MOD TREE for
this case. The stated limits enclose "half a rename" -- file-A is
gone from dir-B (inside the limits), but not deleted. It's new
location is outside the limits. Indeed, it's new location is
both outside the limits and missing from the ORIG tree.
If you don't consider that example convincing (perhaps you want to
say that in this case the resulting changeset should be empty), how
about this one:
ORIG TREE: tag modified
./dir-A T1
./dir-A/dir-B T2
./dir-A/dir-B/file-A T3
MOD TREE:
./dir-C T1
./dir-D T4
./dir-D/file-A T3 yes
with "LIMITS == { dir-C }".
Presumably the changeset should show the deletion of dir-B but not
the addition of dir-D -- yet without the addition of dir-D, what
is to become of file-A in a tree to which this change is applied?
(Note: I don't think it's impossible to come with some arbitrary
rule for what should happen in these "impossible" cases -- I do
think it highly unlikely that there is a rule that won't confuse
users. I suspect that the best answer is for mkpatch to detect
these "impossible" cases and give an error message that tries to
explain why the LIMITS are probably not as intended.)
* tricky to express limits
Consider trying to limit mkpatch to a file deletion:
ORIG TREE: tag modified
./dir-A T1
./dir-A/file-A T2
MOD TREE:
./dir-B T1
with "LIMIT == { dir-B/file-A }"
Conceptually, clearly, this should be the same as:
LOGICAL-MOD TREE:
./dir-A T1
but in practice that's quite tricky. How is mkpatch
supposed to figure out that the file named in the limits
("dir-B/file-A"), which doesn't exist in the MOD tree,
is the same file that had tag T2 in the ORIG tree?
So with that background:
> From: Denys Duchier <address@hidden>
> Tom Lord <address@hidden> writes:
> > It would be helpful to me if you would post in this thread a detailed
> > description of the semantics of limits, in this context.
> Of course (I have tried to document this in the code - if the
> documentation is insufficient, then it should be fixed - naturally
> comments are rarely sufficient to convey the big picture).
> OK, I'll give it a try (I am bit tired - just met a conf deadline):
> [description omitted]
Ok, I read that and skimmed the code.
I noticed two things although perhaps my reading is too hasty:
1) It appears that you are pruning both the ORIG and the MOD trees
according to how the limits compare to the path names to files in
either tree. I think that the limits specs should refer to paths
in the mod tree only (with a partial exception for limits naming
deleted files).
2) I don't see any code that handles the case of "impossible limits"
described above.
What I _suspect_ about this code is that in simple cases, it will
produce an unsurprising changeset, but in tricky cases, it will behave
pretty randomly.
What I _fear_ about this code (simply because I haven't seen any math
that would prove otherwise) is that it is capable of producing a
changeset which would not apply cleanly to ORIG (which would be
disasterous).
-t
- [Gnu-arch-users] fixing selected commit [commit, undo, --except], Denys Duchier, 2003/10/05
- Re: [Gnu-arch-users] fixing selected commit [commit, undo, --except], Robert Anderson, 2003/10/05
- Re: [Gnu-arch-users] fixing selected commit [commit, undo, --except], Denys Duchier, 2003/10/05
- Re: [Gnu-arch-users] fixing selected commit [commit, undo, --except], Miles Bader, 2003/10/05
- Re: [Gnu-arch-users] fixing selected commit [commit, undo, --except], duchier, 2003/10/06
- [Gnu-arch-users] Re: fixing selected commit [commit, undo, --except], Miles Bader, 2003/10/06
Re: [Gnu-arch-users] fixing selected commit [commit, undo, --except], Tom Lord, 2003/10/06
- Re: [Gnu-arch-users] fixing selected commit [commit, undo, --except], Denys Duchier, 2003/10/06
- [Gnu-arch-users] semantics of restricted mkpatch,
Tom Lord <=
- [Gnu-arch-users] Re: semantics of restricted mkpatch, Denys Duchier, 2003/10/07
- [Gnu-arch-users] Re: semantics of restricted mkpatch, Tom Lord, 2003/10/07
- [Gnu-arch-users] Re: semantics of restricted mkpatch, Tom Lord, 2003/10/07
- [Gnu-arch-users] Re: semantics of restricted mkpatch, Denys Duchier, 2003/10/07
- [Gnu-arch-users] Re: semantics of restricted mkpatch, Tom Lord, 2003/10/07
- Re: [Gnu-arch-users] Re: semantics of restricted mkpatch, Tom Lord, 2003/10/07
- Re: [Gnu-arch-users] Re: semantics of restricted mkpatch, Tom Lord, 2003/10/07