gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gnu-arch-users] semantics of restricted mkpatch


From: Tom Lord
Subject: [Gnu-arch-users] semantics of restricted mkpatch
Date: Tue, 7 Oct 2003 09:34:50 -0700 (PDT)


Short answr:

  Generalizing limits is hard.

  Minimally, some math needs to be done before I can merge your
  changes.  Ideally, some more math needs to be done to define more
  precisely what limits should mean and to prove that this algorithm
  actually does that.
  
  Critically: any algorithm for LIMITS must be accompanied by a
  mathematical proof that it will always either:
  
  1) refuse to produce a changeset
  
  2) produce a changeset which is guaranteed to
     apply exactly to the ORIG tree
  
  That property is critical because, without it, archive integrity can
  be lost (e.g. you can `commit' a revision that you cannot `get').
  
  And ideally: it would be nice to begin to have a mathematical model of
  how we think limits should work -- a model of what the user
  documentation would say; a model that describes how users are supposed
  to think about LIMITS.  Then, it would be nice to have a proof that
  any proposed algorithm actually implements that model.
  
  I'm concerned about this particular code because I don't have any a
  priori reason to believe it satisfies the critical property above,
  and also because, while I believe it can handle simple cases
  reasonably, I suspect it does something surprising and unwelcome in
  some tricky cases.

  Given that the ideal sets the bar very high, a practical approach
  might be to continue down the path established with the existing file
  limits:   it's better to have a restricted-capability LIMITS mechanism
  that is known to be good than to have a general capability that is
  poorly understood and possibly incorrect.


Long answer:


Selected {commit/undo/etc} all rely on a change to mkpatch.   The
change adds a new parameter, the LIMITS specification.   It compares
the ORIG and MOD trees but produces a changeset that (in some useful
sense) describes only the changes named by LIMITS.

Imprecisely, we expect that LIMITS is a list of file and directory
names.   Changes to those files or within those directories are
included in the changeset;   changes elsewhere in the tree are not.  

The hard part to all this is saying precisely what LIMITS is and what
it is supposed to mean.   The other hard part is ensuring that what we
think it is supposed to mean is actually a logically consistent idea.

Following are some examples of hard cases: cases where (at least my)
intuitive expectations of what LIMITS should do gets a bit tricky.

In these examples, I'm explaining the behavior of LIMITS in terms of
three trees: ORIG, MOD, and LOGICAL-MOD.   The idea is that:

        mkpatch ORIG MOD LIMITS

should produce exactly the same changeset as:

        mkpatch ORIG LOGICAL-MOD

One way to view the problem of limits is to see it as a need to
define `make_logical_mod' where:

        LOGICAL-MOD == make_logical_mod (ORIG,MOD,LIMITS)

with the property that:

        LOGICAL-MOD == mkpatch(ORIG,MOD,LIMITS)[ORIG]


* case 1 -- ignoring renames

  Let's suppose that we have 

        ORIG TREE:                      tag     modified

                ./dir-A                 T1      
                  ./dir-A/file-A        T2
                  ./dir-A/file-b        T3

        MOD TREE:

                ./dir-B                 T1
                  ./dir-B/file-A        T2      yes
                  ./dir-B/file-b        T3      yes


  and ask to compute a changeset with "LIMITS == { dir-B/file-A }"

  The directory with tag T1 is _outside_ the scope of LIMITS and so
  we'd expect the rename of that directory to be ignored.   The file
  with tag T3 is outside of the limits -- changes to that file should
  be ignored.

  The changeset generated should be the same as if we had replaced 
  MOD with:

        LOGICAL-MOD TREE

                ./dir-A                 T1
                  ./dir-A/file-A        T2      yes
                  ./dir-A/file-b        T3

  and computed a changeset without any LIMITS.

  If I follow your code correctly, it will correctly record 
  changes to the T2 file -- but it will also include the 
  rename of directory T3.   In general that seems like a bad 
  idea.  Consider this variation:

        ORIG TREE:                      tag     modified

                ./dir-A                 T1      
                  ./dir-A/file-A        T2
                  ./dir-A/file-b        T3
                ./dir-B                 T4
                  ./dir-B/file-c        T5

        MOD TREE:

                ./dir-B                 T1
                  ./dir-B/file-A        T2      yes
                  ./dir-B/file-b        T3      yes
                ./dir-C                 T6
                  ./dir-C/file-c        T5

   If your algorithm produces a changeset that records
   the rename of directory T1, then it must also record
   the addition of T6 and rename of T5 -- this seems like
   a very strange result for "LIMITS = { dir-A/file-A }".


* case 2 -- detecting deletions

  Suppose we had:


        ORIG TREE:                      tag     modified

        ./dir-A                         T1      
          ./dir-A/dir-B                 T2
            ./dir-A/dir-B/file-A        T3              

        MOD TREE:

        ./dir-C                         T1      
          ./dir-C/dir-B                 T2


   and we compute a changeset with "LIMITS == { dir-C/dir-B }"

   Once again, the rename of the T1 directory should be ignored:


        LOGICAL-MOD TREE:

        ./dir-A                         T1      
          ./dir-A/dir-B                 T2

   so the changeset records the deletion of file T3.

   What's slightly tricky about that is that the limits spec 
   refers to a path that exists only in MOD, but a directory
   that exists in both.   There is no path for T3 that matches
   the limit spec -- the code needs to do some work to figure 
   out which deletion is described here.



* case 3 -- impossible limits?

  A slight variation on the previous example is:

        ORIG TREE:                      tag     modified

        ./dir-A                         T1      
          ./dir-A/dir-B                 T2
            ./dir-A/dir-B/file-A        T3              

        MOD TREE:

        ./dir-C                         T1      
          ./dir-C/dir-B                 T2
        ./dir-D                         T4
          ./dir-D/file-A                T3      yes

   with "LIMITS == { dir-C/dir-B }"

   I'm not sure that there exists any reasonable LOGICAL-MOD TREE for
   this case.  The stated limits enclose "half a rename" -- file-A is
   gone from dir-B (inside the limits), but not deleted.  It's new
   location is outside the limits.   Indeed, it's new location is
   both outside the limits and missing from the ORIG tree.

   If you don't consider that example convincing (perhaps you want to
   say that in this case the resulting changeset should be empty), how
   about this one:

        ORIG TREE:                      tag     modified

        ./dir-A                         T1      
          ./dir-A/dir-B                 T2
            ./dir-A/dir-B/file-A        T3              

        MOD TREE:

        ./dir-C                         T1      
        ./dir-D                         T4
          ./dir-D/file-A                T3      yes

   with "LIMITS == { dir-C }".

   Presumably the changeset should show the deletion of dir-B but not
   the addition of dir-D -- yet without the addition of dir-D, what
   is to become of file-A in a tree to which this change is applied?

   (Note: I don't think it's impossible to come with some arbitrary 
   rule for what should happen in these "impossible" cases -- I do
   think it highly unlikely that there is a rule that won't confuse
   users.   I suspect that the best answer is for mkpatch to detect
   these "impossible" cases and give an error message that tries to
   explain why the LIMITS are probably not as intended.)


* tricky to express limits 

   Consider trying to limit mkpatch to a file deletion:

        ORIG TREE:                      tag     modified

        ./dir-A                         T1      
          ./dir-A/file-A                T2

        MOD TREE:

        ./dir-B                         T1      

   with "LIMIT == { dir-B/file-A }"

   Conceptually, clearly, this should be the same as:

        LOGICAL-MOD TREE:

        ./dir-A                         T1

   but in practice that's quite tricky.   How is mkpatch 
   supposed to figure out that the file named in the limits 
   ("dir-B/file-A"), which doesn't exist in the MOD tree, 
   is the same file that had tag T2 in the ORIG tree?


So with that background:

    > From: Denys Duchier <address@hidden>

    > Tom Lord <address@hidden> writes:

    > > It would be helpful to me if you would post in this thread a detailed
    > > description of the semantics of limits, in this context.

    > Of course (I have tried to document this in the code - if the
    > documentation is insufficient, then it should be fixed - naturally
    > comments are rarely sufficient to convey the big picture).

    > OK, I'll give it a try (I am bit tired - just met a conf deadline):
    > [description omitted]

Ok, I read that and skimmed the code.  

I noticed two things although perhaps my reading is too hasty:

1) It appears that you are pruning both the ORIG and the MOD trees
   according to how the limits compare to the path names to files in
   either tree.   I think that the limits specs should refer to paths 
   in the mod tree only (with a partial exception for limits naming 
   deleted files).

2) I don't see any code that handles the case of "impossible limits"
   described above.

What I _suspect_ about this code is that in simple cases, it will
produce an unsurprising changeset, but in tricky cases, it will behave
pretty randomly.

What I _fear_ about this code (simply because I haven't seen any math
that would prove otherwise) is that it is capable of producing a
changeset which would not apply cleanly to ORIG (which would be
disasterous).


-t





reply via email to

[Prev in Thread] Current Thread [Next in Thread]