gnu-misc-discuss
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [arch-announce] what proofs look like (relates to why i find you guy


From: Tom Lord
Subject: Re: [arch-announce] what proofs look like (relates to why i find you guys ridiculous to try to work with in spite of the fact we both work on "open source")
Date: Fri, 9 Aug 2002 20:21:37 -0700 (PDT)

Since you maintain patchutils, I have long looked forward to working
my way over towards working with you on this.  Your CTO has, so far,
foobarred any possibility of that.

-t




!        How Whole Tree Patching Works


  This note contains a formalized description of how whole tree
  patch sets handle tree rearrangements.

  There are two limitations: the math of the formalization is "pretty
  good", but could use a little more precision for a final draft
  patch set standard;  only the (interesting) problem of tree
  rearrangements is addressed -- other details of patching aren't
  addressed (yet).

  This is long and rather dry with lots of equations -- so it isn't
  for all readers.  

  On the other hand, there's an important theorem proven here
  ({theorem: rearrangement inventory equivalence}) whose significance
  is that it proves that our patch set format contains enough
  information for exact patching (in spite of not containing complete
  inventories for the `orig' and `mod' trees).

  Additionally, if you want to try formally defining operations such
  as patch set composition, this document contains enough formalism
  for you to work out how to compose tree rearrangements.

* overview

  In whole tree patching, we have three filesystem trees: `orig',
  `mod', and `target'.  

  First, we'll compare `orig' and `mod' to produce a data stream
  called a "patch set" the describes the changes between them.

  Second, we'll "apply" the patch set to the target -- which means
  we want to make the same set of changes to the target tree.

  Two difficulties arise: one, the `target' tree may not be identical
  to the `orig' tree, in which case it isn't a priori clear what it
  means to apply the "same changes"; two, we want to include file tree
  structural rearrangements among the changes we record when computing
  patch sets and replay while applying them.


* the focus of this document is on tree-structure patching

** patch sets have two parts

  A whole tree patch set has two parts:

        *) a set of __individual file changes__
        *) a __tree rearrangement__


** individual file changes

  Patch set contents for individual programs are created by programs
  such as `diff' and applied by programs such as `patch'.  
  The details are fairly banal and aren't important to this note.


** tree rearrangements

  Tree rearrangements are based on comparing `orig' and `mod' to see
  how files and directories have been rearranged, and 
  applying that "same" transformation to `target'.

  The problem is defining what "same" means when the target tree
  does not precisely match the original tree.  So, exploring that
  topic is what comes next:


* what's a tree structure (formally)?

  Let's do some intuitive math.

** tag sets

<<<
        DirTags         :=      "the set of directory tags"
        FileTags        :=      "the set of file tags"
        SymlinkTags     :=      "the set of symlink tags"

        AllTags         :=      DirTags union FileTags union SymlinkTags
>>>

  The `StudlyCaps' words are set names.  The `:=' means "is defined to
  be".  The definition is in quotes which means that it's an 
  informal definition -- you're supposed to be able to fill in the
  details for yourself (e.g., you know what a tag string is because
  you are familiar with `arch').

** Tag Set Disjointness Axiom

<<<
        Axiom:

                   DirTags int FileTags
                == DirTags int SymlinkTags
                == FileTags int SymlinkTags
                == -empty-
>>>

  `int' is the set intersection operator. `-empty-' is the
  empty set.  Along with the axiom, we can define some predicates:

*** tag predicates

<<<
        dir_tag?(x)     :=      x member DirTags
        file_tag?(x)    :=      x member FileTags
        symlink_tag?(x) :=      x member SymlinkTags
>>>


** relative path names

<<<
        RelPaths        :=      "the set of simplified down-relative paths"
>>>

  `Simplified' means that the paths do not contain `./' or `/..'
  components.  `down-relative' means that the the paths do not
  begin with '/' or `..'.


** so what's a tree (structure)?

  Each of our three trees is characterized by an _inventory_ function
  whose type we approximately know:

<<<
        inventory_orig          :       AllTags <-> RelPaths
        inventory_mod           :       AllTags <-> RelPaths
        inventory_target        :       AllTags <-> RelPaths
>>>

  (The notation means that each `inventory_*' is an invertible
  function between a subset of `AllTags' and a subset of `RelPaths'.)

  The precise domains and ranges of those functions are of 
  interest:

<<<
        orig_tags       :=      Dom (inventory_orig)
        mod_tags        :=      Dom (inventory_mod)
        target_tags     :=      Dom (inventory_target)
        
        orig_paths      :=      Rng (inventory_orig)
        mod_paths       :=      Rng (inventory_mod)
        target_paths    :=      Rng (inventory_target)
        
>>>


** comparing two trees


  Let's start to compare `orig' and `mod' in order to produce a 
  patch set.

*** interesting sets while comparing trees

  A few interesting sets are:

<<<
        removed_tags    :=      Dom (inventory_orig) - Dom (inventory_mod)
        added_tags      :=      Dom (inventory_mod) - Dom (inventory_orig)
        common_tags     :=      Dom (inventory_mod) int Dom (inventory_orig)
>>>

  The sets `added_' and `removed_tags' are the deleted and new files,
  symlinks and directories.  The set `common_tags' are files found
  in both `orig' and `mod' (whether moved or not).

*** lemma: disjointness of tree comparison sets

<<<
        Lemma:
                   removed_tags int added_tags
                == removed_tags int common_tags
                == added_tags int common_tags
                == -empty-

        Proof:

        From the definitions, we know that:

                [i] removed_tags == A - B
                [ii] added_tags == B - A
                [iii] common_tags == A int B

        From that it follows that:

                [iv] removed_tags int B == empty        (from [i])
                [v] added_tags int A == empty           (from [ii])

                [vi] removed_tags is_subset A           (from [i])
                [vi] added_tags is_subset B             (from [ii])
                [vii] common_tags is_subset A           (from [iii])
                [viii] common_tags is_subset B          (from [iii])

        From that it follows that __each of the following sets
        is `-empty-'__:

                [ix] removed_tags int added_tags        (from [iv] + [vi])
                [x] removed_tags int common_tags        (from [iv] + [viii])
                [xi] added_tags int added_tags          (from [v] + [vii])

        QED
>>>

*** the set of moved files, directories, and symlinks

  Another set of interest is:

<<<
        moved_tags      := { x | 
                                 (x is_in common_tags)
                                and
                                 (inventory_orig(x) != inventory_mod(x))
                           }
>>>


*** lemma: disjointedness of interesting tree comparison sets

<<<
        Lemma:
                   removed_tags int added_tags
                == removed_tags int moved_tags
                == added_tags int moved_tags
                == -empty-

        Proof:

        From the definitions, we know that:

                moved_tags is_subset common_tags

        thus this lemma follows trivially from
        the lemma "disjointness of tree comparison sets".

        QED
>>>


*** the two tree-relative sets of interesting tags

  We just constructed three sets of tags: `added_', `removed_', and
  `moved_' tags.  Not all of those tags are present in both the `orig' 
  and `mod' trees.  Just these:

<<<
        orig_rearrangement_tags :=      removed_tags union moved_tags
        mod_rearrangement_tags  :=      added_tags union moved_tags
>>>

*** theorem: rearrangement set equivalences

  This theorem is useful because it means that given only
  the two `_rearrangement_tags' sets, we can recover the 
  three `added_/remove_/moved_' sets.

<<<
        Theorem:

          [I] moved_tags == orig_rearrangement_tags int mod_rearrangement_tags

          [II] removed_tags == orig_rearrangement_tags - mod_rearrangement_tags
  
          [III] added_tags == mod_rearrangement_tags - orig_rearrangement_tags

        Proof:

          With shorthand:

                A := added_tags
                M := moved_tags
                R := removed_tags

                G := orig_rearrangement_tags
                D := mod_rearrangement_tags


          of [I]...

           [i] (from defs)

                   (G int D)
                == ((R union M) int (A union M))

  
        
           [ii] (from set algebra)

                == (A int (R union M)) union (M int (R union M))
                == (A int R) union (A int M) union M

           [iv] (from {lemma: disjointedness of interesting tree comparison 
sets})

                == -empty- union -empty- union M
                == M


          of [II] (and [III] by symmetry):

           [i] (from defs)

                   (G - D)
                == (R union M) - (A union M)

           [ii] (from set algebra)

                == R - (A union M)
                == (R - A) - M

           [iii] (from {lemma: disjointedness of interesting tree
                  comparison sets})

                == R - M
                == R

        QED

>>>



*** interesting functions while comparing trees

  Given those sets, we can define some new sets functions by restricting
  the domains of the `inventory_' functions:

<<<

        rearrangement_inventory_orig :=
                restrict(inventory_orig, orig_rearrangement_tags)

        rearrangement_inventory_mod :=
                restrict(inventory_mod, mod_rearrangement_tags)

>>>


*** theorem: rearrangement inventory equivalence

  Here is an interesting and really important equation:

<<<
        Theorem:

                inventory_mod(x)
  
             == by-cases:

                -undefined- if x is_in removed_tags
                rearrangement_inventory_mod(x) if x is_in moved_tags
                rearrangement_inventory_mod(x) if x is_in added_tags
                inventory_orig(x) otherwise
  

                inventory_orig(x)
  
             == by-cases:

                -undefined- if x is_in added_tags
                rearrangement_inventory_orig(x) if x is_in moved_tags
                rearrangement_inventory_orig(x) if x is_in removed_tags
                inventory_mod(x) otherwise
  

        Proof:

           [omitted for brevity but simple]
>>>

  
  What that tells us is that if we have `rearrangement_inventory_mod',
  the `added_/removed_/moved_tags' sets, and `inventory_orig', then
  we can reconstruct `inventory_mod'.

  Symmetrically, with `rearrangement_inventory_orig', the `added_/...'
  sets, and `inventory_mod', we can reconstruct `inventory_orig'.

  What that means in english is that if we have a patch set plus a
  copy of either `orig' or `mod', and the patch contains rearrangement
  inventories for the `orig' and `mod' trees, then we have all the
  information we need for forwards or backwards exact tree
  rearrangements.  (We know we have all the information we need
  because the theorem above tells us how to reconstruct all of the
  relevant information there is from just the inventory of the tree we
  have plus the contents of the patch set.)


* what goes in a patch set?

  Patch set syntax (described elswhere) includes `rename' directives
  and `change' directives.  Collectively, these include (as a subset
  of the useful information they contain), specifications
  of the `rearrangement_inventory_orig' and
  `rearrangement_inventory_mod' tables.

  That design is at least _plausible_ since (as we just proved) it is
  sufficient for forwards and backwards exact tree rearrangements.

  Whether or not it's sufficient for inexact patching is a matter
  of subjective judgement.   The semantics of inexact patching 
  haven't been formally (only informally) specified so far.  Is it
  a usefully specified semantics?


* what's missing?

  Still needed are formal models of (in terms of impact on inventory and 
  file contents) inexact tree rearrangements and individual file change
  application.


-----
Offer feedback/support:
http://svcs.affero.net/rm.php?m=lord%40regexps%2Ecom&ls=towards%20formal%20patch%20set%20semantics&ll=arch-dev@regexps.com

_______________________________________________
arch-dev mailing list
arch-dev@regexps.com
http://www.regexps.com/mailman/listinfo/arch-dev


_______________________________________________
arch-announce mailing list
arch-announce@regexps.com
http://www.regexps.com/mailman/listinfo/arch-announce






reply via email to

[Prev in Thread] Current Thread [Next in Thread]