gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gnu-arch-users] formal foundations for the semantics of restricted mkpa


From: duchier
Subject: [Gnu-arch-users] formal foundations for the semantics of restricted mkpatch
Date: Wed, 08 Oct 2003 14:58:59 +0200
User-agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (gnu/linux)

In this mail, I would like to set down some formal foundations for
discussing changesets in general and partial changesets in particular.
I'll start with introducing briefly some standard notions from (one
of) the formal theories of trees, and then I'll use them to speak
formally about our problem domain.  There is nothing new in this
message: I am merely setting up the stage for a formal discussion.
I'll try to contribute something mildly substantive in a subsequent
email :-)


PATHS
----------------------------------------------------------------------

let ELEMS be a set of symbols called "path elements".
a path is a word in ELEMS* i.e. a finite sequence of path elements.
I'll write PATHS instead of ELEMS*
If P1 and P2 are paths, I'll write P1/P2 for their concatenation.
P1 is said to be a prefix of P1/P2.


TREE DOMAINS
----------------------------------------------------------------------

a tree domain is a prefix closed set of (non-empty) paths.
prefix-closed means that if P1/P2 is in the set, then so is P1.  I
choose "non-empty" paths because I find them better suited to this
presentation, but nothing crucially depends on that point.


DATA(-labeled) FORESTS
----------------------------------------------------------------------

Given a set DATA of "labels", a labeling is a function mapping paths
to elements of DATA.

A DATA-labeled forest F is a pair of a tree domain Dom(F) and a
labeling Lab(F) : Dom(F) -> DATA

Obviously it is intended that the labeling capture the association of
pathname to file contents, but this interpretation is irrelevant to
my presentation.  For simplicity, from here on, I'll just say forest
instead of DATA-labeled forest.


TRASH-EXTENDED PATHS AND TREE DOMAINS
----------------------------------------------------------------------

Anticipating on changesets and the need to account for creations and
deletions, I introduce the new symbol {trash} and consider the
extended set of paths:

                    XPATHS = PATHS U {trash}/PATHS

where U stands here for set union.


CHANGESETS
----------------------------------------------------------------------

A changeset C is a pair consisting of:

  - a tree     delta: Move(C)  : XPATHS -> XPATHS
  - a labeling delta: Patch(C) : XPATHS -> DATA -> DATA

where Move and Patch share the same domain (or co-domain, see next
section), and where Move is 1-to-1.  I'll write Dom(C) as short for
Dom(Move(C)).

The motivation for {trash} is that we will represent creation of P
with:
        ({trash}/P,P) in Move(C)

and deletion of P with:

        (P,{trash}/P) in Move(C)

The point of this trick is that it allows us to have changesets where
the tree delta is 1-to-1.  The simpler scheme using ({trash},P) and
(P,{trash}) does not have this property.

I'll say that a changeset C is total on a forest F if:

        Dom(C)\{trash}/PATHS = Dom(F)

- where \ represents set difference - i.e. if C covers all paths in F
and possibly also creates some files.

To make thing more readable, I'll write !S instead of S\{trash}/PATHS


PRE- AND POST-MODED CHANGESETS
----------------------------------------------------------------------

An interesting observation is that there are really two possible kinds
of changesets: one kind where the Patch is applied after the Move
(post-moded), and one where it is applied before the Move (pre-moded).

For pre-moded changesets, Dom(Patch) = !Dom(Move)
For post-moded changesets, Dom(Patch) = !CoDom(Move)

There is an obvious correspondance between the two:

        if (Move,Patch) is a post-moded changeset,
        then (Move,Move o Patch) is the equivalent pre-moded one.

where o represents function composition.  In the latter case, I'll say
that the Patch has been "regressed" through the Move.

I believe this to be the crux of the difference between the partial
changesets that I was considering (post-moded) and the partial
changesets that Tom had in mind (pre-moded)... but I have not spoken
of partial changesets yet.


APPLYING CHANGESETS
----------------------------------------------------------------------

Given a forest F and a changeset C total on F, we define the
application C(F)=F' of C to F as follows:

if C is pre-moded:

   Dom(F') = Move(C)(Dom(F))
   Lab(F')(p) = Patch(C)(Move(C)^-1(p))(Lab(Move(C)^-1(p)))

if C is post-moded:

   Dom(F') = Move(C)(Dom(F))
   Lab(F')(p) = Patch(C)(p)(Lab(Move(C)^-1(p)))

where Move(C)^-1 is the inverse of Move(C) - it exists since Move(C)
is 1-to-1.


PARTIAL CHANGESETS
----------------------------------------------------------------------

So far, I have only considered a changeset C which is total on the
forest F to which it is to be applied, i.e. where !Dom(C)=Dom(F).

A partial changeset is one where !Dom(C) =< Dom(F), where =< stands
for the subset relation.

An extension C' of C is a patchset such that Dom(C) =< Dom(C'),
Move(C) =< Move(C') and Patch(C) =< Patch(C').


SAFE PARTIAL CHANGESETS
----------------------------------------------------------------------

I'll call C safe on F iff C' is a valid and total changeset on F,
where C' is defined by:

        Move(C')  = Move(C)  U {(p,p) | p in Dom(F)\Dom(C)}
        Patch(C') = Patch(C) U {(p,I) | p in Dom(F)\Dom(C)}

where I the identity function in DATA->DATA.  In other words C' just
extends C with identity everywhere where C was not defined.  C' needs
to be an extension, in particular Move(C') must be 1-to-1.

The requirement that Patch(C') must be 1-to-1 means that there are
many partial changesets which are not safe, namely all those where:
CoDom(Move(C))\Dom(Move(C)) has a non empty intersection with Dom(F).
This is where the closure algorithm that we discussed comes into play:
it's role is to enforce safety by including more into the changeset.


MOVE-COMPLETION OF PARTIAL CHANGESETS
----------------------------------------------------------------------

Clearly if (p,p) in Move(C), then we would want to just drop p from
the changeset's tree delta.  More generally, we would like to drop all
moves which are "operationally" implied: when we move a directory
everything below it moves along.

To account for the above, I'll consider instead the notion of
"move-completion".  The move-completion Move' of Move with respect to
a forest F is the smallest function such that:

(1)     Move =< Move'

(2)     if    (dir1,dir2) in Move'
        and   dir1/tail in Dom(F)
        and   dir1/tail not in Dom(Move)
        ---------------------------------------
        then  (dir1/tail,dir2/tail) in Move'

Also if (p,I) in Patch(C), then we would like to drop p from the
changeset's labeling delta.


MINIMAL PARTIAL CHANGESET
----------------------------------------------------------------------

A partial changeset is said to be move-minimal with respect to a
forest F if there is no smaller Move function with the same
move-completion with respect to F.  It is patch minimal if the
identity function is not in CoDom(Patch).  It is minimal if it is both
move-minimal and patch-minimal.

I think that's enough for one message :-)

Cheers,

PS: I have reread this message several times and each time, I have
fixed at least one bug, so keep a sharp eye out :-)

-- 
Denys Duchier - Équipe Calligramme - LORIA, Nancy, France





reply via email to

[Prev in Thread] Current Thread [Next in Thread]