bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new module 'fstrcmp', generic diff algorithm


From: Bruno Haible
Subject: Re: new module 'fstrcmp', generic diff algorithm
Date: Sat, 18 Aug 2007 12:54:33 +0200
User-agent: KMail/1.5.4

Paul Eggert wrote:
> I installed the following patch into gnulib to accomplish the first part.

Thanks. Good to see that we can finally achieve code sharing here, through
the use of parametrizable include files.

> One piece (a diffseq module) just for diffseq.h, and the other piece
> (fstrcmp, say) for the rest of the files.  This is because diffutils
> needs diffseq.h but not the other source files.

Yes, of course. I should have proposed it like this.

Let me add some trivial modifications (mostly from gettext's copy of diffseq.h):
  - Comment style.
  - Change 'heuristic' from 'int' to 'bool'.
  - Remove the 'const' from the context parameter. Needed because in the
    fstrcmp case, the NOTE_INSERT and NOTE_DELETE macros modify fields
    in the context, and an extra indirection would only cost performance:
      #define EXTRA_CONTEXT_FIELDS \
        /* The number of elements inserted or deleted. */ \
        int xvec_edit_count; \
        int yvec_edit_count;
      #define NOTE_DELETE(ctxt, xoff) ctxt->xvec_edit_count++
      #define NOTE_INSERT(ctxt, yoff) ctxt->yvec_edit_count++
  - In 'diag', keep two blocks of code in sync (lines 191 and 224).
  - Undefine the macro USE_HEURISTIC after use.


2007-08-18  Bruno Haible  <address@hidden>

        * lib/diffseq.h (struct context): Change type of 'heuristic' field to
        'bool'.
        (diag, compareseq): Remove const from the ctxt argument.
        (USE_HEURISTIC): Undefine at the end.

--- lib/diffseq.h       17 Aug 2007 23:29:24 -0000      1.3
+++ lib/diffseq.h       18 Aug 2007 10:31:38 -0000
@@ -71,28 +71,28 @@
  */
 struct context
 {
-  /* Vectors being compared. */
+  /* Vectors being compared.  */
   const ELEMENT *xvec;
   const ELEMENT *yvec;
 
-  /* Extra fields. */
+  /* Extra fields.  */
   EXTRA_CONTEXT_FIELDS
 
   /* Vector, indexed by diagonal, containing 1 + the X coordinate of the point
      furthest along the given diagonal in the forward search of the edit
-     matrix. */
+     matrix.  */
   OFFSET *fdiag;
 
   /* Vector, indexed by diagonal, containing the X coordinate of the point
      furthest along the given diagonal in the backward search of the edit
-     matrix. */
+     matrix.  */
   OFFSET *bdiag;
 
   #ifdef USE_HEURISTIC
   /* This corresponds to the diff -H flag.  With this heuristic, for
      vectors with a constant small density of changes, the algorithm is
      linear in the vectors size.  */
-  int heuristic;
+  bool heuristic;
   #endif
 
   /* Edit scripts longer than this are too expensive to compute.  */
@@ -115,6 +115,7 @@
   bool hi_minimal;
 };
 
+
 /* Find the midpoint of the shortest edit script for a specified portion
    of the two vectors.
 
@@ -123,9 +124,9 @@
    When the two searches meet, we have found the midpoint of the shortest
    edit sequence.
 
-   If FIND_MINIMAL is true, find the minimal edit script regardless
-   of expense.  Otherwise, if the search is too expensive, use
-   heuristics to stop the search and report a suboptimal answer.
+   If FIND_MINIMAL is true, find the minimal edit script regardless of
+   expense.  Otherwise, if the search is too expensive, use heuristics to
+   stop the search and report a suboptimal answer.
 
    Set PART->(xmid,ymid) to the midpoint (XMID,YMID).  The diagonal number
    XMID - YMID equals the number of inserted elements minus the number
@@ -144,7 +145,7 @@
 
 static void
 diag (OFFSET xoff, OFFSET xlim, OFFSET yoff, OFFSET ylim, bool find_minimal,
-      struct partition *part, struct context const *ctxt)
+      struct partition *part, struct context *ctxt)
 {
   OFFSET *const fd = ctxt->fdiag;      /* Give the compiler a chance. */
   OFFSET *const bd = ctxt->bdiag;      /* Additional help for the compiler. */
@@ -220,7 +221,7 @@
          OFFSET thi = bd[d + 1];
          OFFSET x0 = tlo < thi ? tlo : thi - 1;
 
-         for (x = x0, y = x - d;
+         for (x = x0, y = x0 - d;
               xoff < x && yoff < y && EQUAL (xv[x - 1], yv[y - 1]);
               x--, y--)
            continue;
@@ -390,6 +391,7 @@
     }
 }
 
+
 /* Compare in detail contiguous subsequences of the two vectors
    which are known, as a whole, to match each other.
 
@@ -401,12 +403,11 @@
    If FIND_MINIMAL, find a minimal difference no matter how
    expensive it is.
 
-   The results are recorded in the vectors files[N].changed, by storing 1
-   in the element for each line that is an insertion or deletion.  */
+   The results are recorded by invoking NOTE_DELETE and NOTE_INSERT.  */
 
 static void
 compareseq (OFFSET xoff, OFFSET xlim, OFFSET yoff, OFFSET ylim,
-           bool find_minimal, struct context const *ctxt)
+           bool find_minimal, struct context *ctxt)
 {
   ELEMENT const *xv = ctxt->xvec; /* Help the compiler.  */
   ELEMENT const *yv = ctxt->yvec;
@@ -454,7 +455,8 @@
 #undef ELEMENT
 #undef EQUAL
 #undef OFFSET
-#undef OFFSET_MAX
 #undef EXTRA_CONTEXT_FIELDS
 #undef NOTE_DELETE
 #undef NOTE_INSERT
+#undef USE_HEURISTIC
+#undef OFFSET_MAX





reply via email to

[Prev in Thread] Current Thread [Next in Thread]