bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-gnulib] addition: fstrcmp.h, fstrcmp.c


From: Paul Eggert
Subject: Re: [Bug-gnulib] addition: fstrcmp.h, fstrcmp.c
Date: 29 Jan 2003 15:27:30 -0800
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.3

Bruno Haible <address@hidden> writes:

> Objections?

I have three main objections.

First, that code has forked from an old version of diffutils, and is
missing some minor improvements in the latest diffutils version.  It
shouldn't be a fork: it should be the exact same code as that used in
diffutils.  Of course this will take some code merging, but I can take
responsibility for that.  I'll do a merge and report back in a couple
of weeks or so.


Second, as far as the external interface goes:

> extern double fstrcmp (const char *s1, const char *s2);

It would be better to have a function that returns an integer
containing the length of the common subsequence of s1 and s2.  If I
have such a function, I can easily write fstrcmp, but the converse is
not true.  Also, it's better to avoid inexact results if it's easy to
avoid them, which is the case here.  So, the interface should define a
function something like this:

/* Return the length of a common subsequence of chars of both BUF1 (of
   size BUF1SIZE) and BUF2 (of size BUF2SIZE).  The common subsequence
   need not be contiguous in each buffer.  The returned length need
   not be maximal, but should be as long as can easily be computed.  */
size_t commlen (char const *buf1, size_t buf1size,
                char const *buf2, size_t buf2size);

We can of course have fstrcmp as another function, for people who
prefer its interface as a convenience.  However, it seems to me that
fstrcmp should be a separate module that invokes commlen (which
returns an exact result) and then computes the resulting metric
inexactly.


Third, part of the credit for the authorship of the algorithm has been
removed from the code's commentary.  Here's a patch to fix this
problem of proper credit.  Can you please install this patch in your
copy, and also send it back upstream to whereever you got that code
from?  Thanks.

--- fstrcmp.c   Wed Jan 29 15:02:06 2003
+++ fstrcmp.c-fix       Wed Jan 29 15:11:12 2003
@@ -22,6 +22,10 @@
    "An O(ND) Difference Algorithm and its Variations", Eugene Myers,
    Algorithmica Vol. 1 No. 2, 1986, pp. 251-266;
    see especially section 4.2, which describes the variation used below.
+   Unless the 'minimal' flag is set, this code uses the TOO_EXPENSIVE
+   heuristic, by Paul Eggert, to limit the cost to O(N**1.5 log N)
+   at the price of producing suboptimal output for large inputs with
+   many differences.
 
    The basic algorithm was independently discovered as described in:
    "Algorithms for Approximate String Matching", E. Ukkonen,




reply via email to

[Prev in Thread] Current Thread [Next in Thread]