bug-diffutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-diffutils] bug: diff -r swaps the files it should check


From: Christoph Anton Mitterer
Subject: Re: [bug-diffutils] bug: diff -r swaps the files it should check
Date: Sat, 28 May 2011 00:23:14 +0200

Hi Paul.

I've seen this issue again now using diff 3.0 from Debian sid (see the
attached log file output.


Any ideas?

Cheers,
Chris.

On Fri, 2010-08-13 at 03:06 +0200, Paul Eggert wrote:
> On 08/07/10 18:50, Christoph Anton Mitterer wrote:
> 
> > I found a rather strange bug in diff (Debian version 1:3.0-1) when doing
> > large backups of millions of files.
> > First I've copied the data to an external HDD, then I've did an "diff -q
> > -r >out 2> err" in order to find any errors.
> 
> A wise precaution!  I do that sort of thing a lot.
> 
> > The strange unicode characters had their character codes inside which read
> > as follows:
> > 1CEE      1CAE
> > 1CAE      1CEE
> > ...
> > Any ideas?
> 
> My idea is that you've found and reported a bug that has been in GNU
> diff ever since 2.8 was released back in 2002.  Thanks very much for
> reporting this: you've undoubtedly saved others some work and
> hair-pulling.
> 
> I've installed the following patch, which I hope fixes things.  If you
> can test it on your data, that'd be nice.  I have verified this bug on
> a much simpler test case, and this patch includes the test case to try
> to make sure that a similar bug doesn't creep back in later.
> 
> 
> 
> >From 29ac7f191c444400c6df067647e3158a1e188ddf Mon Sep 17 00:00:00 2001
> From: Paul Eggert <address@hidden>
> Date: Thu, 12 Aug 2010 17:55:05 -0700
> Subject: [PATCH] diff: avoid spurious diffs when two distinct dir entries 
> compare equal
> 
> Problem reported by Christoph Anton Mitterer in:
> http://lists.gnu.org/archive/html/bug-diffutils/2010-08/msg00000.html
> 
> * NEWS: Mention this bug fix.
> * src/dir.c (compare_names_for_qsort): Fall back on file_name_cmp
> if two distinct entries in the same directory compare equal.
> (diff_dirs): Prefer a file_name_cmp match when available.
> * tests/Makefile.am (TESTS): New test colliding-file-names.
> * tests/colliding-file-names: New file.
> ---
>  NEWS                       |    5 +++++
>  src/dir.c                  |   41 +++++++++++++++++++++++++++++++++++++++--
>  tests/Makefile.am          |    1 +
>  tests/colliding-file-names |   20 ++++++++++++++++++++
>  4 files changed, 65 insertions(+), 2 deletions(-)
>  create mode 100644 tests/colliding-file-names
> 
> diff --git a/NEWS b/NEWS
> index 27b9886..1c41e9e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,6 +2,11 @@ GNU diffutils NEWS                                    -*- 
> outline -*-
>  
>  * Noteworthy changes in release ?.? (????-??-??) [?]
>  
> +** Bug fixes
> +
> +  diff no longer reports spurious differences merely because two entries
> +  in the same directory have names that compare equal in the current
> +  locale, or compare equal because --ignore-file-name-case was given.
>  
>  * Noteworthy changes in release 3.0 (2010-05-03) [stable]
>  
> diff --git a/src/dir.c b/src/dir.c
> index 5b4eaec..1b078ab 100644
> --- a/src/dir.c
> +++ b/src/dir.c
> @@ -166,14 +166,16 @@ compare_names (char const *name1, char const *name2)
>         : file_name_cmp (name1, name2));
>  }
>  
> -/* A wrapper for compare_names suitable as an argument for qsort.  */
> +/* Compare names FILE1 and FILE2 when sorting a directory.
> +   Prefer filtered comparison, breaking ties with file_name_cmp.  */
>  
>  static int
>  compare_names_for_qsort (void const *file1, void const *file2)
>  {
>    char const *const *f1 = file1;
>    char const *const *f2 = file2;
> -  return compare_names (*f1, *f2);
> +  int diff = compare_names (*f1, *f2);
> +  return diff ? diff : file_name_cmp (*f1, *f2);
>  }
>  
>  /* Compare the contents of two directories named in CMP.
> @@ -253,6 +255,41 @@ diff_dirs (struct comparison const *cmp,
>            pretend the "next name" in that dir is very large.  */
>         int nameorder = (!*names[0] ? 1 : !*names[1] ? -1
>                          : compare_names (*names[0], *names[1]));
> +
> +       /* Prefer a file_name_cmp match if available.  This algorithm is
> +          O(N**2), where N is the number of names in a directory
> +          that compare_names says are all equal, but in practice N
> +          is so small it's not worth tuning.  */
> +       if (nameorder == 0)
> +         {
> +           int raw_order = file_name_cmp (*names[0], *names[1]);
> +           if (raw_order != 0)
> +             {
> +               int greater_side = raw_order < 0;
> +               int lesser_side = 1 - greater_side;
> +               char const **lesser = names[lesser_side];
> +               char const *greater_name = *names[greater_side];
> +               char const **p;
> +
> +               for (p = lesser + 1;
> +                    *p && compare_names (*p, greater_name) == 0;
> +                    p++)
> +                 {
> +                   int cmp = file_name_cmp (*p, greater_name);
> +                   if (0 <= cmp)
> +                     {
> +                       if (cmp == 0)
> +                         {
> +                           memmove (lesser + 1, lesser,
> +                                    (char *) p - (char *) lesser);
> +                           *lesser = greater_name;
> +                         }
> +                       break;
> +                     }
> +                 }
> +             }
> +         }
> +
>         int v1 = (*handle_file) (cmp,
>                                  0 < nameorder ? 0 : *names[0]++,
>                                  nameorder < 0 ? 0 : *names[1]++);
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 6a4858c..242d501 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -1,6 +1,7 @@
>  TESTS = \
>    basic \
>    binary \
> +  colliding-file-names \
>    help-version       \
>    function-line-vs-leading-space \
>    label-vs-func      \
> diff --git a/tests/colliding-file-names b/tests/colliding-file-names
> new file mode 100644
> index 0000000..f14dce7
> --- /dev/null
> +++ b/tests/colliding-file-names
> @@ -0,0 +1,20 @@
> +#!/bin/sh
> +# Check that diff responds well if a directory has multiple file names
> +# that compare equal.
> +
> +: ${srcdir=.}
> +. "$srcdir/init.sh"; path_prepend_ ../src
> +
> +mkdir d1 d2 || fail=1
> +
> +for i in abc abC aBc aBC Abc AbC ABc ABC; do
> + echo $i >d1/$i || fail=1
> +done
> +
> +for i in ABC ABc AbC Abc aBC aBc abC abc; do
> + echo $i >d2/$i || fail=1
> +done
> +
> +diff -r --ignore-file-name-case d1 d2 || fail=1
> +
> +Exit $fail

Attachment: log
Description: Text document

Attachment: smime.p7s
Description: S/MIME cryptographic signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]