[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-diffutils] bug: diff -r swaps the files it should check
From: |
Christoph Anton Mitterer |
Subject: |
Re: [bug-diffutils] bug: diff -r swaps the files it should check |
Date: |
Sat, 28 May 2011 00:23:14 +0200 |
Hi Paul.
I've seen this issue again now using diff 3.0 from Debian sid (see the
attached log file output.
Any ideas?
Cheers,
Chris.
On Fri, 2010-08-13 at 03:06 +0200, Paul Eggert wrote:
> On 08/07/10 18:50, Christoph Anton Mitterer wrote:
>
> > I found a rather strange bug in diff (Debian version 1:3.0-1) when doing
> > large backups of millions of files.
> > First I've copied the data to an external HDD, then I've did an "diff -q
> > -r >out 2> err" in order to find any errors.
>
> A wise precaution! I do that sort of thing a lot.
>
> > The strange unicode characters had their character codes inside which read
> > as follows:
> > 1CEE 1CAE
> > 1CAE 1CEE
> > ...
> > Any ideas?
>
> My idea is that you've found and reported a bug that has been in GNU
> diff ever since 2.8 was released back in 2002. Thanks very much for
> reporting this: you've undoubtedly saved others some work and
> hair-pulling.
>
> I've installed the following patch, which I hope fixes things. If you
> can test it on your data, that'd be nice. I have verified this bug on
> a much simpler test case, and this patch includes the test case to try
> to make sure that a similar bug doesn't creep back in later.
>
>
>
> >From 29ac7f191c444400c6df067647e3158a1e188ddf Mon Sep 17 00:00:00 2001
> From: Paul Eggert <address@hidden>
> Date: Thu, 12 Aug 2010 17:55:05 -0700
> Subject: [PATCH] diff: avoid spurious diffs when two distinct dir entries
> compare equal
>
> Problem reported by Christoph Anton Mitterer in:
> http://lists.gnu.org/archive/html/bug-diffutils/2010-08/msg00000.html
>
> * NEWS: Mention this bug fix.
> * src/dir.c (compare_names_for_qsort): Fall back on file_name_cmp
> if two distinct entries in the same directory compare equal.
> (diff_dirs): Prefer a file_name_cmp match when available.
> * tests/Makefile.am (TESTS): New test colliding-file-names.
> * tests/colliding-file-names: New file.
> ---
> NEWS | 5 +++++
> src/dir.c | 41 +++++++++++++++++++++++++++++++++++++++--
> tests/Makefile.am | 1 +
> tests/colliding-file-names | 20 ++++++++++++++++++++
> 4 files changed, 65 insertions(+), 2 deletions(-)
> create mode 100644 tests/colliding-file-names
>
> diff --git a/NEWS b/NEWS
> index 27b9886..1c41e9e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,6 +2,11 @@ GNU diffutils NEWS -*-
> outline -*-
>
> * Noteworthy changes in release ?.? (????-??-??) [?]
>
> +** Bug fixes
> +
> + diff no longer reports spurious differences merely because two entries
> + in the same directory have names that compare equal in the current
> + locale, or compare equal because --ignore-file-name-case was given.
>
> * Noteworthy changes in release 3.0 (2010-05-03) [stable]
>
> diff --git a/src/dir.c b/src/dir.c
> index 5b4eaec..1b078ab 100644
> --- a/src/dir.c
> +++ b/src/dir.c
> @@ -166,14 +166,16 @@ compare_names (char const *name1, char const *name2)
> : file_name_cmp (name1, name2));
> }
>
> -/* A wrapper for compare_names suitable as an argument for qsort. */
> +/* Compare names FILE1 and FILE2 when sorting a directory.
> + Prefer filtered comparison, breaking ties with file_name_cmp. */
>
> static int
> compare_names_for_qsort (void const *file1, void const *file2)
> {
> char const *const *f1 = file1;
> char const *const *f2 = file2;
> - return compare_names (*f1, *f2);
> + int diff = compare_names (*f1, *f2);
> + return diff ? diff : file_name_cmp (*f1, *f2);
> }
>
> /* Compare the contents of two directories named in CMP.
> @@ -253,6 +255,41 @@ diff_dirs (struct comparison const *cmp,
> pretend the "next name" in that dir is very large. */
> int nameorder = (!*names[0] ? 1 : !*names[1] ? -1
> : compare_names (*names[0], *names[1]));
> +
> + /* Prefer a file_name_cmp match if available. This algorithm is
> + O(N**2), where N is the number of names in a directory
> + that compare_names says are all equal, but in practice N
> + is so small it's not worth tuning. */
> + if (nameorder == 0)
> + {
> + int raw_order = file_name_cmp (*names[0], *names[1]);
> + if (raw_order != 0)
> + {
> + int greater_side = raw_order < 0;
> + int lesser_side = 1 - greater_side;
> + char const **lesser = names[lesser_side];
> + char const *greater_name = *names[greater_side];
> + char const **p;
> +
> + for (p = lesser + 1;
> + *p && compare_names (*p, greater_name) == 0;
> + p++)
> + {
> + int cmp = file_name_cmp (*p, greater_name);
> + if (0 <= cmp)
> + {
> + if (cmp == 0)
> + {
> + memmove (lesser + 1, lesser,
> + (char *) p - (char *) lesser);
> + *lesser = greater_name;
> + }
> + break;
> + }
> + }
> + }
> + }
> +
> int v1 = (*handle_file) (cmp,
> 0 < nameorder ? 0 : *names[0]++,
> nameorder < 0 ? 0 : *names[1]++);
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 6a4858c..242d501 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -1,6 +1,7 @@
> TESTS = \
> basic \
> binary \
> + colliding-file-names \
> help-version \
> function-line-vs-leading-space \
> label-vs-func \
> diff --git a/tests/colliding-file-names b/tests/colliding-file-names
> new file mode 100644
> index 0000000..f14dce7
> --- /dev/null
> +++ b/tests/colliding-file-names
> @@ -0,0 +1,20 @@
> +#!/bin/sh
> +# Check that diff responds well if a directory has multiple file names
> +# that compare equal.
> +
> +: ${srcdir=.}
> +. "$srcdir/init.sh"; path_prepend_ ../src
> +
> +mkdir d1 d2 || fail=1
> +
> +for i in abc abC aBc aBC Abc AbC ABc ABC; do
> + echo $i >d1/$i || fail=1
> +done
> +
> +for i in ABC ABc AbC Abc aBC aBc abC abc; do
> + echo $i >d2/$i || fail=1
> +done
> +
> +diff -r --ignore-file-name-case d1 d2 || fail=1
> +
> +Exit $fail
log
Description: Text document
smime.p7s
Description: S/MIME cryptographic signature
- Re: [bug-diffutils] bug: diff -r swaps the files it should check,
Christoph Anton Mitterer <=