bug-diffutils



From: Vincent Lefevre
Subject: [bug-diffutils] bug#16608: bug#16608: Bug#737180: diffutils: diff exit status is 2 instead of 1 on binary files that differ (fwd)
Date: Fri, 31 Jan 2014 23:27:41 +0100
User-agent: Mutt/1.5.21-6305-vl-r59709 (2013-04-16)

On 2014-01-31 08:02:32 -0800, Paul Eggert wrote:
> Santiago Vila wrote:
> >   Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.
> 
> For Diffutils "trouble" means "the output doesn't correctly represent the
> difference between the two input files".  Perhaps that's not what is wanted
> in this situation with a diff wrapper that supports some binary files but
> not others, but I expect that there are other uses where the current
> behavior is wanted.

POSIX actually specifies the behavior:

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html

  EXIT STATUS

    The following exit values shall be returned:

     0
        No differences were found.
     1
        Differences were found.
    >1
        An error occurred.

and earlier:

  Diff Binary Output Format

    In the POSIX locale, if one or both of the files being compared
    are not text files, it is implementation-defined whether diff uses
    the binary file output format or the other formats as specified
    below. The binary file output format shall contain the pathnames
    of two files being compared and the string "differ".

  [The formats specified below are those for text files.]

So, for the output when diffing binary files, POSIX gives the choice
between a specific format (binary output format) or one of the formats
defined for text files. It is nowhere regarded as an error.

Note that Solaris diff uses the binary output (like GNU diff), but
follows POSIX for the exit status, i.e. exits with 1.
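To check which status a given diff implementation returns for binary files that differ, a minimal sketch along these lines may help. The function name and the temporary files are only illustrative, and mktemp is assumed to be available (it is not specified by POSIX):

```shell
# Print the exit status of the local diff on two differing binary files.
# binary_diff_status is a hypothetical helper, not part of diffutils.
binary_diff_status() {
  dir=$(mktemp -d) || return 2      # assumption: mktemp exists on this system
  printf 'x\0y\n' > "$dir/a"        # the NUL byte makes diff treat the file as binary
  printf 'x\0z\n' > "$dir/b"
  LC_ALL=C diff "$dir/a" "$dir/b" >/dev/null 2>&1
  st=$?
  rm -rf "$dir"
  echo "$st"
}
```

Calling binary_diff_status prints the status: 1 would be the POSIX-conforming value, 2 the value discussed in this report.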

> The users in question could change their wrapper to use "diff -a",
> or to look for the message they don't think is trouble and to ignore
> exit status 2 in that case.

"diff -a" is a bad choice because outputting binary data can corrupt
the terminal settings (or for big binary files, it can distract the
user). And there are several problems with the second solution:

1. If one wants to run "diff" only once, one needs to capture stdout
and stderr in shell variables. I'm not sure whether this is possible.
Otherwise one needs to redirect diff's stdout and stderr to temporary
files, and this is rather ugly (one needs to have some security checks
and also check whether there is an error due to the redirection itself,
but in this case, it is not possible to know whether the cause of the
error is diff itself or the redirection).
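For illustration, the single-run approach of (1) could look roughly like the sketch below. It is only a sketch under assumptions: diff_once is a hypothetical name, mktemp (not in POSIX) is assumed to create the temporary file safely, and the "Binary files ... differ" message is assumed to be the untranslated one, hence LC_ALL=C. It does not solve the problem noted above: a failure of the redirection itself cannot be told apart from a diff failure. It also only handles a single pair of files.

```shell
# Hypothetical helper (not part of diffutils): run diff exactly once,
# keeping stdout in a variable and stderr in a temporary file.
diff_once() {
  errfile=$(mktemp) || return 2     # assumption: mktemp is available
  out=$(LC_ALL=C diff "$@" 2>"$errfile")
  st=$?
  # Replay both streams to where the caller expects them.
  [ -n "$out" ] && printf '%s\n' "$out"
  cat "$errfile" >&2
  # Map status 2 to 1 only if diff wrote nothing to stderr and its
  # stdout is the binary-files message.
  if [ "$st" -eq 2 ] && [ ! -s "$errfile" ]; then
    case $out in
      "Binary files "*" differ") st=1 ;;
    esac
  fi
  rm -f "$errfile"
  return "$st"
}
```

It would be used as: diff_once file1 file2; err=$?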

2. One can also run diff normally, and in case of exit status 2, run
it again, capturing the output, and check it. Something like:

  diff "$@"
  err=$?
  if [[ $err -eq 2 ]]; then
    out=$(diff "$@" 2>/dev/null)
    [[ $out = "Binary files "*" differ" ]] && err=1
  fi
  exit $err

But there is an obvious race condition: it can happen that for the
first invocation, there was a real error with diff, so that one wants
to exit with error code 2, but the second invocation may succeed
(because the state of the system changed), and on binary files, one
would exit with 1 instead of 2.

Moreover it is not always possible to run diff twice on the same
files, in case of special files (such as /proc/self/fd/... from
process substitution).

And it can be inefficient.

And whether (1) or (2) is chosen, in case of recursive diff, it's
completely impossible to know what is going on: there may be a
"Binary files ... differ" *and* a real error.

> In theory we could have different exit statuses for Diffutils, one
> for each sort of trouble, but I'm not sure that's a road we want to
> head down.

That would still not be acceptable w.r.t. POSIX. The default should
conform to POSIX, though there could be an option to exit with 2 instead
of 1 in case of binary files, for those who need this behavior. But I'm
wondering whether this is really useful: probably not when the output
is read by a human, and if used by another process, this process
should check whether its input has the expected format anyway.

-- 
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




