bug-diffutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-diffutils] New diff option to compare file/directory properties


From: Duncan Moore
Subject: Re: [bug-diffutils] New diff option to compare file/directory properties
Date: Tue, 19 Oct 2010 12:59:08 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.9) Gecko/20100915 Thunderbird/3.1.4


On 15/10/2010 14:38, Andreas Gruenbacher wrote:
On Thursday 14 October 2010 17:47:38 Duncan Moore wrote:
On 11/10/2010 18:11, Andreas Gruenbacher wrote:
I think we should extend the diff file format so that additional
information can be included.  Patches should still remain backwards
compatible as much as possible, though.
I'm not sure we can have existing implementations of 'patch' working
with the new 'diff --compare=...' output format, unless we have all the
differing properties on a single line,
We can relatively easily put additional information before the actual diff,
like git does.  Existing patch implementations like GNU patch which ignore
leading information which they don't understand will continue to apply diffs
to regular files just fine.

OK. If GNU 'patch' ignores leading information it doesn't understand, this makes it very much easier to attain backwards compatibility with it.

Of course, if the --compare=... option isn't used, then there is no
change to the 'diff' output, and no backwards compatibility issues.
I think that some information like file permissions and symlinks should be
included in patches by default.

For information like file permissions, having this by default would produce a lot of information that won't be of much use to most users of 'diff'.

Firstly, I'll just state some of the points that the updates were
intended to address, taken from the 'diff' info manual:

Section 18.1
It is important to note that section 18.1 is about "some ideas for improving
GNU diff and patch".  I agree with many of those ideas, but this section
probably shouldn't be treated as if it was set in stone.

Of course not, but in this instance they seem pretty reasonable. Also, I wanted to make it clear what I'm trying to achieve, because I think you're approaching it from a different angle.

The aim was to carry this principal over for other file
properties, so that the user can selectively compare any file
properties, either with or without the traditional file contents
difference. This would be separate from it's use in creating patch
files, although the output format would be the same.
I don't see the point in excluding file content differences.  What would be a
reasonably general use case for this?

Say you've been changing file permissions in a directory and want to check that you've done it properly. Excluding file content differences from a 'diff' will show more clearly what permissions you've changed. It will also be quicker.

I don't think introducing all those funny keywords is the right way to go.
Most people will not use any of them, and GNU diff should probably still
include most or all additional information available as long as it doesn't
cause backwards compatibility problems.

It's sounding like you might want 'diff' to generate any property information that will be of use to 'patch' by default (i.e without introducing a new option). That will generate a lot of information that most of the time will be of no use to users of 'diff'. I don't think that's a good idea. Or do you mean just having a --compare option with no keywords? I had envisaged that it would usually be used like --compare=all or --compare=basic to select predefined sets of properties, with the finer tuning of properties available when it's required.

Also, git diff uses 'old' and 'new' in the output, which are concepts that
are not /directly/ relevant to 'diff' itself.
How are they not directly relevant?

Because 'diff' doesn't know anything about the history of the files. For example, the files could be given in the order 'new' 'old'. Or the files may share a common ancestor, so there is no 'old' and 'new'.

   Diff also basically compares two files;
feel free to call them 'file1' and 'file2' or 'left' and 'right' or whatever
instead of 'old' and 'new'.

OK, I incorrectly assumed you were after compatibility with the git diff format. These are better than 'old' and 'new', but 'diff' already identifies the files with '-' '+' in unified context style, '*' '-' in context style and '<' '>' in normal style, so it seems reasonable to use the same characters in some way to identify the new property information.

Character and block special files don't make sense for me to include in a
diff: they are not portable by design, and make little to no sense to
apply in a different context (on a different machine or even after a
driver load or removal on the same machine).
Some special files may not be useful for 'patch', but 'diff' is not a
tool for just supplying input for 'patch'. As a tool for saying what's
different between files, I think 'diff' should be reporting it if these
types of files are present.
It does now already.

I meant that 'diff' should report their differing properties (if it's been asked to), not just report that the special files exist.

   IMO it would be sufficient for GNU diff to report the
file type and name either when a file is found only in one tree or when a file
exists in both trees but refers to different file types or devices.  Device
numbers should probably not end up in patches; this is not an archive format.

For default 'diff' behaviour I think that that would be reasonable, but not if 'diff' has been specifically asked to show property information for file objects.

[...]
This means that the example above would become:

    % diff -u --compare=all aaa bbb
    --- aaa
    +++ bbb
    --- time: 2010-09-23 20:51:06.984375000 +0100
    +++ time: 2010-09-23 20:51:03.015625000 +0100
    --- mode: rw-r--r--
    +++ mode: rwxr--r--
    @@ -1 +1 @@
    -111
    +222
No way: overloading the meaning of the "---" and "+++" headers would
completely break every existing implementation of patch.

How about:

   % diff -u --compare=all aaa bbb
   --- aaa
   +++ bbb
   -- time: 2010-09-23 20:51:06.984375000 +0100
   ++ time: 2010-09-23 20:51:03.015625000 +0100
   -- mode: -rw-r--r--
   ++ mode: -rwxr--r--
   @@ -1 +1 @@
   -111
   +222


Non-existent files would give this sort of output:
[...]
Please have a look how GNU diff uses Epoch timestamps and how cvs / svn / git
use /dev/null for creates and deletes.

Using an epoch timestamp to mark a non-existent file is something I wanted to avoid. Giving a dummy timestamp to a these files would be inconsistent with other file properties which wouldn't be given dummy values; and what happens if timestamp isn't one of the properties output? Using "+++ /dev/null" to identify non-existent files is better than using timestamps. I think "+++ <filename> non-existent", is more explicit, but if dev/null has traditionally been used in 'patch', then this may be preferable.

Symbolic links are normally dereferenced and treated like regular files,
but a --no-dereference option could be used to treat symbolic links like
the other special files:
I believe this is a mistake;

Can you clarify what you believe the mistake is - the way GNU diff treats symbolic links, the proposed --no-dereference option, or both?

  the way how GNU diff currently treats symlinks
(as regular files) is likely to lead to undesired results.  At the very least,
GNU diff should print a warning.

Symlinks never have any "file content" so the file the symlink points to could
be treated as the "file content".

I'm not quite clear what you're suggesting here for the --compare=... option. When used with a --no-dereference option, that it should use the contents of the file pointed to? That seems to be mixing dereferenced and non-dereferenced information to me. I think that the symlink should either be dereferenced or not, and any file properties used consistently. If the user is only concerned with the symlink itself and selects --no-dereference in 'diff', then the contents of the file pointed to are of no relevance to 'patch'.

Thanks



reply via email to

[Prev in Thread] Current Thread [Next in Thread]