bug-diffutils

Re: [bug-diffutils] New diff option to compare file/directory properties


From: Andreas Gruenbacher
Subject: Re: [bug-diffutils] New diff option to compare file/directory properties
Date: Mon, 11 Oct 2010 19:11:58 +0200
User-agent: KMail/1.12.4 (Linux/2.6.31.12-0.2-desktop; KDE/4.3.5; i686; ; )

Hello Duncan,

On Monday 27 September 2010 13:29:34 Duncan Moore wrote:
>   Here's a description and source of a new diff option to compare file 
> and directory properties.
> I'd welcome any comments on the input and output formats, terminology, 
> any other suggestions etc.

I think we should extend the diff file format so that additional information 
can be included.  Patches should still remain backwards compatible as much as 
possible, though.

Filename quoting is one case where we won't be able to remain fully 
compatible: traditional patch will misinterpret quoted filenames, so we 
should probably quote only when absolutely necessary.

We also need to define what diff files are supposed to be used for: for 
example, they currently contain file modification times.  Those timestamps are 
usually ignored when applying a diff (or left out in the first place).  It is 
not entirely clear which additional information to include by default (on the 
diff side), and which information to ignore by default (on the patch side).  I 
don't think file ownership should normally be stored in diff files, for 
example.

Probably the most widely used extended diff format today is that of git.  The 
"git diff" format always starts with a line of the form:

        "diff --git old_name new_name"

Additional information is stored below this line and above the actual diff in 
"extended headers", one item per line.  The actual diff is always in unified 
format.  The sources of the official documentation can be found here:

http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/diff-generate-patch.txt
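
As a rough illustration of that layout, the following Python sketch splits a single git-style file diff into the "diff --git" line, the extended headers, and the unified diff that follows. The function name `split_git_diff` and the sample text are made up for this example, and the rule used to detect the end of the extended headers is a simplifying assumption, not git's actual parser.

```python
def split_git_diff(text):
    """Split one git-style file diff into (header, extended_headers, body).

    Assumption: extended headers run until the first line that starts
    the unified diff ("--- " or "@@ "), a binary patch, or another diff.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("diff --git "):
        raise ValueError("not a git-style diff")
    stops = ("--- ", "@@ ", "GIT binary patch", "diff --git ")
    extended = []
    i = 1
    while i < len(lines) and not lines[i].startswith(stops):
        extended.append(lines[i])
        i += 1
    return lines[0], extended, lines[i:]


# Hypothetical sample: a mode change plus a one-line content change.
SAMPLE = """\
diff --git a/hello.sh b/hello.sh
old mode 100644
new mode 100755
--- a/hello.sh
+++ b/hello.sh
@@ -1 +1 @@
-echo hi
+echo hello
"""

header, extended, body = split_git_diff(SAMPLE)
```

A tool that does not understand the extended headers can skip everything up to the "---" line and still see an ordinary unified diff.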

The features that the "git diff" format adds include:

 * Filename quoting using C string literals for filenames which require
   quoting.

 * Distinguishing between empty files and nonexistent files.

 * Support for symlinks.

 * Support for file modes.  (The diff format includes the full file mode
   including the file type and file permission bits, but git itself only
   preserves the owner execute bit of regular files.)

 * Support for renames and copies without ending up with patches that add and
   remove the entire file.

 * Support for binary files and binary deltas.  (The "payload" is base85
   encoded in this case.)

 * The SHA-1 hash of the file before and after a change is included.
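
The first item, quoting only when necessary, could look roughly like the Python sketch below. The function name `quote_if_needed` is hypothetical, and the exact set of characters treated as "requiring quoting" (double quote, backslash, control characters, and non-ASCII bytes) is an assumption made for illustration; spaces are deliberately left unquoted here.

```python
def quote_if_needed(name):
    """Return name as-is, or as a C-style string literal if it needs quoting.

    Assumption: only '"', '\\', control bytes, and bytes >= 0x7f force
    quoting; everything else (including spaces) is passed through.
    """
    escapes = {'"': '\\"', '\\': '\\\\', '\n': '\\n', '\t': '\\t'}
    needs_quoting = False
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch in escapes:
            out.append(escapes[ch])
            needs_quoting = True
        elif byte < 0x20 or byte >= 0x7f:
            # Other unprintable or non-ASCII bytes become octal escapes.
            out.append("\\%03o" % byte)
            needs_quoting = True
        else:
            out.append(ch)
    return '"' + "".join(out) + '"' if needs_quoting else name
```

Ordinary names come back unchanged, so a patch containing only such names stays readable by tools that know nothing about quoting.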

The git people got many things right in this format. Unfortunately, there are 
a few peculiarities which make it difficult to use this as the "standard" 
extended diff format for diff/patch:

 * The filenames in the "diff --git" header line are of the form "a/file1" and
   "b/file2", with configurable "a/" and "b/" prefixes.  When a diff is a
   rename or copy, "copy from" and "copy to" or "rename from" and "rename to"
   extended headers are included with the old and new filename --- but those
   filenames are of the form "file1" and "file2" and *do not* include
   prefixes.  This breaks the pathname stripping logic (patch's -p option).

 * When a patch modifies the same file more than once, the behavior
   standardized in POSIX is to accumulate all changes in the result.  In the
   "git diff" format, the behavior is different: all the "file1" files refer
   to the state before applying the entire patch, and all the "file2" files
   refer to the state after applying the patch.  This is useful for patches
   which swap two files, for example, but the default GNU patch behavior still
   cannot be changed to match this.

 * The filenames in the "diff --git" line are sometimes significant.  Space
   characters in filenames are not quoted and so this line sometimes cannot
   be parsed correctly.  GNU diff could quote spaces in this case without
   compatibility problems with git, though.  (I have not managed to convince
   the git maintainers to repair this relatively minor deficiency in git.)
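
The first point can be illustrated with a simplified model of pathname stripping. The helper `strip_components` below is hypothetical and only drops leading path components; the real -p handling in patch has more rules than this.

```python
def strip_components(path, count):
    """Drop the first `count` slash-separated components of `path`.

    Simplified stand-in for patch's -p option, for illustration only.
    """
    return "/".join(path.split("/")[count:])


# Name as it appears in the "diff --git" line (with the "a/" prefix):
from_header = strip_components("a/src/file1.c", 1)

# Same file as it appears in a "rename from" line (no prefix):
from_rename = strip_components("src/file1.c", 1)
```

With one component stripped, the prefixed name yields "src/file1.c" while the unprefixed name yields "file1.c", so a single -p value cannot be right for both kinds of lines.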

The "git diff" format is a good starting point nevertheless, and we should 
take a really close look.
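
To make the difference between the two multi-file behaviors described above concrete, here is a toy Python model in which a "tree" is a dict from filename to contents and a "patch" is a list of renames. Both function names and the data are invented for this sketch; neither function reflects how patch or git is actually implemented.

```python
def apply_sequential(tree, renames):
    """Apply each rename to the result of the previous one (accumulating)."""
    result = dict(tree)
    for old, new in renames:
        result[new] = result.pop(old)
    return result


def apply_against_original(tree, renames):
    """Read every old name from the original tree, as described for git."""
    result = dict(tree)
    for old, _ in renames:
        result.pop(old, None)
    for old, new in renames:
        result[new] = tree[old]
    return result


tree = {"x": "contents of x", "y": "contents of y"}
swap = [("x", "y"), ("y", "x")]

sequential = apply_sequential(tree, swap)
original_based = apply_against_original(tree, swap)
```

In this model the sequential variant overwrites "y" in the first step and ends up with a single file, whereas the variant that always reads from the original state swaps the two files as intended.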

> --compare=[content,time,mode,size,owner,group,all,objects]
> This option provides a number of file properties to compare, for normal 
> and both context output styles. Keyword 'content' means the file 
> contents, [...]

I don't understand.  There are two things to distinguish here: what diff looks 
at in order to decide if a file is considered "identical", and what 
information diff includes in its output.  Which refers to which?

Character and block special files don't make sense to me to include in a 
diff: they are not portable by design, and make little to no sense to apply in 
a different context (on a different machine or even after a driver load or 
removal on the same machine).

Directories are traditionally created and removed as needed by GNU patch.  I'm 
not sure if it makes sense to include them in diffs at all, but if we want to 
include them, doing so in a backwards compatible way (so that current 
implementations of patch will ignore directories) will not be easy.

Also, as mentioned above, I don't think that file ownership should be included 
in a diff.

> Any difference in content or file properties causes the specified 
> property information to be written to a modified context style header. 
> Note that the keyword 'time' is not written - this saves space, and is 
> more compatible with the existing context header line format.
> 
>    % diff -u --compare=all aaa bbb
>    --- aaa 2010-09-23 20:51:06.984375000 +0100 mode=-rw-r--r-- size=4 owner=Duncan group=None
>    +++ bbb 2010-09-23 20:51:03.015625000 +0100 mode=-rwxr--r-- size=4 owner=Duncan group=None
>    @@ -1 +1 @@
>    -111
>    +222

Sorry, but stuffing all this information in the "---" and "+++" header lines 
is just horrible.

Including information which did not actually change also doesn't make sense to 
me.
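
For comparison, the "git diff" format reports a mode change in separate "old mode" and "new mode" extended header lines and says nothing when the mode is unchanged. A minimal Python sketch of that idea follows; the function name `mode_headers` is hypothetical.

```python
def mode_headers(old_mode, new_mode):
    """Return git-style mode header lines, or nothing if the mode is unchanged."""
    if old_mode == new_mode:
        return []
    return ["old mode %06o" % old_mode, "new mode %06o" % new_mode]


# Regular file gaining the execute bits: two header lines are emitted.
changed = mode_headers(0o100644, 0o100755)

# Unchanged mode: no header lines at all.
unchanged = mode_headers(0o100644, 0o100644)
```

The "---" and "+++" lines are left untouched in this scheme, and properties that did not change add no output.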

Thanks,
Andreas


