gnu-misc-discuss

Re: Diff too difficult


From: Barry
Subject: Re: Diff too difficult
Date: 27 Dec 2005 15:34:55 -0800
User-agent: G2/0.2

Alfred M. Szmidt wrote:
> The output from diff is quite easy to understand both for humans, and
> for machines (GNU patch reads a diff, and applies it to the source
> code for example).

I ran diff on two HTML files that differed by a CR between </body>
and </html>. The end of the diff output was (between the "------"
lines):
------
<     <p>Mr. Bush has a closet full of presidential prerogatives.
Commenting or not commenting on  ongoing investigations if he thinks he
can influence their outcome is one. Spying on his subjects is another.
Seeing what wonderful prerogatives a president has may awaken in some a
wish to be president.  In others it gives birth to the wish that we had
a different one.</p>
---
>     <p>Mr. Bush has a closet full of presidential prerogatives.  Commenting 
> or not commenting on  ongoing investigations if he thinks he can influence 
> their outcome is one. Spying on his subjects is another.  Seeing what 
> wonderful prerogatives a president has may awaken in some a wish to be 
> president.  In others it gives birth to the wish that we had a different 
> one.</p>
61d60
<
------

Identical lines were output, but many lines were missing, including the
</body> and </html> tags. This output isn't easy to read at all.
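
One possible explanation for lines that look identical but are reported as changed is a difference in whitespace or line-ending bytes, which is invisible on screen. A minimal sketch in Python (not part of GNU diff; the function name `first_difference` and the sample strings are hypothetical) that finds where two lines first diverge and prints the remainder with `repr`, so that trailing spaces and CR characters become visible:

```python
def first_difference(a: str, b: str):
    """Return the index of the first differing character, or None if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    # Hypothetical sample lines; the second has trailing spaces and a CR.
    old = "presidential prerogatives.\n"
    new = "presidential prerogatives.  \r\n"
    idx = first_difference(old, new)
    print("first difference at index", idx)
    print("old tail:", repr(old[idx:]))
    print("new tail:", repr(new[idx:]))
```

Running this on the actual pair of lines reported by diff would show whether the change is only in whitespace.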

> unified diff's (-u) are
> nice for source code, side-by-side diff's (-y) are wonderful for text.

I didn't try that, but I did try sdiff. The difference between diff with
the side-by-side flag and sdiff isn't clear to me. With sdiff, a line
longer than 2048 characters gets cut off, and there's no way to prevent
it. Shorter lines, I think, are not wrapped, so a lot of horizontal
scrolling might be necessary if you want to provide the full text, which
I do.

I don't think it's documented anywhere how many spaces or tabs are
inserted between the first and second columns of the output, or how many
come before and after the "<" or ">" symbols. I saw that runs of
consecutive spaces within the documents were preserved in the output,
which makes it even harder for a computer to guess where a line from the
first document ends, where the symbol (when there is one) starts, or
where the line from the second document starts. I'd prefer every byte of
text to be shown, so I'd rather not suppress multiple spaces.
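
The column-boundary problem described above could be avoided by emitting explicitly delimited fields instead of padded columns. A rough sketch, assuming Python's standard `difflib` is an acceptable stand-in for sdiff (the function name `side_by_side` and the marker characters are my own choices, not sdiff's format). Each row is a marker, the left line, and the right line, separated by tabs; the lines are passed through `repr`, so embedded tabs and spaces are escaped or quoted, nothing is truncated, and the separators are unambiguous:

```python
import difflib
from itertools import zip_longest


def side_by_side(old_lines, new_lines):
    """Yield (marker, left, right) rows; left/right are repr() strings or ''."""
    markers = {"equal": " ", "replace": "|", "delete": "<", "insert": ">"}
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Pair up lines within each block; a missing partner becomes ''.
        for left, right in zip_longest(old_lines[i1:i2], new_lines[j1:j2]):
            yield (markers[tag],
                   "" if left is None else repr(left),
                   "" if right is None else repr(right))


if __name__ == "__main__":
    old = ["</body>\n", "\n", "</html>\n"]
    new = ["</body>\n", "</html>\n"]
    for row in side_by_side(old, new):
        print("\t".join(row))
```

Because `repr` escapes any tab inside a line, a program reading this output can split each row on tabs without guessing where a column ends.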

Someone suggested that I try diff -u. It also gave me confusing output
on the same two files:

------
-    <p>Mr. Bush has a closet full of presidential prerogatives.
Commenting or not commenting on  ongoing investigations if he thinks he
can influence their outcome is one. Spying on his subjects is another.
Seeing what wonderful prerogatives a president has may awaken in some a
wish to be president.  In others it gives birth to the wish that we had
a different one.</p>
+    <p>Mr. Bush has a closet full of presidential prerogatives.
Commenting or not commenting on  ongoing investigations if he thinks he
can influence their outcome is one. Spying on his subjects is another.
Seeing what wonderful prerogatives a president has may awaken in some a
wish to be president.  In others it gives birth to the wish that we had
a different one.</p>

 <div style = "text-align: center; margin-top: 40px;">
 <a href = "http://creativecommons.org/licenses/by-sa/2.0/"><img style
= "border: none;" src = "../images/Creative_Commons_25.png" alt =
"Creative Commons license. Some rights reserved."></a>
@@ -58,5 +58,4 @@
 <!--#include virtual = "../cgi-bin/footer.pl" -->

 </body>
-
 </html>
------
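
The "@@ -58,5 +58,4 @@" line in the unified output is a hunk header: the hunk covers 5 lines starting at line 58 of the first file and 4 lines starting at line 58 of the second. A small sketch in Python of reading such a header (the function name `parse_hunk_header` is hypothetical; in the unified format a missing count means 1):

```python
import re

# "@@ -start[,count] +start[,count] @@"; the counts are optional.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a unified diff hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    # An omitted count defaults to 1.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


if __name__ == "__main__":
    print(parse_hunk_header("@@ -58,5 +58,4 @@"))
```

Within the hunk, lines starting with "-" exist only in the first file, lines starting with "+" only in the second, and lines starting with a space are unchanged context.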

> The GNU diff manual has a clear description on how to read both these
> formats, including the default one, and some other formats that I
> don't remember.

Good, then maybe it will be a little easier for me to write a script
that creates the output that I want, but I still won't be happy about
having to. Even if I understand the output, I want it understood by
others without needing more than one or two lines of instructions, and
without lines being truncated.
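
A script of the kind described might look roughly like the following sketch, which assumes Python's standard `difflib` rather than parsing GNU diff's output (the function name `explain` and the sentence wording are hypothetical). It reports each change as a plain sentence with the line number and the complete line shown through `repr`, so nothing is truncated and whitespace stays visible:

```python
import difflib


def explain(old_lines, new_lines):
    """Describe the differences between two lists of lines as sentences."""
    sentences = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not reported
        if tag in ("delete", "replace"):
            for n, line in enumerate(old_lines[i1:i2], start=i1 + 1):
                sentences.append(f"Removed line {n} of the first file: {line!r}")
        if tag in ("insert", "replace"):
            for n, line in enumerate(new_lines[j1:j2], start=j1 + 1):
                sentences.append(f"Added line {n} of the second file: {line!r}")
    return sentences


if __name__ == "__main__":
    old = ["</body>\n", "\n", "</html>\n"]
    new = ["</body>\n", "</html>\n"]
    for sentence in explain(old, new):
        print(sentence)
```

To use it on real files, the lists could come from `open(path).readlines()`; opening with `newline=""` keeps CR characters intact so they show up in the report.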

