bug-texinfo

Standalone Info reader cannot read Info files with CR-LF EOLs


From: Eli Zaretskii
Subject: Standalone Info reader cannot read Info files with CR-LF EOLs
Date: Thu, 25 Dec 2014 17:51:46 +0200

Today I discovered that the Info reader built from the current trunk
cannot display any Info file that was produced natively on Windows (as
opposed to Info files that come from distribution tarballs, which were
produced on Unix).  The reader says it cannot find the Top node in any
such Info file.

It turned out this is because the code which stripped CR characters
from CR-LF pairs, once the file was read, was #ifdef'ed away (in
revision 5888), evidently due to a failure of a test that checks node
accessibility through tag tables without the 1000-character slack.

(I didn't find in bug-texinfo any discussion of the original problem
or the change that was made to solve it.  Neither do I see anything
pertinent in the bug database.  Did I miss something?  What or who
triggered that change?)

Anyway, the problem that #ifdef'ing tried to fix is actually a bug in
makeinfo 5.x on MS-Windows: it computes the node positions without
disregarding the CR characters, at least with the Perl that I have
here.  (Makeinfo 4.x did this correctly: it ignored the CR characters
when counting bytes for the tag tables, so stripping the CRs in Info
didn't cause any problems with tag tables.)  As a result, we now have a
subtle incompatibility between Info files produced by 4.x and 5.x on
Windows, and in addition no Info file produced natively on Windows
can be displayed by the stand-alone reader.
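
To make the offset mismatch concrete, here is a minimal sketch in C.  It
is not code from Info or texi2any; offset_without_crs is a hypothetical
helper, written under the assumption that only CRs immediately followed
by LF get stripped.  It maps a byte offset counted in the CR-LF file to
the offset the same byte has once those CRs are gone:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Info sources.
   Given RAW_OFFSET, a byte position counted in a buffer that still
   contains CR-LF pairs, return the position of the same byte after
   every CR that precedes an LF has been removed.  */
size_t
offset_without_crs (const char *buf, size_t len, size_t raw_offset)
{
  size_t i, crs = 0;

  /* Count the CR-LF pairs that start before RAW_OFFSET.  */
  for (i = 0; i + 1 < len && i < raw_offset; i++)
    if (buf[i] == '\r' && buf[i + 1] == '\n')
      crs++;

  return raw_offset - crs;
}
```

With two CR-LF line endings before a node, the two ways of counting
differ by two bytes, and the difference grows by one byte per line, so
deep into a large file it easily exceeds any fixed search slack.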

There's code in printed_representation which ignores a CR before a LF.
But that is not enough, because all the searches for distinct labels,
such as the Node labels, anything that calls string_in_line, are now
broken, as string_in_line assumes there's nothing between the string's
end and the newline; the CR character breaks that assumption.
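
The failure mode can be shown with a simplified stand-in.  The function
below is not string_in_line itself (its real signature and behavior are
not reproduced here); line_is_label is a hypothetical check that only
mimics the assumption described above, namely that the newline follows
the label directly:

```c
#include <string.h>

/* Hypothetical stand-in for the assumption made by the label searches:
   return 1 if LINE consists of exactly S followed by a newline,
   0 otherwise.  */
int
line_is_label (const char *line, const char *s)
{
  size_t n = strlen (s);

  /* The byte right after the label must be '\n'; a CR there
     makes the match fail.  */
  return strncmp (line, s, n) == 0 && line[n] == '\n';
}
```

Called with "Tag Table:\n" and the label "Tag Table:" this returns 1,
but with "Tag Table:\r\n" it returns 0, because the byte after the label
is the CR rather than the newline.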

We could try fixing this in string_in_line, teaching it to cope with a
CR-LF pair.  But there are gobs of other uses of a literal '\n' in the
Info sources (see the commentary before convert_eols which explains
why that code was needed), which will have to be analyzed one by one
and fixed as needed, and the way to fix them might sometimes be ugly
or non-trivial.  And after all this is done, we will still have the
subtle incompatibility mentioned above between Info files produced on
Windows by makeinfo 4.x and 5.x.

Moreover, at least the Emacs Info reader strips the CR characters from
Info files before it looks up nodes in the tag table.  So the
stand-alone reader should do the same, or else some Info files will be
readable by only one of these two readers.

In short, IMO the change in r5888 is a mixed blessing, at least on
MS-Windows.

So my suggestion is:

 . reinstate the code that removed CR characters in filesys.c

 . fix texi2any to produce tag tables that assume the CR characters
   are stripped from the Info file (my reading of the code is that it
   should not count CR characters before LF for the purposes of
   count_context value; or maybe it should simply open the Info output
   in 'unix' mode)

 . fix the cr-tag-table.info test case to follow suit

 . from now on, treat any Info file whose tag table counts the CR
   characters as invalid (if someone complains)
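
For reference, the first item amounts to something like the following
sketch.  This is not the code that was #ifdef'ed away in filesys.c;
strip_cr_before_lf is a hypothetical name, and the sketch assumes the
whole file is already in a writable buffer and that only a CR
immediately followed by LF should be dropped:

```c
#include <stddef.h>

/* Hypothetical sketch, not the original filesys.c code.
   Remove, in place, every CR that is immediately followed by an LF
   in the first LEN bytes of BUF.  Lone CRs are kept.
   Return the new length of the buffer contents.  */
size_t
strip_cr_before_lf (char *buf, size_t len)
{
  size_t r, w = 0;

  for (r = 0; r < len; r++)
    {
      /* Skip the CR of a CR-LF pair; the LF is copied next time.  */
      if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
        continue;
      buf[w++] = buf[r];
    }

  return w;
}
```

After such a pass the rest of the reader sees only '\n' line endings, so
the existing uses of a literal '\n' need no changes; the tag table
offsets then have to be computed as if the CRs were not there, which is
what the second item asks of texi2any.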

Comments?


