bug-texinfo

Re: Standalone Info reader cannot read Info files with CR-LF EOLs


From: Eli Zaretskii
Subject: Re: Standalone Info reader cannot read Info files with CR-LF EOLs
Date: Fri, 26 Dec 2014 23:52:34 +0200

> Date: Fri, 26 Dec 2014 16:48:00 +0000
> From: Gavin Smith <address@hidden>
> Cc: Texinfo <address@hidden>
> 
> I discovered this problem with the "gnucobpg.info" file that is part
> of GNU Cobol (downloadable at
> http://opencobol.add1tocobol.com/guides/), which has many CR-LF line
> endings (but not consistently). I don't know exactly how this file was
> generated - the file preamble says
> 
> This is gnucobpg.info, produced by makeinfo version 4.8 from
> gnucobpg.texi.

It's a broken file.  I have no idea how they produced it, but it
wasn't by stock makeinfo 4.8 on Windows: in that version, makeinfo
already counted byte offsets disregarding the CR characters, and the
Info reader already had the EOL conversion function.  I just checked
its code, which I still have on my disk.

> - anyway, I ran into the problem mentioned: I couldn't access
> later nodes in the file. I tested just now with info 4.13 and wasn't
> able to access the "Alphabet-Name-Clause" node or anything later in the
> file. That's the only Info file I remember encountering containing
> many CR bytes.

Its tag table accounts for the CR characters, which is wrong.  That's
why the Info reader from 4.13 cannot read it correctly.  And that's
exactly what will happen with Info files created by makeinfo 5.2 when
someone tries to read them with Info from 4.13.

Moreover, the same problem will happen with the Emacs Info reader.
Emacs removes the CR characters when it reads files into buffers (any
files, not just Info files), so it needs a tag table whose offsets
disregard the CRs.
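A small Python sketch (not code from makeinfo or either Info reader) makes the mismatch concrete. It computes node offsets twice for the same CR-LF file: once on the raw bytes, as a tag table that "accounts for the CR characters" would, and once after deleting the CR before each LF, which is the buffer a CR-stripping reader such as Emacs works with. It assumes each node starts with a ^_ (0x1F) separator line followed by a "File: ..., Node: ..." header; the sample file is made up.

```python
from __future__ import annotations

SEP = b"\x1f"  # Info node separator (^_)


def node_offsets(data: bytes) -> dict[str, int]:
    """Map each node name to the byte offset of the ^_ that precedes it."""
    offsets = {}
    pos = data.find(SEP)
    while pos != -1:
        eol = data.find(b"\n", pos)
        if eol == -1:
            break
        line_end = data.find(b"\n", eol + 1)
        if line_end == -1:
            line_end = len(data)
        # The header line follows the separator line; drop a trailing CR.
        header = data[eol + 1:line_end].rstrip(b"\r")
        for field in header.split(b","):
            field = field.strip()
            if field.startswith(b"Node:"):
                offsets[field[5:].strip().decode()] = pos
        pos = data.find(SEP, pos + 1)
    return offsets


# Hypothetical two-node file, converted to CR-LF EOLs.
unix = (b"\x1f\nFile: f.info,  Node: Top,  Next: A\n\nHello\nworld\n"
        b"\x1f\nFile: f.info,  Node: A,  Prev: Top\n\ntext\n")
dos = unix.replace(b"\n", b"\r\n")

raw = node_offsets(dos)                               # counts every CR
stripped = node_offsets(dos.replace(b"\r\n", b"\n"))  # CR-stripped view
print(raw["A"], stripped["A"])  # 55 50
```

The two conventions differ by one byte per line preceding the node, so the error grows towards the end of the file; that would plausibly explain why only the later nodes of gnucobpg.info were unreachable.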

> Since this claims to be produced by the 4.8 version (not 5.x), whether
> the CR characters are counted in the tag table must depend on other,
> unknown factors.

I don't think we can or should try fixing broken Info files.  We
certainly shouldn't introduce new breakage into valid files because of
that.

> It could be helpful to make the GNU Cobol developers aware of this.

Agreed.

> >  . fix texi2any to produce tag tables that assume the CR characters
> >    are stripped from the Info file (my reading of the code is that it
> >    should not count CR characters before LF for the purposes of
> >    count_context value; or maybe it should simply open the Info output
> >    in 'unix' mode)
> 
> The tag table containing the exact byte offsets is a lot simpler than
> having to remove all of the CR characters (or just CR characters
> before LF), and therefore less prone to incorrect implementation by
> any other Info-reading or -writing programs that might be written.

See above: we are breaking the Emacs Info reader, which is the other
reader important to the GNU project.  And we are creating an
interoperability problem vis-a-vis older versions of Texinfo.  I think
this is too high a price to pay.
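The texi2any fix proposed earlier in this message has two variants: not counting a CR that precedes an LF when recording offsets, or writing the Info output in Unix mode. Both are sketched below in Python under stated assumptions; texi2any itself is written in Perl, and `tag_table_offset` is a hypothetical helper, not its `count_context` logic. The sketch assumes tag-table entries point at a node's ^_ separator.

```python
import os
import tempfile


def tag_table_offset(output: bytes, raw_pos: int) -> int:
    """Offset of raw_pos as seen by a reader that deletes each CR before LF.

    Hypothetical helper, valid for any position not between a CR and its LF,
    such as the ^_ byte assumed here to start each node.
    """
    # bytes.count(sub, start, end) counts non-overlapping matches in [start, end).
    return raw_pos - output.count(b"\r\n", 0, raw_pos)


# Variant 1: output keeps CR-LF EOLs, but the recorded offset ignores the CRs.
out = (b"\x1f\r\nFile: f.info,  Node: Top\r\n\r\nHello\r\n"
       b"\x1f\r\nFile: f.info,  Node: A\r\n")
raw_pos = out.rindex(b"\x1f")
print(raw_pos, tag_table_offset(out, raw_pos))  # 38 34

# Variant 2: write Unix EOLs on every platform, so raw and CR-stripped
# offsets coincide. newline="\n" disables the "\n" -> os.linesep
# translation that text mode performs on Windows.
path = os.path.join(tempfile.mkdtemp(), "f.info")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("\x1f\nFile: f.info,  Node: Top\n\nHello\n")
with open(path, "rb") as f:
    written = f.read()
print(b"\r" in written)  # False
```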

> It enables accessing the correct place in the file without
> processing the entire file first. This could enable faster access of
> nodes by memory-mapping a file. Most of the time speed isn't an
> issue, but it's an idea I've had for speeding up searching the
> indices of all installed Info files at once. It could also be used
> to access a node of an Info file over a slow or expensive network
> connection without having to download the entire file.

I'm okay with these goals, but I don't think they are worth the
breakage mentioned above.  Some of the goals can be met even without
removing CRs, e.g., by allowing a larger slack around the offsets
from the tag tables.
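The slack approach can be combined with the random-access idea from the quoted message: memory-map the file, treat the tag-table offset only as a guess, and look for the matching node header among the ^_ separators within `slack` bytes of it. This Python sketch is an illustration under assumptions (the header parsing, the function names and the 1000-byte default are all hypothetical), not the algorithm used by either Info reader.

```python
from __future__ import annotations

import mmap
import os
import tempfile

SEP = b"\x1f"  # Info node separator (^_)


def node_name_at(buf, pos: int) -> str | None:
    """Return the Node: field of the header line after the ^_ at pos."""
    start = buf.find(b"\n", pos) + 1
    if start == 0:
        return None
    end = buf.find(b"\n", start)
    header = buf[start:end if end != -1 else len(buf)].rstrip(b"\r")
    for field in header.split(b","):
        field = field.strip()
        if field.startswith(b"Node:"):
            return field[5:].strip().decode()
    return None


def find_node(buf, name: str, guess: int, slack: int = 1000) -> int | None:
    """Locate node `name` within `slack` bytes of the tag-table offset `guess`."""
    pos = buf.find(SEP, max(0, guess - slack))
    while pos != -1 and pos <= guess + slack:
        if node_name_at(buf, pos) == name:
            return pos
        pos = buf.find(SEP, pos + 1)
    return None


def read_node(path: str, name: str, guess: int, slack: int = 1000) -> bytes | None:
    """Memory-map `path` and return only the bytes of node `name`."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        pos = find_node(m, name, guess, slack)
        if pos is None:
            return None
        end = m.find(SEP, pos + 1)
        return m[pos:end if end != -1 else len(m)]


# Hypothetical CR-LF file: node A is at raw byte 55, CR-stripped offset 50.
unix = (b"\x1f\nFile: f.info,  Node: Top,  Next: A\n\nHello\nworld\n"
        b"\x1f\nFile: f.info,  Node: A,  Prev: Top\n\ntext\n")
dos = unix.replace(b"\n", b"\r\n")
path = os.path.join(tempfile.mkdtemp(), "f.info")
with open(path, "wb") as f:
    f.write(dos)
print(read_node(path, "A", 50))           # found despite the 5-byte error
print(read_node(path, "A", 50, slack=2))  # None: error exceeds the slack
```

The slack has to exceed the largest accumulated error, which is one byte per preceding line when writer and reader disagree about CRs, so it grows with the file; that is the trade-off against exact offsets.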

> I hope it's possible to make changes to the standalone Info reader to
> make it possible to access files with CR-LF line endings without
> having to interpret the tag table this way. At the same time, if it's
> easy to avoid outputting files with CR-LF line endings under Windows,
> then I think we should do so.

Changing makeinfo to output a Unix-style file will solve some of the
problems, yes.  I hope Patrice is reading this, and will comment.  But
the interoperability problem with files created by older makeinfo
versions will stay.  Maybe we should add an optional switch to Info to
give the user control of this.
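One way such a switch could work is to decide how a tag-table offset maps back to a byte position in a CR-LF file. In the Python sketch below, the flag `offsets_count_cr` and the function `raw_position` are hypothetical: when the flag is true the offset is used as-is (exact byte offsets, as makeinfo 5.2 writes them according to this thread); when it is false, each CR that precedes an LF is skipped, matching the convention of older makeinfo and the Emacs reader.

```python
def raw_position(data: bytes, tag_offset: int, offsets_count_cr: bool) -> int:
    """Map a tag-table offset to a byte position in the file as stored.

    offsets_count_cr is a hypothetical user option: True means offsets are
    exact byte positions; False means they were computed with every CR that
    precedes an LF removed.
    """
    if offsets_count_cr:
        return tag_offset
    seen = 0  # bytes counted so far under the CR-stripped convention
    for raw, byte in enumerate(data):
        if seen == tag_offset:
            return raw
        # A CR immediately followed by LF does not count.
        if not (byte == 0x0D and data[raw + 1:raw + 2] == b"\n"):
            seen += 1
    return len(data)


dos = (b"\x1f\r\nFile: f.info,  Node: Top\r\n\r\nHello\r\n"
       b"\x1f\r\nFile: f.info,  Node: A\r\n")
print(raw_position(dos, 34, offsets_count_cr=False))  # 38
print(raw_position(dos, 38, offsets_count_cr=True))   # 38
```

Note that the CR-stripped branch has to scan the file from the start, which is exactly the processing cost that exact byte offsets were meant to avoid.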


