Re: CR-LF line endings in Texinfo files (was texinfo-6.3.90 pretest)
From: Eli Zaretskii
Subject: Re: CR-LF line endings in Texinfo files (was texinfo-6.3.90 pretest)
Date: Thu, 18 May 2017 18:40:09 +0300
> From: Gavin Smith <address@hidden>
> Date: Thu, 18 May 2017 08:04:48 +0100
> Cc: address@hidden
>
> > Info can indeed read them, but cross-references to anchors don't work
> > correctly then. They land the reader at the wrong place, because byte
> > counts don't match. This started happening some versions ago, because
> > of some change whose details I no longer remember.
>
> So the Info files produced by old versions of makeinfo
> on MS-DOS or MS-Windows ended lines with a CR-LF sequence, but each
> sequence was counted as only one byte in the tags table (which gives
> the byte offsets of nodes within the file). So info stripped the CR
> bytes so that files could be read correctly.
This is consistent with what I remember.
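To make the mismatch concrete, here is a minimal sketch in C of how an offset that counts each CR-LF as one byte relates to a position in the raw file. The function name logical_to_raw_offset is made up for illustration and is not taken from the Info sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Info.  Map an offset that counts
   each CR-LF pair as ONE byte (the old tags-table convention) to the
   matching position in the raw buffer, where CR-LF takes two bytes.  */
size_t
logical_to_raw_offset (const char *raw, size_t raw_len, size_t logical)
{
  size_t i = 0;     /* position in the raw buffer */
  size_t seen = 0;  /* logical bytes consumed so far */

  assert (raw != NULL);
  while (i < raw_len && seen < logical)
    {
      /* A CR directly followed by LF is skipped and not counted.  */
      if (raw[i] == '\r' && i + 1 < raw_len && raw[i + 1] == '\n')
        i++;
      i++;
      seen++;
    }
  return i;
}
```

Every CR-LF before the target shifts the raw position by one more byte, so using a tags-table offset directly against an unstripped file lands progressively further from the intended place.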
> Eventually it appeared that there could be files with some lines
> ending in CR-LF which were counted as _two_ bytes, so in an attempt
> to support these, the CR bytes were only stripped after a node
> couldn't be found in a file. Apparently this doesn't work completely
> reliably, especially for anchors where there is nothing at the offset
> to confirm that you found the right place (unlike nodes, which
> have a node separator).
I'm not sure we should cater to such strange files.
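The difference between nodes and anchors described above can be sketched as follows. This assumes a node begins with a ^_ (0x1F) separator byte; the function find_separator_near and its slack parameter are hypothetical and only illustrate the idea, they do not describe what find_node_from_tag actually does.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_SEPARATOR '\037'   /* ^_, assumed to start each node */

/* Hypothetical helper, not part of Info.  Look for a node separator
   at OFF, then at increasing distances up to SLACK bytes either side.
   Return its index, or -1 if none is found in that window.  */
long
find_separator_near (const char *buf, size_t len, size_t off, size_t slack)
{
  size_t d;

  assert (buf != NULL);
  for (d = 0; d <= slack; d++)
    {
      if (off >= d && off - d < len && buf[off - d] == NODE_SEPARATOR)
        return (long) (off - d);
      if (off + d < len && buf[off + d] == NODE_SEPARATOR)
        return (long) (off + d);
    }
  return -1;
}
```

A node offset that is slightly wrong can be corrected this way because the separator confirms the position. An anchor points into ordinary text, so there is no byte to check against and a wrong offset goes unnoticed.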
> Could we unconditionally strip the CR's on DOS and Windows only?
In the Info reader or in texi2any?
If the former, then we could indeed do that, provided we don't care about
CR-LF files that somehow end up living on POSIX systems.
> This
> could be done by calling the 'convert_eols' function (currently in
> info/nodes.c), or else by opening the file in "text" mode in
> filesys_read_info_file in info/filesys.c (currently it uses a flag
> "O_BINARY" to open the file).
I think the latter would be a mistake, because Info files include some
non-text bytes that could confuse text-mode reads.
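The binary-mode alternative, reading the file with O_BINARY and then dropping the CRs in memory, could look roughly like the sketch below. It is an illustration only: strip_cr_before_lf is a made-up name, and this is not the code of convert_eols in info/nodes.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the real convert_eols.  Remove every CR
   that is immediately followed by LF, in place.  Lone CRs and all
   other bytes (including control bytes such as ^_) are left alone.
   Return the new length of the buffer contents.  */
size_t
strip_cr_before_lf (char *buf, size_t len)
{
  size_t src, dst = 0;

  assert (buf != NULL);
  for (src = 0; src < len; src++)
    {
      if (buf[src] == '\r' && src + 1 < len && buf[src + 1] == '\n')
        continue;               /* drop the CR, keep the LF */
      buf[dst++] = buf[src];
    }
  return dst;
}
```

Because the file is still read in binary mode, the non-text bytes reach the reader untouched, and only the CR of each CR-LF pair is removed.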
> This would mean that Info files with CR-LF line endings could only
> be read on Windows, and moreover Info on Windows could not read
> (the few) Info files where the CR bytes were counted. I believe
> this could result in quite a bit of simplification of the code, as the
> code that conditionally calls 'convert_eols' is a bit difficult
> (see find_node_from_tag in info/nodes.c), so I'd like to make
> this change.
Fine with me. (I think this is how things worked originally, and the
idea to strip CR on Unix was from Karl, but maybe I'm mistaken.)