bug-texinfo

Re: [bug #38795] texi2any makes CR in output when input is mixed CR-LF a


From: Vincent Belaïche
Subject: Re: [bug #38795] texi2any makes CR in output when input is mixed CR-LF and LF files
Date: Wed, 21 Aug 2013 23:04:22 +0200
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Vincent Belaïche wrote:
[...]
Hello Karl,

[...]
Now, there is something which I had not noticed in the first place: the info files do not all contain the same number of CRs. It seems that the problem is rather stranger than I had initially thought :-/, and I can't figure out what is happening ...

I have to go to work now; I will post more information later on (such as which texi2any I am using, which ActivePerl version, and also the info files resulting from launching the wrapper from other environments, or from running by hand the command line that the wrapper runs).

Anyway, please note that this problem is not causing me any trouble: the files in all the projects I am working on are consistently encoded. My point was rather to contribute to the texinfo project by reporting the strange behaviour.

VBR,
  Vincent.

More experiment results.

Trying from an EMACS "bash" shell buffer, I get the following output:

-----
/c/Documents and Settings/Vincent/Local Settings/Temp/bug_38795>makeinfo bbdb.texi
Locales dir for document strings not found
makeinfo-dos.cpp cmdline=c:\msys\1.0\lib\activePerl\bin\perl.exe c:\Programme\GNU\installation\texinfo-install\trunk.old\tp\texi2any.pl "bbdb.texi"
/c/Documents and Settings/Vincent/Local Settings/Temp/bug_38795>
-----

And the produced info file is that one:

https://savannah.gnu.org/bugs/download.php?file_id=28901

Please note that the *bash* buffer is an MSYS bash launched in a comint-mode buffer by the `M-x bash' command of the w32utils package. I have also attached the w32utils package there --- this is a package which I wrote; I will make it public someday in a more open way, for instance by importing it to some forge.

http://savannah.gnu.org/bugs/download.php?file_id=28902

When I do `M-x list-processes' I get:

-----
shell run *bash* -- C:/msys/1.0/bin/bash --posix --noediting -i
shell<1> run *shell* -- C:/Programme/GNU/Emacs/bin/cmdproxy.exe -i
-----

where *bash* is the bash buffer.

Here I note that I seem to get the same thing as with AUCTeX (mixed LF and CRLF).

Now, another experiment: I call the command directly in an EMACS *shell* buffer, without using the wrapper. Here is the sort of output I get:

----
c:\Documents and Settings\Vincent\Local Settings\Temp\bug_38795>c:\msys\1.0\lib\activePerl\bin\perl.exe c:\Programme\GNU\installation\texinfo-install\trunk.old\tp\texi2any.pl "bbdb.texi"
c:\msys\1.0\lib\activePerl\bin\perl.exe c:\Programme\GNU\installation\texinfo-install\trunk.old\tp\texi2any.pl "bbdb.texi"
Locales dir for document strings not found

c:\Documents and Settings\Vincent\Local Settings\Temp\bug_38795>
----

and the info file is that one:

http://savannah.gnu.org/bugs/download.php?file_id=28903

Here it seems that the info file is the same as when I use the wrapper in the *shell* buffer. I also ran the same command line:

c:\msys\1.0\lib\activePerl\bin\perl.exe c:\Programme\GNU\installation\texinfo-install\trunk.old\tp\texi2any.pl "bbdb.texi"

from a cmd.exe console application launched directly from MSWindows --- i.e. not a *shell* buffer --- and also got the same info file as http://savannah.gnu.org/bugs/download.php?file_id=28903

More interesting, I typed the following command line:

/c/msys/1.0/lib/activePerl/bin/perl.exe 'c:\Programme\GNU\installation\texinfo-install\trunk.old\tp\texi2any.pl' bbdb.texi

from an MSYS rxvt console application, and I also got the same info file as http://savannah.gnu.org/bugs/download.php?file_id=28903 --- I would have expected the same thing as with AUCTeX and *bash* with the wrapper, but that did not happen.

The last trial was the command without the wrapper, i.e.

/c/msys/1.0/lib/activePerl/bin/perl.exe 'c:\Programme\GNU\installation\texinfo-install\trunk.old\tp\texi2any.pl' bbdb.texi

from the EMACS *bash* buffer, and I also got the same info file as http://savannah.gnu.org/bugs/download.php?file_id=28903

So, quite funny: I have two types of info output
- type #1: all lines terminated by CRLF
- type #2: lines terminated by CRLF only from line #1250 onwards, while the lines at the beginning of the file are terminated by LF

After repeating the experiments I get the following:

- AUCTeX: sometimes type #1 and sometimes type #2 --- I could get type #2 only once, and could not reproduce it.
- MSYS/rxvt w/o wrapper: sometimes type #1 and sometimes type #2 --- I got type #2 less often than type #1, and it seems that it can happen only the first time the console is launched.
- MSYS/rxvt with wrapper: type #1
- cmd.exe with or without wrapper: type #1
- w32utils' *bash* w/o wrapper: sometimes type #1 and sometimes type #2
- w32utils' *bash* with wrapper: type #1
- *shell* with or without wrapper: type #1

My first gut feeling was that the way line endings are handled in perl depends on some detection of whether the environment is MSWindows or Linux, but that this detection does not always give the same result: it varies dynamically with the environment and with whether you look at input or at output. It seemed to me that when you use something like MSYS bash to launch the command, this has some effect on what perl detects. So it would be better to do this detection explicitly in the perl script once and for all, and then to apply it explicitly to all the output files, so that the output is consistent.
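To make the "decide once, apply everywhere" idea concrete, here is a minimal sketch. It is written in Python purely for illustration; it is not the texi2any Perl code, and the function name `normalize_eol` is hypothetical. It assumes the output is available as raw bytes and that a single terminator has been chosen up front.

```python
def normalize_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every line terminator in `data` to the single chosen `eol`.

    Hypothetical helper: the caller picks `eol` once (b"\n" or b"\r\n")
    and applies it to every output file.
    """
    # First collapse CRLF to a bare LF so that both styles look the same ...
    unified = data.replace(b"\r\n", b"\n")
    # ... then expand every LF to the terminator that was chosen up front.
    return unified.replace(b"\n", eol)


# Mixed input, as in the "type #2" description: LF first, CRLF later.
mixed = b"first\nsecond\nthird\r\nfourth\r\n"
as_crlf = normalize_eol(mixed, b"\r\n")
as_lf = normalize_eol(mixed, b"\n")
```

The result would then have to be written in binary mode, so that no platform-dependent translation layer touches the terminators again.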

But now I am really wondering whether there is any difference at all between type #1 and type #2, or whether the difference is only a display artefact of EMACS when I visit the info file. It seems that my EMACS version will hide none, or some, of the ^M endings when I visit the file, and that this behaviour is not systematic, which completely confused me about what the real info output is!

So I am now not completely sure that type #2 output really exists at all: it happens far less often and is hard to reproduce, and I could not reproduce it again to inspect the file with hexl-mode, or to check its size, to be sure that it really has mixed LF and CRLF and that this is not a display artefact of visiting the file.
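One way to settle this independently of how EMACS displays the file would be to count the terminators on the raw bytes. The following is a minimal Python sketch under that assumption; the function name `classify_line_endings` is hypothetical, and the sample byte strings are made up for illustration rather than taken from the actual bbdb.info.

```python
def classify_line_endings(data: bytes):
    """Return (crlf_count, lf_only_count, first_switch_line).

    first_switch_line is the 1-based number of the first line whose
    terminator differs from the previous line's, or None if the file
    uses one style throughout.
    """
    crlf = lf_only = 0
    first_switch = None
    previous = None
    pieces = data.split(b"\n")
    # The piece after the last LF has no terminator, so it is skipped.
    for number, piece in enumerate(pieces[:-1], start=1):
        style = "CRLF" if piece.endswith(b"\r") else "LF"
        if style == "CRLF":
            crlf += 1
        else:
            lf_only += 1
        if previous is not None and style != previous and first_switch is None:
            first_switch = number
        previous = style
    return crlf, lf_only, first_switch


# Made-up samples: a consistent file and a mixed one.
consistent = classify_line_endings(b"a\r\nb\r\n")
mixed = classify_line_endings(b"a\nb\nc\r\nd\r\n")
```

For a real check, the bytes would come from reading the info file in binary mode, for example `open("bbdb.info", "rb").read()`. A result with a zero LF-only count would correspond to type #1, and a non-None switch line to type #2.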

Maybe in the end it is just an EMACS display artefact and the output info file always has type #1 (consistently CRLF endings). Maybe I was just puzzled by the fact that even with type #1 EMACS explicitly displays the ^M endings, which usually happens only when the line endings are not consistent, but surely also happens when the file is handled like some binary file --- which seems to be the case for info files.

I am now thinking that when I visited bbdb.info and the display looked like type #2, what happened was that EMACS started by thinking that this was a text file and then changed its opinion during the visit, which resulted in two kinds of display (type #1 & type #2). At the end of the day, that could reveal some bug in EMACS EOL format detection.

I will try again to see whether I can really produce some type #2 file (or file display) again.

BR,
  Vincent.

PS: Sorry for the very lengthy email, I spent my evening trying to reproduce the issue. But now I cannot get type #2 display any longer...











