[bug #46018] texi2any makes inconsistent EOL style
From: Vincent Belaïche
Subject: [bug #46018] texi2any makes inconsistent EOL style
Date: Sun, 27 Sep 2015 15:58:06 +0000
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Follow-up Comment #2, bug #46018 (project texinfo):
It is true that MSYS programs open text files without converting CRLF to
LF, which is not surprising, as they simply behave like programs on systems
that natively use LF as the EOL.
In any case, all these programs (be it AWK, Perl, or the C runtime library)
use LF internally as the line terminator for text, so on a system like MS-DOS,
where CRLF is used instead of LF on the file system, native builds do a
CRLF->LF conversion on text input and an LF->CRLF conversion on text output.
However it is still possible to write scripts in a way that they are
insensitive to the input EOL style.
For instance, look at the following AWK experiment carried out with an MSYS
gawk on a DOS console:
$ @echo Life is a b****| gawk -- "/\r$/ { print \"Spurious trailing CR\"}; /[^\r]$/ { print \"No spurious trailing CR\"};"
-| Spurious trailing CR
The same experiment on an MSYS console:
$ echo 'Life is a b****' | gawk -- '/\r$/ { print "Spurious trailing CR"};
/[^\r]$/ { print "No spurious trailing CR"};'
-| No spurious trailing CR
Now, if I pass RS="\r?\n|\r" to gawk in order to be insensitive to EOL style,
then I get for the DOS console:
$ echo Life is a b****| gawk -- "BEGIN { RS=\"\r?\n^|\r\"}; /\r$/ { print \"Spurious trailing CR\"}; /[^\r]$/ { print \"No spurious trailing CR\"};"
-| No spurious trailing CR
And this does not change anything for the MSYS console:
$ echo 'Life is a b****' | gawk -- 'BEGIN { RS="\r?\n|\r"};
/\r$/ { print "Spurious trailing CR"}; /[^\r]$/ { print "No spurious trailing CR"};'
-| No spurious trailing CR
If I pass a CRLF EOL in the MSYS console I get the same result:
$ printf 'Life is a b****\r\n' | gawk -- 'BEGIN { RS="\r?\n|\r"};
/\r$/ { print "Spurious trailing CR"}; /[^\r]$/ { print "No spurious trailing CR"};'
-| No spurious trailing CR
So, in conclusion, by setting RS="\r?\n|\r", either in the BEGIN rule or on
the command line, I can make my AWK script EOL insensitive.
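
As a compact way to re-check the RS setting above on all three EOL styles at once, here is a minimal sketch. The helper name show_records and the AWK variable are hypothetical, introduced only for this illustration; it assumes an awk that accepts a regular expression as RS, as gawk does (POSIX leaves multi-character RS unspecified). Each record is printed in brackets, so a stray CR would show up inside them.

```shell
# Hypothetical helper, not part of the original report.
# Assumption: $AWK (default: awk) treats a multi-character RS as a regex,
# which gawk does; set AWK=gawk to match the experiments above exactly.
AWK=${AWK:-awk}

show_records() {
  # RS matches CRLF, LF, or a lone CR as the record terminator.
  "$AWK" 'BEGIN { RS = "\r?\n|\r" } { printf "[%s]", $0 } END { print "" }'
}

# One stream mixing CRLF, LF and CR line endings.
printf 'a\r\nb\nc\r' | show_records
```

With gawk this should print [a][b][c]: three records, none of them carrying a CR.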
The same situation arises with MSYS Perl.
On a DOS console with an MSYS Perl, without any special handling:
$ @echo Life is a b****| c:\Path\To\MSYS\perl.exe -e "while(<>){ if(m/\r$/){ print (\"Spurious trailing CR\");} elsif(m/[^\r]$/){ print (\"No spurious trailing CR\");} }"
-| Spurious trailing CR
On a DOS console with a native Perl, without any special handling:
$ @echo Life is a b****| c:\Path\To\Native\perl.exe -e "while(<>){ if(m/\r$/){ print (\"Spurious trailing CR\");} elsif(m/[^\r]$/){ print (\"No spurious trailing CR\");} }"
-| No spurious trailing CR
Now, if I set binmode(STDIN,':crlf') (cf. perldoc/binmode
<http://perldoc.perl.org/functions/binmode.html>), then I get with MSYS Perl:
$ @echo Life is a b****| c:\Path\To\MSYS\perl.exe -e "binmode(STDIN,':crlf'); while(<>){ if(m/\r$/){ print (\"Spurious trailing CR\");} elsif(m/[^\r]$/){ print (\"No spurious trailing CR\");} }"
-| No spurious trailing CR
And this does no harm with the native Perl:
$ @echo Life is a b****| c:\Path\To\Native\perl.exe -e "binmode(STDIN,':crlf'); while(<>){ if(m/\r$/){ print (\"Spurious trailing CR\");} elsif(m/[^\r]$/){ print (\"No spurious trailing CR\");} }"
-| No spurious trailing CR
Conversely, with an LF-only EOL, i.e. if I do the same experiment under an
MSYS console, I also get the same good result (no spurious trailing CR),
whether with a native or an MSYS Perl:
$ echo 'Life is a b****' | /Path/To/Native/perl.exe -e 'binmode(STDIN,'"'"':crlf'"'"');
while(<>){ if(m/\r$/){ print ("Spurious trailing CR\n");}
elsif(m/[^\r]$/){ print ("No spurious trailing CR\n");} }'
-| No spurious trailing CR
$ echo 'Life is a b****' | /Path/To/MSYS/perl.exe -e 'binmode(STDIN,'"'"':crlf'"'"');
while(<>){ if(m/\r$/){ print ("Spurious trailing CR\n");}
elsif(m/[^\r]$/){ print ("No spurious trailing CR\n");} }'
-| No spurious trailing CR
And the same occurs if I use CRLF under an MSYS console:
$ printf 'Life is a b****\r\n' | /Path/To/Native/perl.exe -e 'binmode(STDIN,'"'"':crlf'"'"');
while(<>){ if(m/\r$/){ print ("Spurious trailing CR\n");}
elsif(m/[^\r]$/){ print ("No spurious trailing CR\n");} }'
-| No spurious trailing CR
$ printf 'Life is a b****\r\n' | /Path/To/MSYS/perl.exe -e 'binmode(STDIN,'"'"':crlf'"'"');
while(<>){ if(m/\r$/){ print ("Spurious trailing CR\n");}
elsif(m/[^\r]$/){ print ("No spurious trailing CR\n");} }'
-| No spurious trailing CR
It will be the same if the EOL is CR:
$ printf 'Life is a b****\r' | /Path/To/Native/perl.exe -e 'binmode(STDIN,'"'"':crlf'"'"');
while(<>){ if(m/\r$/){ print ("Spurious trailing CR\n");}
elsif(m/[^\r]$/){ print ("No spurious trailing CR\n");} }'
-| No spurious trailing CR
$ printf 'Life is a b****\r' | /Path/To/MSYS/perl.exe -e 'binmode(STDIN,'"'"':crlf'"'"');
while(<>){ if(m/\r$/){ print ("Spurious trailing CR\n");}
elsif(m/[^\r]$/){ print ("No spurious trailing CR\n");} }'
-| No spurious trailing CR
So, in conclusion, whether it is written in Perl or in any other scripting
language (AWK...), a script can be written so that it is insensitive to EOL
style. Writing the script this way is useful not only on MSW, but also if you
want to process a source file with a non-Linux EOL style (MS-DOS CRLF, or the
CR of old pre-Darwin Mac OS) with a Linux texi2any.
So I don't know whether this is a bug or not. I fully agree with Eli's
comment that, on MSW, one should:
* make sure that all input files have LF endings (do ``svn propset
svn:eol-style LF'' on the input files).
* use MSYS perl, so that we get LF also at output (and do ``svn propset
svn:eol-style LF'' also on the output files if they are archived).
This is my preferred way and what I did for latexrefman. However, it would
also be possible to use the native svn:eol-style together with a native Perl.
That would also work without conflicts. So maybe this is a bug...
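
For the first point in the list above (LF-only input files), a quick check for stray CRs before running texi2any could look like this minimal sketch. The helper name has_cr is hypothetical and the demo file is a throwaway created with mktemp; nothing here is part of texinfo or Subversion.

```shell
# Hypothetical helper, not part of the original report.
# Succeeds (exit status 0) if the named file contains at least one CR byte.
has_cr() {
  cr=$(printf '\r')      # a literal carriage return
  grep -q "$cr" "$1"
}

# Demo on a throwaway file mixing CRLF and LF endings.
tmp=$(mktemp)
printf 'line one\r\nline two\n' > "$tmp"
if has_cr "$tmp"; then echo "CR found"; else echo "LF only"; fi
rm -f "$tmp"
```

On the demo file this prints "CR found"; a file for which has_cr fails already has LF-only (or no) line endings.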
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?46018>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/