bug-texinfo

Re: Skip filename recoding tests on MS-Windows


From: pertusus
Subject: Re: Skip filename recoding tests on MS-Windows
Date: Sun, 23 Oct 2022 20:22:29 +0200

On Sun, Oct 23, 2022 at 07:10:57PM +0300, Eli Zaretskii wrote:
> > 
> > Which is problematic, it means that with a correctly setup input file
> > with latin1 encoded character in the name, something wrong is going on.
> 
> The character is supposed to be encoded in Latin1, but I don't think
> it is, because Latin1 is not the locale's encoding here.

I think that it is encoded in Latin1, as discussed in another mail.

>  And some
> programs involved in these tests could decide they don't understand
> the character and replace it with a '?'.

There are no programs involved in these tests other than texi2any.  The
? appearing in the message may come from Perl's Encode module, which
does not know how to encode from the Perl internal encoding to the C
locale that the test sets up with
LC_ALL=C; export LC_ALL
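
The substitution itself is easy to reproduce outside texi2any.  A
minimal sketch, in Python rather than Perl, and assuming that the C
locale amounts to plain ASCII; it only shows the analogous behaviour of
an encoder told to replace what it cannot encode, not the actual Encode
code path.  The file name is made up for the illustration.

```python
# Illustration only: Python's codecs stand in for Perl's Encode here.
text = "file_\u00ee.texi"   # hypothetical file name containing î (U+00EE)

# An ASCII encoder cannot represent î; with errors="replace" it emits '?'.
as_c_locale = text.encode("ascii", errors="replace")
print(as_c_locale)
```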

You could also try running the test command directly, without setting
LC_ALL=C, like

perl -w ./..//texi2any.pl -c TEXI2HTML --force --conf-dir ./../t/init/ \
  --conf-dir ./../init --conf-dir ./../ext -I ./formatting/ -I formatting// -I ./ \
  -I . --error-limit=1000 -c TEST=1 --output \
  formatting//out_parser/manual_include_accented_file_name_latin1/ --info \
  -D 'needrecodedfilenames Need recoded file names' \
  ./formatting//manual_include_accented_file_name_latin1.texi

You should probably see something different, though probably not î:
maybe the character that corresponds to î in your locale, maybe
something else.


As to why the file is not found for
manual_include_accented_file_name_latin1, I think that the reason is the
following.  When the line from the Texinfo file is read, it is converted
from Latin1, based on the @documentencoding, to UTF-8.
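
To make the byte-level effect of that step concrete, here is a small
sketch in Python (not the parser's own code).  The @include line and the
file name in it are hypothetical; only the \xEE byte for î comes from
the test discussed here.

```python
# A hypothetical line as it sits in the Latin1-encoded Texinfo file:
# î is the single byte 0xEE.
raw_line = b"@include included_\xee.texi\n"

# Reading with the @documentencoding (ISO-8859-1) gives the character
# U+00EE, which becomes the two bytes 0xC3 0xAE once stored as UTF-8.
decoded = raw_line.decode("iso-8859-1")
utf8_line = decoded.encode("utf-8")
print(utf8_line)
```

So the name held internally no longer has the bytes of the name on
disk, and has to be encoded back before it can be used to find the file.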

In end_line.c, line 1416, the @include argument is processed, which
leads to a call to encode_file_name() in input.c to encode it to the
binary string that should be used to find the file.

On Windows, we set DOC_ENCODING_FOR_INPUT_FILE_NAME to 0 (it is set to 1
in other cases).  Let's imagine that the locale is set to something like
LOC.  In this case, INPUT_FILE_NAME_ENCODING is not set.  This
information is passed to the XS parser and used by encode_file_name() in
input.c.  Since neither input_file_name_encoding nor
doc_encoding_for_input_file_name is set, the locale encoding, LOC in
this example, is used to recode the file name from UTF-8 to LOC.  My
guess is that unless the locale happens to be, by chance, a Latin1
locale, iconv will not encode the accented character to the \xEE byte,
as it should.
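
The selection described above can be sketched as follows.  This is
Python, not the C code in input.c, and it only reflects my reading of
the order in which the settings are consulted.  The function names and
the code pages used for LOC (cp1252, cp437, cp1255) are assumptions
picked for the illustration; Python codecs stand in for iconv.

```python
def pick_encoding(input_file_name_encoding, doc_encoding_for_input_file_name,
                  document_encoding, locale_encoding):
    """Choose the encoding for file names (hypothetical helper)."""
    if input_file_name_encoding:
        return input_file_name_encoding
    if doc_encoding_for_input_file_name and document_encoding:
        return document_encoding
    return locale_encoding

def try_encode(name, encoding):
    """Return the encoded bytes, or None if the encoding has no such character."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None

name = "\u00ee"   # î, as held internally after reading the document
for loc in ("cp1252", "cp437", "cp1255"):
    # Windows case from above: no explicit encoding, document encoding not used.
    enc = pick_encoding(None, 0, "iso-8859-1", loc)
    print(loc, try_encode(name, enc))
```

Only the Latin1-compatible code page yields the \xEE byte that matches
the file on disk; the others give a different byte or no encoding at
all.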


The failure of manual_include_accented_file_name_latin1_explicit_encoding
is more surprising to me.  In that case INPUT_FILE_NAME_ENCODING is set
to ISO-8859-1, so the reverse encoding from UTF-8 to ISO-8859-1 should
lead to a path that can be found, and I do not understand why the test
fails.  Paths are looked up in locate_include_file() in input.c; that
could be where something unexpected happens, for instance if stat() on
Windows does some kind of conversion.


Debugging the manual_include_accented_file_name_latin1_explicit_encoding
test further would require showing the string bytes before and after the
call to encode_file_name() in end_line.c.  Then, if the bytes match the
expected Latin1 string, with \xEE for î, the next step would be to check
whether something unexpected happens in locate_include_file(), for
example by printing the values of filename tried there to see whether
there is indeed one for which stat() should return 0.
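
The two checks can be sketched as below.  This is a Python stand-in for
what would really be a couple of fprintf() calls in the C code;
hex_bytes() and first_existing() are hypothetical helpers, not functions
from the Texinfo sources, and the demo uses a plain ASCII file name in a
temporary directory.

```python
import os
import tempfile

def hex_bytes(data):
    """Show a byte string as space-separated hex, e.g. 'c3 ae'."""
    return " ".join(f"{b:02x}" for b in data)

def first_existing(file_name, include_dirs):
    """Try each directory in turn and report which stat() calls fail."""
    for directory in include_dirs:
        candidate = os.path.join(directory, file_name)
        try:
            os.stat(candidate)
        except OSError:
            print("stat failed for:", hex_bytes(candidate))
            continue
        return candidate
    return None

# Bytes before and after the encoding step for î.
print(hex_bytes("\u00ee".encode("utf-8")), "->",
      hex_bytes("\u00ee".encode("iso-8859-1")))

# Demo lookup with byte-string paths.
demo_dir = os.fsencode(tempfile.mkdtemp())
open(os.path.join(demo_dir, b"inc.texi"), "wb").close()
found = first_existing(b"inc.texi", [b"/no/such/dir", demo_dir])
print("found:", found)
```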

Would it be practical for you to do that, Eli?

The texi2any.pl call for that test should be something like:

perl -w ./..//texi2any.pl -c TEXI2HTML --force --conf-dir ./../t/init/ \
  --conf-dir ./../init --conf-dir ./../ext -I ./formatting/ -I formatting// -I ./ \
  -I . --error-limit=1000 -c TEST=1 --output \
  formatting//out_parser/manual_include_accented_file_name_latin1_explicit_encoding/ \
  --info -c INPUT_FILE_NAME_ENCODING=ISO-8859-1 \
  -D 'needrecodedfilenames Need recoded file names' \
  ./formatting//manual_include_accented_file_name_latin1.texi

-- 
Pat


