Re: Skip filename recoding tests on MS-Windows
From: pertusus
Subject: Re: Skip filename recoding tests on MS-Windows
Date: Sun, 23 Oct 2022 20:22:29 +0200
On Sun, Oct 23, 2022 at 07:10:57PM +0300, Eli Zaretskii wrote:
> >
> > Which is problematic, it means that with a correctly setup input file
> > with latin1 encoded character in the name, something wrong is going on.
>
> The character is supposed to be encoded in Latin1, but I don't think
> it is, because Latin1 is not the locale's encoding here.
I think that it is encoded in Latin1, as discussed in another mail.
> And some
> programs involved in these tests could decide they don't understand
> the character and replace it with a '?'.
texi2any is the only program involved in these tests. The ?
appearing in the message may come from the Perl Encode module, which
does not know how to encode the character from the Perl internal
encoding to the C locale that the test sets up with
LC_ALL=C; export LC_ALL
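To illustrate where such a '?' can come from, here is a minimal sketch,
in Python rather than Perl, and not the texi2any code. It assumes that
the C locale amounts to plain ASCII and that the encoder substitutes '?'
for characters it cannot represent; the file name is hypothetical.

```python
def encode_for_locale(text, locale_encoding):
    """Encode text, substituting '?' for characters the encoding lacks."""
    return text.encode(locale_encoding, errors="replace")


# Hypothetical file name containing U+00EE (i with circumflex).
name = "an_\u00eencluded_file.texi"

# ASCII stands in for the C locale here (an assumption).
print(encode_for_locale(name, "ascii"))
# A Latin1 locale can represent the character as a single byte.
print(encode_for_locale(name, "latin-1"))
```

With the ASCII stand-in the accented character is replaced, while the
Latin1 encoding keeps it as the single byte \xEE.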
You could also try running the test command directly, without
setting LC_ALL=C, like
perl -w ./..//texi2any.pl -c TEXI2HTML --force --conf-dir ./../t/init/ \
  --conf-dir ./../init --conf-dir ./../ext -I ./formatting/ -I formatting// -I ./ \
  -I . --error-limit=1000 -c TEST=1 --output \
  formatting//out_parser/manual_include_accented_file_name_latin1/ --info \
  -D 'needrecodedfilenames Need recoded file names' \
  ./formatting//manual_include_accented_file_name_latin1.texi
You should probably see something different, though probably not î:
maybe the character that corresponds to the \xEE byte in your locale,
maybe something else.
As to why the file is not found for
manual_include_accented_file_name_latin1, I think that the reason is the
following. When the line from the Texinfo file is read, it is converted
from Latin1, based on the @documentencoding, to UTF-8.
In end_line.c, line 1416, the @include argument is processed, which
leads to a call to encode_file_name from input.c to encode the name to
the binary string that should be used to find the file.
On Windows, we set DOC_ENCODING_FOR_INPUT_FILE_NAME to 0 (it is set to 1
in other cases). Let's imagine that the locale encoding is something
like LOC. In this case, INPUT_FILE_NAME_ENCODING is not set. This
information is passed to the XS parser and used by encode_file_name in
input.c. Since neither input_file_name_encoding nor
doc_encoding_for_input_file_name is set, the locale encoding, LOC in
this example, is used to recode the file name from UTF-8 to LOC. My
guess is that unless the locale happens to be a Latin1 locale, iconv
will not encode the accented character to the \xEE byte as it should.
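To make that guess concrete, here is a minimal sketch in Python, not the
C code from input.c. The function name recode_file_name and the list of
candidate encodings are assumptions chosen for illustration, and iconv
may report or handle an unencodable character differently from Python.

```python
def recode_file_name(utf8_bytes, target_encoding):
    """Recode a UTF-8 byte string to target_encoding.

    Returns None if a character cannot be represented in the target.
    """
    try:
        return utf8_bytes.decode("utf-8").encode(target_encoding)
    except UnicodeEncodeError:
        return None


utf8_name = "\u00ee".encode("utf-8")  # b'\xc3\xae', the parser's internal form

# Candidate locale encodings, picked only as examples.
for encoding in ("iso-8859-1", "cp1252", "cp437", "cp1255"):
    print(encoding, recode_file_name(utf8_name, encoding))
```

Only the encodings that place the character at \xEE produce the byte
that matches the Latin1 file name on disk; the others give a different
byte or fail, so the include file would not be found.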
The failure of manual_include_accented_file_name_latin1_explicit_encoding
is more surprising to me. In that case INPUT_FILE_NAME_ENCODING is set
to ISO-8859-1, so the reverse encoding from UTF-8 to ISO-8859-1 should
lead to a path that can be found, and I do not understand why the test
fails. Paths are looked up in locate_include_file() in input.c, so that
could be where something unexpected happens, maybe if stat() on Windows
does some kind of conversion.
Debugging the manual_include_accented_file_name_latin1_explicit_encoding
test further would require showing the string bytes before and after
the call to encode_file_name() in end_line.c. Then, if the bytes match
the expected Latin1 string, with \xEE for î, the next step is to check
whether something unexpected happens in locate_include_file(), for
example by printing the file names that are tried, to see whether there
is indeed one for which stat() should return 0.
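The kind of check I have in mind can be sketched as follows, in Python
rather than in the C sources. The helpers hex_bytes and find_include are
hypothetical and only mimic the idea of locate_include_file(): print the
bytes of each candidate path and report whether stat succeeds.

```python
import os


def hex_bytes(data):
    """Return the bytes of data as space-separated hexadecimal values."""
    return " ".join(f"{byte:02x}" for byte in data)


def find_include(file_name, include_dirs):
    """Try file_name (bytes) in each directory (bytes), showing each attempt."""
    for directory in include_dirs:
        candidate = os.path.join(directory, file_name)
        try:
            os.stat(candidate)
        except OSError as error:
            print("not found:", hex_bytes(candidate), "(", error.strerror, ")")
            continue
        print("found:    ", hex_bytes(candidate))
        return candidate
    return None


# Bytes before and after recoding, as they would be printed around the
# call to encode_file_name().
before = "\u00ee".encode("utf-8")
after = before.decode("utf-8").encode("iso-8859-1")
print("before:", hex_bytes(before))
print("after: ", hex_bytes(after))
```

Calling find_include() with the recoded name and the -I directories of
the test would show which byte sequences are actually passed to stat.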
Eli, I do not know if it is practical for you to do that?
The texi2any.pl call for that test should be something like:
perl -w ./..//texi2any.pl -c TEXI2HTML --force --conf-dir ./../t/init/ \
  --conf-dir ./../init --conf-dir ./../ext -I ./formatting/ -I formatting// -I ./ \
  -I . --error-limit=1000 -c TEST=1 --output \
  formatting//out_parser/manual_include_accented_file_name_latin1_explicit_encoding/ \
  --info -c INPUT_FILE_NAME_ENCODING=ISO-8859-1 \
  -D 'needrecodedfilenames Need recoded file names' \
  ./formatting//manual_include_accented_file_name_latin1.texi
--
Pat