bug-texinfo

Re: Skip filename recoding tests on MS-Windows


From: pertusus
Subject: Re: Skip filename recoding tests on MS-Windows
Date: Tue, 25 Oct 2022 21:18:35 +0200

On Tue, Oct 25, 2022 at 04:26:52PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 25 Oct 2022 14:54:04 +0200
> > From: pertusus@free.fr
> > Cc: GavinSmith0123@gmail.com, bug-texinfo@gnu.org
> > 
> > > It reports nothing (i.e., no diffs).
> > 
> > Then the test passed.  My wild guess is that, even though \xFF is
> > interpreted in the wrong locale and the character is not î, it does not
> > matter as long as it is misinterpreted consistently.  Windows assumes
> > the byte is in your current codepage (which is untrue), while the Perl
> > code assumes that Windows treats it as Latin-1 (also untrue), but as
> > long as it is written and read as \xFF, it's ok.
> 
> Yes, I think this is what happens.  As long as the byte has a
> character in the system codepage, Windows will play along.

Ok.  That means we should probably keep skipping the test, even though it
succeeds for you, because it could fail if the current codepage were a
multibyte codepage on Windows.  I will update the comments.

> > The formatting of manual_include_accented_file_name_latin1 fails, but I
> > think I know why; I explained it in another mail.  It is because
> > DOC_ENCODING_FOR_INPUT_FILE_NAME is set to 0 for Windows, which may not
> > be the best choice after all, according to the test, because it is
> > better to 'fool' Windows than to get a file name with the correct
> > characters.  You proposed to have DOC_ENCODING_FOR_INPUT_FILE_NAME set
> > to 0 for Windows so that the codepage is used for the file name
> > encoding; do you still think that is a good idea in general?  (For the
> > test, we can do something else, or SKIP on Windows.)
> 
> It should work if the document is indeed in the expected encoding.
> But if the file is actually encoded in something other, especially if
> the encoding is multibyte (like UTF-8), it will not work.

Indeed, it is not reliable, but what would be the best default?  It
seems to me that Windows adds additional ways for things to fail.
However, on the question of using the codepage versus the input file
encoding to encode file names in texi2any, Windows does not seem special
to me.  If we use the input file encoding on other platforms, assuming
the use case is converting manuals from archives where all the files are
encoded consistently with the manuals, then Windows is not very
different.  It will fail in some cases on Windows, but using the user
codepage would reduce even further the chance that the result is correct
(that files with encoded characters in their names are found).  Are you
still sure that using the user's current codepage is best in this
situation?
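To make the trade-off concrete, here is a minimal Python sketch (the names
are hypothetical, not texi2any's API): an @include file name containing î
only resolves if it is encoded with the same encoding that was used when the
file was created on disk, so the two candidate defaults produce different
lookups.

```python
# The same file name 'î' yields different byte sequences depending on
# which encoding texi2any-like code chooses for file names.
name = "\u00ee"  # î, as it appears in a Latin-1 manual

bytes_from_document = name.encode("latin-1")  # input-file-encoding default
bytes_from_codepage = name.encode("utf-8")    # e.g. a UTF-8 user codepage

print(bytes_from_document)  # b'\xee'
print(bytes_from_codepage)  # b'\xc3\xae'

# Only one of these can match the bytes actually stored in the file
# system, so whichever default is chosen, archives laid out the other
# way will fail to resolve the include.
assert bytes_from_document != bytes_from_codepage
```

The choice of default therefore amounts to a bet on how the archive's file
names were encoded, not on anything specific to Windows.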

> > It would be nice if you could check whether formatting
> > manual_include_accented_file_name_latin1_use_locale_encoding succeeds
> > or fails.  Could you please run
> > ./test_scripts/formatting_manual_include_accented_file_name_latin1_use_locale_encoding.sh
> > which is somewhat verbose, and check $?.  And maybe send the whole
> > output.
> 
> It says:
> 
>   $ 
> ./test_scripts/formatting_manual_include_accented_file_name_latin1_use_locale_encoding.sh
>   testdir: formatting/
>   driving_file: ./formatting//list-of-tests
>   made result dir: ./formatting//res_parser/
> 
>   doing test manual_include_accented_file_name_latin1_use_locale_encoding, 
> src_file ./formatting//manual_include_accented_file_name_latin1.texi
>   format_option: -c TEXI2HTML
>   texi2any.pl manual_include_accented_file_name_latin1_use_locale_encoding -> 
> formatting//out_parser/manual_include_accented_file_name_latin1_use_locale_encoding
>   /d/usr/Perl/bin/perl -w ./..//texi2any.pl -c TEXI2HTML --force --conf-dir 
> ./../t/init/ --conf-dir ./../init --conf-dir ./../ext -I ./formatting/ -I 
> formatting// -I ./ -I .  --error-limit=1000 -c TEST=1  --output 
> formatting//out_parser/manual_include_accented_file_name_latin1_use_locale_encoding/
>  --info -D 'needrecodedfilenames Need recoded file names' -c 
> MESSAGE_ENCODING=UTF-8 -c INPUT_FILE_NAME_ENCODING=UTF-8 
> ./formatting//manual_include_accented_file_name_latin1.texi > 
> formatting//out_parser/manual_include_accented_file_name_latin1_use_locale_encoding/manual_include_accented_file_name_latin1.1
>  
> 2>formatting//out_parser/manual_include_accented_file_name_latin1_use_locale_encoding/manual_include_accented_file_name_latin1.2
> 
>   all done, exiting with status 0
> 
> So it sounds like it succeeds?

Yes.  It is a kind of 'negative' test, as it succeeds if the file is
not found.  That could be because of the intended failure: setting
INPUT_FILE_NAME_ENCODING=UTF-8 while the input file name is in Latin-1.
But in theory another codepage could give a different result, in
particular a failure for a different reason.  I can't imagine a
situation where the file name would end up encoded in UTF-8 on Windows,
even with a UTF-8 codepage, as the \xFF output by Perl for the included
file name could lead to something else, but not to the UTF-8 encoding of
the î character.  So even though we do not strictly need to skip the
test on Windows, I think it is better to skip it: with a multibyte
codepage, the created file, which is supposed to be encoded in Latin-1,
will not be as expected, so the test's outcome would come about for the
wrong reasons and would not test what it is supposed to test.
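The reason this misconfiguration reliably fails to find the file can be
sketched as follows (again a hypothetical Python illustration, not the test
harness itself): the Latin-1 byte for î is not the UTF-8 byte sequence for î,
so a lookup performed with INPUT_FILE_NAME_ENCODING=UTF-8 can never match a
file whose name was written as Latin-1 bytes.

```python
# The included file's name is stored on disk as Latin-1 bytes:
on_disk = "\u00ee".encode("latin-1")   # b'\xee'

# With INPUT_FILE_NAME_ENCODING=UTF-8, the converter would look for the
# UTF-8 bytes of the same character instead:
looked_up = "\u00ee".encode("utf-8")   # b'\xc3\xae'

# The two byte strings can never match, so the file is not found and the
# 'negative' test passes.
print(on_disk != looked_up)  # True
```

This is exactly the intended failure; the concern in the paragraph above is
that a multibyte codepage could produce the same "file not found" outcome
through an unrelated mechanism.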

> Note: this is basically the 6.8.90 pretest, just slightly patched
> according to a few patches: your patch in
> copy_change_file_name_encoding and Gavin's patch in several files to
> avoid crashes due to inconsistent memory-allocation functions being
> used.  All the other changes post-6.8.90 that are in Git are not in
> the code I'm running.

-- 
Pat


