bug-texinfo

Re: Non-ASCII characters in @include search path


From: Patrice Dumas
Subject: Re: Non-ASCII characters in @include search path
Date: Sun, 20 Feb 2022 13:02:23 +0100

On Sun, Feb 20, 2022 at 10:11:09AM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 09:11:54AM +0000, Gavin Smith wrote:
> > On Sat, Feb 19, 2022 at 11:00:33PM +0100, Patrice Dumas wrote:
> > > I think that there is some wrong encoding/decoding somewhere,
> > > but I don't know where.  It is particularly strange that I cannot
> > > reproduce with 6.8 but Gaël can.
> > 
> > I reproduced with 6.8 but only with TEXINFO_XS=omit.  I am going to
> > investigate.
> 
> I reproduced with the development version.

Me too, with TEXINFO_XS=omit. Perl v5.34.0.

> I found that the -f and -r
> operators in Perl would not find a file named with an identical string
> (showing equal with the eq operator) but encoded internally with UTF-8,
> so that utf8::is_utf8 returns true.  The File::Spec functions return
> such a string.  The following fixed it for me:
> 
> diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
> index 29dbf3c8c3..8219534984 100644
> --- a/tp/Texinfo/Common.pm
> +++ b/tp/Texinfo/Common.pm
> @@ -1548,6 +1548,9 @@ sub locate_include_file($$)
>          File::Spec->catdir(File::Spec->splitdir($include_directories),
>                             @directories), $filename);
>        #$file = "$include_dir/$text" if (-e "$include_dir/$text" and -r "$include_dir/$text");
> +
> +      utf8::downgrade ($possible_file);
> +
>        $file = "$possible_file" if (-e "$possible_file" and -r "$possible_file");
>        last if (defined($file));
>      }
> 
> 
> This is obviously a mess.  We should decide exactly where the bug is: in
> the -e operator itself, in File::Spec, or in the way that we use it.
> 
> It might be simpler to eschew File::Spec and just get the filenames with
> simple string operators.

I had a look at File::Spec::Unix, and it also uses simple string operators.
I tested with simple string operators and the issue is still present.

I am pretty sure that the issue lies elsewhere: our code does not
correctly encode the string passed to -e, and probably to the Perl
stat() function.  It is not clear to me whether these expect bytes or
internally encoded Unicode Perl strings.  Your test and a sentence in
https://perldoc.perl.org/perlunicode seem to point towards bytes:

 String handling functions, for the most part, continue to operate in
 terms of characters....

 The exceptions are:
   * some operators that interact with the platform's operating system
     Operators dealing with filenames are examples.
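
To illustrate what Gavin observed, here is a minimal sketch (the file
name is made up) showing two strings that compare equal with eq but are
stored differently internally.  If the filename operators look at the
internal bytes, as the test above suggests, they would see two
different names:

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

# Hypothetical file name; \xe9 is "e acute" in the Latin-1 range.
my $name = "caf\xe9";

# Same characters, but stored internally as UTF-8.
my $upgraded = $name;
utf8::upgrade($upgraded);

# The two strings are equal as character strings...
print "eq: ", ($name eq $upgraded ? "yes" : "no"), "\n";

# ...but only the second one has the internal UTF-8 flag set...
print "is_utf8: ", (utf8::is_utf8($name) ? 1 : 0), " and ",
                   (utf8::is_utf8($upgraded) ? 1 : 0), "\n";

# ...and their internal byte lengths differ.
print "internal bytes: ", bytes::length($name), " and ",
                          bytes::length($upgraded), "\n";
```

This does not say which of the two forms the file system wants; it only
shows why eq is not enough to tell them apart.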

I just read some code in the XS parser, and it seems that the XS parser
works with UTF-8 encoded byte strings.  Therefore the stat within the
XS parser will work if the file names are actually encoded in UTF-8 in
the file system.  This should be right for Linux/Unix, but I think that
it is not ok for Windows.

So I think that the best approach would be to encode file names to the
encoding the file system expects: UTF-8 in the default case, but
possibly something platform-specific if needed.
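
A rough sketch of that idea, not actual Texinfo code: the helper names
encoded_file_name and is_readable_file are hypothetical, and the UTF-8
default is an assumption that would need to become platform-dependent:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into the bytes
# expected for file names.  The UTF-8 default is an assumption.
sub encoded_file_name {
  my ($name, $encoding) = @_;
  $encoding = 'UTF-8' if (!defined($encoding));
  return encode($encoding, $name);
}

# Hypothetical helper: run the file tests on the encoded bytes rather
# than on the character string.
sub is_readable_file {
  my ($name, $encoding) = @_;
  my $bytes = encoded_file_name($name, $encoding);
  return (-e $bytes and -r $bytes) ? 1 : 0;
}

# Made-up include path with a non-ASCII character.
my $possible_file = "include/caf\x{e9}.texi";
print is_readable_file($possible_file) ? "found\n" : "not found\n";
```

With this, the result no longer depends on how the string happens to be
stored internally, since encode always returns a byte string.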

-- 
Pat


