bug-texinfo

Re: different encodings for input and output file names and command line


From: Gavin Smith
Subject: Re: different encodings for input and output file names and command line
Date: Fri, 4 Mar 2022 18:36:09 +0000

On Fri, Mar 04, 2022 at 08:15:54AM +0100, Patrice Dumas wrote:
> > +  if ($self->get_conf('DOC_ENCODING_FOR_INPUT_FILE_NAME')) {
> > +    my $document_encoding;
> > +    $document_encoding = $self->{'parser_info'}->{'input_perl_encoding'}
> > +      if ($self->{'parser_info'}
> > +        and defined($self->{'parser_info'}->{'input_perl_encoding'}));
> > +    return Texinfo::Common::encode_file_name($self, $file_name,
> > +                                             $document_encoding);
> > +  } else {
> > +    return Texinfo::Common::encode_file_name($self, $file_name,
> > +                       $self->get_conf('LOCALE_INPUT_FILE_NAME_ENCODING'));
> > +  }
> > +}
> 
> The code looks right.
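The quoted branch can be tried on its own. The following is a minimal Perl sketch, not Texinfo's implementation. `pick_file_name_encoding` and `file_name_bytes` are hypothetical helpers: they read a plain hash instead of calling `get_conf`, and they use the core `Encode::encode` in place of `Texinfo::Common::encode_file_name`. The sketch makes two points about `aß.texi`. First, the chosen encoding decides which bytes are looked up on disk. Second, with DOC_ENCODING_FOR_INPUT_FILE_NAME=0, an unset locale setting leaves no encoding at all.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical mirror of the quoted branch: use the document encoding when
# DOC_ENCODING_FOR_INPUT_FILE_NAME is true, else the locale setting.
# Either value may be undef.
sub pick_file_name_encoding {
  my ($conf, $document_encoding) = @_;
  return $document_encoding if $conf->{'DOC_ENCODING_FOR_INPUT_FILE_NAME'};
  return $conf->{'LOCALE_INPUT_FILE_NAME_ENCODING'};
}

# Hypothetical stand-in for encode_file_name: character string -> bytes.
# Encode::encode croaks on an undef encoding name, so return undef instead.
sub file_name_bytes {
  my ($file_name, $encoding) = @_;
  return undef unless defined $encoding;
  return encode($encoding, $file_name);
}

# Show a byte string as space-separated hex, e.g. "61 DF".
sub hex_bytes {
  my ($bytes) = @_;
  return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

my $name = "a\x{df}.texi";    # "aß.texi" as a character string
my %conf = (DOC_ENCODING_FOR_INPUT_FILE_NAME => 0,
            LOCALE_INPUT_FILE_NAME_ENCODING  => undef);

my $chosen = pick_file_name_encoding(\%conf, 'ISO-8859-1');
print 'chosen encoding: ', (defined $chosen ? $chosen : 'undef'), "\n";

for my $encoding ('ISO-8859-1', 'UTF-8') {
  print "$encoding: ", hex_bytes(file_name_bytes($name, $encoding)), "\n";
}
```

With these settings the chosen encoding is undef. The two candidate encodings write `ß` as `DF` and `C3 9F` respectively, so they name different files.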

I've implemented this in the XS parser, although I haven't been able to
get DOC_ENCODING_FOR_INPUT_FILE_NAME=0 to work.

(Note: this email is UTF-8 encoded, but the session below was copied and
pasted from a Latin-1 terminal.)

$ locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=fr_FR
$ cat é.texi
\input texinfo

@documentencoding ISO-8859-1

@setfilename ü.info

@include aß.texi

@bye
$ cat aß.texi
faerrra
$ ../texi2any.pl é.texi
é.texi: warning: document without nodes
$ cat ü.info
This is ü.info, produced by texi2any version 6.8dev+dev from é.texi.

faerrra



Tag Table:

End Tag Table


Local Variables:
coding: iso-8859-1
End:
$  # so far so good
$ ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0 -c LOCALE_INPUT_FILE_NAME_ENCODING=ISO-8859-1
é.texi:7: @include: could not find aß.texi
$ ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0
é.texi:7: @include: could not find aß.texi
$ TEXINFO_XS=omit ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0
é.texi:7: @include: could not find aß.texi
$ TEXINFO_XS=omit ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0 -c LOCALE_INPUT_FILE_NAME_ENCODING=ISO-8859-1
é.texi:7: @include: could not find aß.texi
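The two files can also be recreated without the attachment. Below is a Perl sketch that rests on two assumptions. It assumes a Linux-style file system that stores names as raw bytes, and it works in a temporary directory instead of the current one. `write_latin1_file` is a hypothetical helper. The script writes both the names and the contents in ISO-8859-1. It then checks whether `aß.texi` is found when its name is encoded as ISO-8859-1 and when it is encoded as UTF-8. That check is a simplified stand-in for the lookup behind `@include`.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;
use File::Temp qw(tempdir);

# Assumption: names are stored as raw bytes (e.g. ext4 on Linux), as on the
# Latin-1 terminal in the session above.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: create $name (characters) in $dir with both the file
# name and the contents encoded as ISO-8859-1.
sub write_latin1_file {
  my ($name, $content) = @_;
  my $path = File::Spec->catfile($dir, encode('ISO-8859-1', $name));
  open(my $fh, '>:raw', $path) or die "cannot create $path: $!";
  print {$fh} encode('ISO-8859-1', $content);
  close($fh) or die "cannot close $path: $!";
  return $path;
}

write_latin1_file("\x{e9}.texi", <<"EOT");
\\input texinfo

\@documentencoding ISO-8859-1

\@setfilename \x{fc}.info

\@include a\x{df}.texi

\@bye
EOT
write_latin1_file("a\x{df}.texi", "faerrra\n");

# Simplified lookup: encode the @include argument, then test for the file.
for my $encoding ('ISO-8859-1', 'UTF-8') {
  my $path = File::Spec->catfile($dir, encode($encoding, "a\x{df}.texi"));
  print "$encoding: ", (-e $path ? 'found' : 'not found'), "\n";
}
```

On such a file system the ISO-8859-1 name is found and the UTF-8 name is not. This is the kind of mismatch that ends in a "could not find" error.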


--------------

On making the following change, it appears that LOCALE_INPUT_FILE_NAME_ENCODING
is undefined:

diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 3920bd076e..8a23690cf7 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -2024,6 +2024,8 @@ sub _encode_file_name($$)
     return Texinfo::Common::encode_file_name($self, $file_name,
                  $self->{'info'}->{'input_perl_encoding'});
   } else {
+         warn "<" .    $self->get_conf('LOCALE_INPUT_FILE_NAME_ENCODING')
+         .">\n";
     return Texinfo::Common::encode_file_name($self, $file_name,
              $self->get_conf('LOCALE_INPUT_FILE_NAME_ENCODING'));
   }


This leads to the following extra output

Use of uninitialized value in concatenation (.) or string at ../../tp/Texinfo/ParserNonXS.pm line 2027.
<>

with TEXINFO_XS=omit.
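The `<>` in that output looks the same whether the value is undefined or an empty string. Only the accompanying warning shows that it is undef. The small debugging sketch below prints the two cases differently and avoids the warning. `describe_setting` is a hypothetical helper, not part of Texinfo.

```perl
use strict;
use warnings;

# Hypothetical debugging helper: show a configuration value, marking undef
# explicitly so it cannot be confused with an empty string.
sub describe_setting {
  my ($name, $value) = @_;
  return "$name=" . (defined $value ? "<$value>" : '(undef)');
}

warn describe_setting('LOCALE_INPUT_FILE_NAME_ENCODING', undef), "\n";
warn describe_setting('LOCALE_INPUT_FILE_NAME_ENCODING', ''), "\n";
warn describe_setting('LOCALE_INPUT_FILE_NAME_ENCODING', 'ISO-8859-1'), "\n";
```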

I've attached these two files in a tar file to this email.

I haven't investigated yet why LOCALE_INPUT_FILE_NAME_ENCODING is undefined.
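For comparison, the core `I18N::Langinfo` module can query the character set of the current locale. For the fr_FR locale used above, that is typically ISO-8859-1 on a glibc system. The sketch below shows one way a locale-based fallback could be computed when the setting is undefined. This is an assumption, not a description of how texi2any actually initializes LOCALE_INPUT_FILE_NAME_ENCODING. `locale_file_name_encoding` is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use I18N::Langinfo qw(langinfo CODESET);

# Take the locale from the environment (LC_ALL=fr_FR in the session above)
# before querying it.
setlocale(LC_ALL, '');

# Hypothetical helper: prefer an explicit setting, else fall back to the
# codeset of the current locale (typically "ISO-8859-1" for fr_FR on glibc).
sub locale_file_name_encoding {
  my ($configured) = @_;
  return $configured if defined $configured;
  return langinfo(CODESET);
}

print locale_file_name_encoding(undef), "\n";
print locale_file_name_encoding('ISO-8859-1'), "\n";
```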

Attachment: test.tar
Description: Unix tar archive

