Re: Non-ASCII characters in @include search path


From: Gavin Smith
Subject: Re: Non-ASCII characters in @include search path
Date: Sat, 26 Feb 2022 20:06:52 +0000

On Sat, Feb 26, 2022 at 06:50:10PM +0100, Patrice Dumas wrote:
> You don't need a non-UTF-8 locale for the issue above, or for the issue
> that prompted me to try to look seriously at the issue, which is
> tests/formatting/list-of-tests non_ascii_test_epub. Having an accented
> letter in the document name makes it very hard to determine what should
> be encoded/decoded in init/epub3.pm and upstream code, in particular in
> Texinfo/Convert/Converter.pm determine_files_and_directory(), but
> although I thought previously that it could be solved in that function
> only, it is not so simple, strings come from everywhere in
> init/epub3.pm.

I'm typing this as I try to fix it.

I generated this test and looked at the output.  The initial output was
this (from the file osé.2, which holds the stderr output):

osé.texi:15: warning: undefined flag: vùr
osé.texi:23: @include: could not find not_existïng.téxi
texi2any: could not copy `formatting/an_ïmage.png' to 
`formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png':
 No such file or directory
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using 
`dîrectory/imàge.êxt'
texi2any: @image file `dîrectory/imàge' can not be copied
osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi

There's a clear issue there with the image copying.  I added print
statements to epub3.pm to check what was in the strings.

In the error message, the encoded string is used (UTF-8 flag off), which
leads to the double UTF-8 encoding in 'formatting/an_ïmage.png'.

This actually seems impossible to fix completely with the current approach.
Error messages are now character strings (a recent change, but one required for
correct interpolation of non-filename strings), so if a file operation fails on
a non-UTF-8 filename, that filename cannot be interpolated correctly into the
error message.  (This is possibly not a major issue, as this error is no longer
output once the other fixes are made; see later in this email.)

I "fixed" this by calling utf8::decode on the interpolated filename;
of course, this will be wrong if the filename is not in UTF-8, but there
is no alternative.
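
To illustrate the double encoding and the effect of utf8::decode, here is
a minimal standalone sketch.  It is not code from epub3.pm: the variable
names are made up, and it assumes the message ends up encoded as UTF-8
on output.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical byte string: "an_ïmage.png" encoded in UTF-8, UTF-8 flag off.
my $bytes_name = "an_\xc3\xafmage.png";

# Interpolated as-is, each byte is treated as one Latin-1 character, so
# encoding the message for output encodes the two bytes of "ï" a second time.
my $double = encode('UTF-8', "could not copy `$bytes_name'");

# Decoding in place first turns the bytes into characters (flag on).
my $chars_name = $bytes_name;
utf8::decode($chars_name);
my $correct = encode('UTF-8', "could not copy `$chars_name'");

printf "without decode: %d bytes\n", length($double);
printf "with decode:    %d bytes\n", length($correct);
```

The version without utf8::decode comes out two bytes longer, which is the
doubled "ï".  If the filename were not valid UTF-8, utf8::decode would
return false and leave the string unchanged.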

Debugging code:

diff --git a/tp/init/epub3.pm b/tp/init/epub3.pm
index 359d4ffca7..cb1547510a 100644
--- a/tp/init/epub3.pm
+++ b/tp/init/epub3.pm
@@ -174,8 +174,21 @@ sub epub_convert_image_command($$$$)
       }
       my $image_destination_path = File::Spec->catfile($images_destination_dir,
                                                        $image_file);
+      warn "CHECK1 <$images_destination_dir>:"
+      .utf8::is_utf8($images_destination_dir) . "\n";
+
+      warn "CHECK4 <$image_file>:"
+      .utf8::is_utf8($image_file) . "\n";
+
+      warn "CHECK3 <$image_destination_path>:"
+      .utf8::is_utf8($image_destination_path) . "\n";
+
       my $copy_succeeded = copy($image_path, $image_destination_path);
       if (not $copy_succeeded) {
+        warn "CHECK2 <$image_path>:"
+        .utf8::is_utf8($image_path) . "\n";
+
+        utf8::decode($image_path);
         $self->document_error($self, sprintf(__(
                             "could not copy `%s' to `%s': %s"),
                             $image_path, $image_destination_path, $!));

Output:

$ cat formatting/out_parser/non_ascii_test_epub/osé.2
osé.texi:15: warning: undefined flag: vùr
osé.texi:23: @include: could not find not_existïng.téxi
CHECK1 <formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images>:
CHECK4 <an_�mage.png>:1
CHECK3 
<formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_�mage.png>:1
CHECK2 <formatting/an_ïmage.png>:
texi2any: could not copy `formatting/an_ïmage.png' to 
`formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png':
 No such file or directory
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using 
`dîrectory/imàge.êxt'
texi2any: @image file `dîrectory/imàge' can not be copied
osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi

As you can see, the UTF-8 flag is on for $image_file, but not for
$images_destination_dir.
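
Mixing the two kinds of string is what goes wrong here, as far as I
understand it.  The following is a standalone sketch rather than code from
texi2any: the names are hypothetical, plain concatenation stands in for
File::Spec->catfile, and Encode::encode stands in for whatever produces the
encoded file name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Directory as bytes (UTF-8 for "osé_epub_package"), UTF-8 flag off.
my $dir_bytes = "os\xc3\xa9_epub_package";

# File name as characters, UTF-8 flag on.
my $file_chars = "an_\x{ef}mage.png";
utf8::upgrade($file_chars);

# Joining them upgrades the directory bytes as if they were Latin-1,
# so the result is flagged and 0xC3 0xA9 count as two characters.
my $mixed = $dir_bytes . '/' . $file_chars;
my $mixed_encoded = encode('UTF-8', $mixed);

# Encoding the file name first keeps the whole path as bytes.
my $file_bytes = encode('UTF-8', $file_chars);
my $consistent = $dir_bytes . '/' . $file_bytes;

printf "mixed flag: %d\n", utf8::is_utf8($mixed) ? 1 : 0;
printf "consistent flag: %d\n", utf8::is_utf8($consistent) ? 1 : 0;
```

In the mixed case the "é" of the directory ends up encoded twice, giving a
path that names a directory which does not exist.  When both parts are
bytes, the path stays the same as the one used to create the directory.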

$image_file came from Texinfo::Convert::HTML::html_image_file_location_name.

Debugging code in HTML.pm:

diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 374b41c4d8..4ce595f61c 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -282,6 +282,9 @@ sub html_image_file_location_name($$$$)
         # will be moved by the caller anyway.
         # If the file path found was to be used it should be decoded to perl
         # codepoints too.
+        warn "IMAGE ".
+        utf8::is_utf8($image_basefile)
+        .":".utf8::is_utf8($extension)."\n";
         $image_file = $image_basefile.$extension;
         $image_extension = $extension;
         last;

Output:

IMAGE 1:


$image_basefile has the UTF-8 flag on (and $extension doesn't). However,
encoded_file_name was already called, so the output from it could be used
instead:

diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 374b41c4d8..2ef6df8c54 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -282,7 +282,7 @@ sub html_image_file_location_name($$$$)
         # will be moved by the caller anyway.
         # If the file path found was to be used it should be decoded to perl
         # codepoints too.
-        $image_file = $image_basefile.$extension;
+        $image_file = $file_name;
         $image_extension = $extension;
         last;
       }

This eliminates the error message about 'could not copy':

$ cat formatting/out_parser/non_ascii_test_epub/osé.2
osé.texi:15: warning: undefined flag: vùr
osé.texi:23: @include: could not find not_existïng.téxi
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using 
`dîrectory/imàge.êxt'
texi2any: @image file `dîrectory/imàge' can not be copied
osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi


Does that fix the issue with this test?

Here are the rest of the files in the output directory:

$ find formatting/out_parser/non_ascii_test_epub/
formatting/out_parser/non_ascii_test_epub/
formatting/out_parser/non_ascii_test_epub/osé.1
formatting/out_parser/non_ascii_test_epub/osé.2
formatting/out_parser/non_ascii_test_epub/osé_epub_package
formatting/out_parser/non_ascii_test_epub/osé_epub_package/mimetype
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/osé.opf
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/nav_toc.xhtml
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/osé.xhtml
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png
formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF
formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF/container.xml

Is this correct?


