Re: Non-ASCII characters in @include search path
From: Gavin Smith
Subject: Re: Non-ASCII characters in @include search path
Date: Sat, 26 Feb 2022 20:06:52 +0000
On Sat, Feb 26, 2022 at 06:50:10PM +0100, Patrice Dumas wrote:
> You don't need a non-UTF-8 locale for the issue above, or for the issue
> that prompted me to try to look seriously at the issue, which is
> tests/formatting/list-of-tests non_ascii_test_epub. Having an accented
> letter in the document name makes it very hard to determine what should
> be encoded/decoded in init/epub3.pm and upstream code, in particular in
> Texinfo/Convert/Converter.pm determine_files_and_directory(), but
> although I thought previously that it could be solved in that function
> only, it is not so simple, strings come from everywhere in
> init/epub3.pm.
I'm typing this as I try to fix it.
I generated this test and looked at the output. The initial output was
this (in the file osé.2, output of stderr):
osé.texi:15: warning: undefined flag: vùr
osé.texi:23: @include: could not find not_existïng.téxi
texi2any: could not copy `formatting/an_ïmage.png' to
`formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png':
No such file or directory
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using
`dîrectory/imàge.êxt'
texi2any: @image file `dîrectory/imàge' can not be copied
osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi
There's a clear issue there with the image copying. I added print
statements to epub3.pm to check what was in the strings.
In the error message, the encoded string is used (UTF-8 flag off), which
leads to the double UTF-8 encoding in 'formatting/an_ïmage.png'.
This actually seems impossible to completely fix with the current approach:
since error messages are character strings (a recent change, but required for
correct interpolation of non-filename strings), if there is some file operation
error with a non-UTF-8 filename, it will be impossible to interpolate that
filename into the error message. (This is possibly not a major issue as this
error is not output when other fixes are made - see later in this email.)
I "fixed" this by calling utf8::decode on the interpolated filename;
of course, this will be wrong if the filename is not in UTF-8, but there
is no alternative.
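To illustrate the trade-off, here is a minimal standalone sketch; it is not
the code in epub3.pm. The helper message_bytes is hypothetical and only stands
in for whatever finally encodes the error message for output, and the file
names are made-up byte strings.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for the final output step: error messages are
# character strings, so they are encoded to UTF-8 when printed.
sub message_bytes {
    my ($filename) = @_;
    return encode('UTF-8', sprintf("could not copy `%s'", $filename));
}

# A file name as used for file operations: already UTF-8 encoded bytes.
my $utf8_name = "an_\xc3\xafmage.png";

# Interpolated as is, each byte is treated as a character and encoded
# again, which gives the double UTF-8 encoding.
my $double = message_bytes($utf8_name);

# Decoding first turns the two bytes back into one character.
my $decoded = $utf8_name;
my $ok = utf8::decode($decoded);
my $single = message_bytes($decoded);

# A Latin-1 encoded name is not well-formed UTF-8: utf8::decode returns
# false and the message cannot be made right this way.
my $latin1_name = "an_\xefmage.png";
my $latin1_copy = $latin1_name;
my $latin1_ok = utf8::decode($latin1_copy);

print "decoded UTF-8 name: ", ($ok ? "ok" : "failed"), "\n";
print "decoded Latin-1 name: ", ($latin1_ok ? "ok" : "failed"), "\n";
```

The return value of utf8::decode could at least be used to detect the
non-UTF-8 case, though what to print then is still an open question.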
Debugging code:
diff --git a/tp/init/epub3.pm b/tp/init/epub3.pm
index 359d4ffca7..cb1547510a 100644
--- a/tp/init/epub3.pm
+++ b/tp/init/epub3.pm
@@ -174,8 +174,21 @@ sub epub_convert_image_command($$$$)
}
my $image_destination_path = File::Spec->catfile($images_destination_dir,
$image_file);
+ warn "CHECK1 <$images_destination_dir>:"
+ .utf8::is_utf8($images_destination_dir) . "\n";
+
+ warn "CHECK4 <$image_file>:"
+ .utf8::is_utf8($image_file) . "\n";
+
+ warn "CHECK3 <$image_destination_path>:"
+ .utf8::is_utf8($image_destination_path) . "\n";
+
my $copy_succeeded = copy($image_path, $image_destination_path);
if (not $copy_succeeded) {
+ warn "CHECK2 <$image_path>:"
+ .utf8::is_utf8($image_path) . "\n";
+
+ utf8::decode($image_path);
$self->document_error($self, sprintf(__(
"could not copy `%s' to `%s': %s"),
$image_path, $image_destination_path, $!));
Output:
$ cat formatting/out_parser/non_ascii_test_epub/osé.2
osé.texi:15: warning: undefined flag: vùr
osé.texi:23: @include: could not find not_existïng.téxi
CHECK1 <formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images>:
CHECK4 <an_�mage.png>:1
CHECK3
<formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_�mage.png>:1
CHECK2 <formatting/an_ïmage.png>:
texi2any: could not copy `formatting/an_ïmage.png' to
`formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png':
No such file or directory
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using
`dîrectory/imàge.êxt'
texi2any: @image file `dîrectory/imàge' can not be copied
osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi
As you can see, the UTF-8 flag is on for $image_file, but not for
$images_destination_dir.
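Why this mix matters can be shown with a small standalone sketch. The paths
are shortened, made-up values, and plain string interpolation is used here
instead of File::Spec->catfile to keep the example short; the variable names
$dir_bytes and $file_chars are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Directory as encoded bytes (UTF-8 flag off), like $images_destination_dir.
my $dir_bytes = "os\xc3\xa9_epub_package/EPUB/images";
# File name as decoded characters (UTF-8 flag on), like $image_file.
my $file_chars = decode('UTF-8', "an_\xc3\xafmage.png");

# Joining the two gives a flagged string in which each byte of the
# directory name is treated as one Latin-1 character.
my $mixed = "$dir_bytes/$file_chars";

# Encoding the file name first keeps the whole path as bytes.
my $bytes_path = $dir_bytes . '/' . encode('UTF-8', $file_chars);

printf "mixed: flag=%d length=%d\n",
    (utf8::is_utf8($mixed) ? 1 : 0), length($mixed);
printf "bytes: flag=%d length=%d\n",
    (utf8::is_utf8($bytes_path) ? 1 : 0), length($bytes_path);
```

Neither encoding nor decoding $mixed as a whole gives a usable path, since one
half is already encoded and the other is not; both parts need to be in the
same form before they are joined.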
$image_file came from Texinfo::HTML::html_image_file_location_name.
Debugging code in HTML.pm:
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 374b41c4d8..4ce595f61c 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -282,6 +282,9 @@ sub html_image_file_location_name($$$$)
# will be moved by the caller anyway.
# If the file path found was to be used it should be decoded to perl
# codepoints too.
+ warn "IMAGE ".
+ utf8::is_utf8($image_basefile)
+ .":".utf8::is_utf8($extension)."\n";
$image_file = $image_basefile.$extension;
$image_extension = $extension;
last;
Output:
IMAGE 1:
$image_basefile has the UTF-8 flag on (and $extension doesn't). However,
encoded_file_name was already called, so the output from it could be used
instead:
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 374b41c4d8..2ef6df8c54 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -282,7 +282,7 @@ sub html_image_file_location_name($$$$)
# will be moved by the caller anyway.
# If the file path found was to be used it should be decoded to perl
# codepoints too.
- $image_file = $image_basefile.$extension;
+ $image_file = $file_name;
$image_extension = $extension;
last;
}
This eliminates the error message about 'could not copy':
$ cat formatting/out_parser/non_ascii_test_epub/osé.2
osé.texi:15: warning: undefined flag: vùr
osé.texi:23: @include: could not find not_existïng.téxi
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using
`dîrectory/imàge.êxt'
texi2any: @image file `dîrectory/imàge' can not be copied
osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi
Does that fix the issue with this test?
Here are the rest of the files in the output directory:
$ find formatting/out_parser/non_ascii_test_epub/
formatting/out_parser/non_ascii_test_epub/
formatting/out_parser/non_ascii_test_epub/osé.1
formatting/out_parser/non_ascii_test_epub/osé.2
formatting/out_parser/non_ascii_test_epub/osé_epub_package
formatting/out_parser/non_ascii_test_epub/osé_epub_package/mimetype
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/osé.opf
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/nav_toc.xhtml
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/osé.xhtml
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png
formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF
formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF/container.xml
Is this correct?