Re: UTF-8 conversion problem in Texinfo 6.6 with TEXINFO_XS_PARSER
From: Eli Zaretskii
Subject: Re: UTF-8 conversion problem in Texinfo 6.6 with TEXINFO_XS_PARSER
Date: Mon, 18 Feb 2019 17:36:00 +0200

> From: Gavin Smith <address@hidden>
> Date: Mon, 18 Feb 2019 00:33:08 +0000
> Cc: address@hidden
>
> > If you don't see the problem on your system, try doing the above in a
> > non-UTF-8 locale (e.g., a Latin-1 locale). If that doesn't succeed in
> > reproducing the problem, either, it could be Windows specific, but in
> > that case I will need your guidance for where to look.
>
> It must be a problem that only occurs in certain circumstances.

Indeed, it is triggered by @include'ing a file with @documentencoding.

> I couldn't reproduce it with the attached file, whether in a UTF-8
> locale or a Latin-1 locale.

Right, because your @documentencoding is in the file where it is
needed.

> Are you sure that @documentencoding UTF-8 is present in the file? I
> didn't clone the Emacs repository, but I couldn't see it at
> http://git.savannah.gnu.org/cgit/emacs.git/tree/doc/lispref/elisp.texi
> (unless it is from some included file).

In the Emacs manuals @documentencoding is in the include file
docstyle.texi, which elisp.texi includes.

> You could add debugging statements to the code to check what encoding
> the input is being interpreted as. For example,

Thanks, I think I see the problem. It's because the code manages
input_encoding on the input_stack. That means each included file
starts out with an input_encoding of zero (which happens to stand for
Latin-1), and when the include file is exhausted, the code pops
input_stack. So any @documentencoding set by an include file is
thrown away, and any file included after @documentencoding has its
encoding reset to Latin-1. But @documentencoding is a global setting:
once set, it should remain in effect for everything read thereafter,
until it is changed by another @documentencoding or until EOF. I
think this means input_encoding should be part of global_info, not of
input_stack.

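
To make the push/pop behavior concrete, here is a minimal sketch in C.
It is not the actual Texinfo parser code: the struct layout, the
function names, the encoding constants, and the file name "intro.texi"
are all hypothetical, and only the names input_encoding, input_stack
and global_info come from the discussion above. It keeps one encoding
per stack entry (the behavior described) and one global encoding (the
proposed behavior) side by side so the two can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding codes; zero standing for Latin-1 mirrors the
   behavior described in the message. */
enum encoding { ENC_LATIN1 = 0, ENC_UTF8 = 1 };

#define MAX_DEPTH 16

/* One entry per file currently being read (hypothetical layout). */
struct input_entry {
    const char *name;
    enum encoding input_encoding;   /* per-file copy: lost on pop */
};

struct parser {
    struct input_entry input_stack[MAX_DEPTH];
    size_t depth;
    enum encoding global_encoding;  /* stand-in for a global_info field */
};

void parser_init(struct parser *p)
{
    p->depth = 0;
    p->global_encoding = ENC_LATIN1;
}

/* Entering a file: the new entry starts with encoding zero. */
void push_file(struct parser *p, const char *name)
{
    assert(p->depth < MAX_DEPTH);
    p->input_stack[p->depth].name = name;
    p->input_stack[p->depth].input_encoding = ENC_LATIN1;
    p->depth++;
}

/* Leaving a file: its entry, including its encoding, is discarded. */
void pop_file(struct parser *p)
{
    assert(p->depth > 0);
    p->depth--;
}

/* Seeing @documentencoding: record it both ways for comparison. */
void set_documentencoding(struct parser *p, enum encoding e)
{
    assert(p->depth > 0);
    p->input_stack[p->depth - 1].input_encoding = e;
    p->global_encoding = e;
}

/* Encoding in effect if it is tracked on the stack. */
enum encoding stack_encoding(const struct parser *p)
{
    assert(p->depth > 0);
    return p->input_stack[p->depth - 1].input_encoding;
}

/* Encoding in effect if it is tracked globally. */
enum encoding global_encoding(const struct parser *p)
{
    return p->global_encoding;
}
```

Pushing "elisp.texi", then "docstyle.texi", calling
set_documentencoding with ENC_UTF8 and popping leaves
stack_encoding() at ENC_LATIN1 while global_encoding() still reports
ENC_UTF8; the same holds for any file pushed afterwards.
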
Btw, I think there's a more general issue here. It sounds like in the
absence of any @documentencoding directive, the C parser assumes
Latin-1, something that doesn't seem to be documented in the Texinfo
manual, and perhaps isn't even the best default nowadays. It means,
for example, that a document with UTF-8 encoded non-ASCII characters
but without @documentencoding will have its non-ASCII characters
"converted" on output. Is that the intended behavior, and is it
consistent with what the Perl parser does? If so, I think it should
be prominently documented, and we should perhaps consider changing the
default to UTF-8.
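
For illustration, the following sketch shows what such a "conversion"
does to the bytes, assuming the input is taken to be Latin-1 and the
output is written as UTF-8. The function latin1_to_utf8 is a
hypothetical helper written for this example, not the conversion code
Texinfo actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Convert LEN bytes of Latin-1 text at IN to UTF-8 in OUT.
   OUT must have room for 2 * LEN + 1 bytes.  Returns the number of
   bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF take two bytes in UTF-8. */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

If the input already holds the UTF-8 encoding of U+00E9 (the two bytes
0xC3 0xA9), each byte is treated as a separate Latin-1 character and
the result is the four bytes 0xC3 0x83 0xC2 0xA9, which a UTF-8 viewer
displays as two characters instead of one.
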
Thanks.