
Re: XeTeX encoding problem

From: Gavin Smith
Subject: Re: XeTeX encoding problem
Date: Sat, 23 Jan 2016 15:27:50 +0000

On 23 January 2016 at 03:06, Masamichi HOSODA <address@hidden> wrote:
> In XeTeX and LuaTeX, is "@documentencoding ISO-8859-1" support required?
> If so, I'll improve the patch.
> It will use byte-wise input when "@documentencoding ISO-8859-1" is used.
> However, if you want ISO-8859-1,
> you can use pdfTeX instead of XeTeX/LuaTeX or you can convert to UTF-8,
> in my humble opinion.

It would be inconvenient to remember to use pdfTeX whenever you had to
process a Texinfo document in ISO-8859-1. We should process
byte-by-byte for an encoding like that, using the existing code in
texinfo.tex to do so. It isn't perfect, as you say: for example, it
looks like we couldn't include another Texinfo file whose filename
was in a single-byte encoding, but that's better than breaking
it altogether.

> I want Unicode which contains CJK characters. Not only ISO-8859-1.
> In byte-wise input, CJK characters can not be used.

Have you ever got the CJK characters to work in a Texinfo file with
XeTeX or LuaTeX? If so, maybe we should conditionally load the fonts
that you got to work. Can you satisfactorily typeset Japanese text
with XeTeX without the use of LaTeX packages? If not, it very likely
won't be practical to implement special rules for typesetting Japanese
in Texinfo itself.

>> I don't see the problem with Unicode filenames: files are named with a
>> series of bytes; does this mean that XeTeX (or LuaTeX?) has problems
>> accessing files with names which aren't in UTF-8?

> In native Unicode, word sequence 0x0066 0x00FC 0x0072
> is converted to UTF-8 byte sequence 0x66 0xC3 0xBC 0x72.
> It means "für", so filename "für" can be handled.
> In byte-wise input, word sequence 0x0066 0x00C3 0x00BC 0x0072
> is converted to byte sequence 0x66 0xC3 0x83 0xC2 0xBC 0x72.
> It does not mean "für", so filename "für" can not be handled.

Thank you for the thorough explanation; it appears that the native
support for reading files by UTF-8 sequence (instead of by byte) needs
to be used for opening files with non-ASCII filenames.
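
For anyone following along, the byte sequences quoted above can be
reproduced outside TeX. This is only a rough Python sketch of the
arithmetic, not of what XeTeX or texinfo.tex actually does internally.
It assumes that byte-wise input behaves as if every input byte became
its own character (modelled here by decoding as Latin-1) and that the
filename is written out as UTF-8 again. The helper name hexdump is
made up for the example.

```python
# The three code points from the quoted example: U+0066 U+00FC U+0072.
name = "\u0066\u00fc\u0072"

# Native Unicode input: the name is written out as UTF-8 directly.
utf8_bytes = name.encode("utf-8")

# Byte-wise input (assumption): each UTF-8 byte is read as a separate
# character, which is equivalent to decoding the bytes as Latin-1.
bytewise_chars = utf8_bytes.decode("latin-1")

# When those characters are written out as UTF-8 again, the two bytes
# that made up U+00FC are each encoded separately.
reencoded = bytewise_chars.encode("utf-8")


def hexdump(data: bytes) -> str:
    """Format bytes in the 0xNN style used in the message above."""
    return " ".join(f"0x{b:02X}" for b in data)


print(hexdump(utf8_bytes))  # 0x66 0xC3 0xBC 0x72
print(hexdump(reencoded))   # 0x66 0xC3 0x83 0xC2 0xBC 0x72
```

The second sequence is six bytes rather than four, so it cannot match
a file on disk whose name is the four-byte UTF-8 form.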
