Re: XeTeX encoding problem
From: Masamichi HOSODA
Subject: Re: XeTeX encoding problem
Date: Sat, 16 Jan 2016 02:15:21 +0900 (JST)
>>>> (something like ``Table of Contents'' broken etc.)
>>>>
>>>> That can be fixed in other ways, without resorting to native UTF-8.
>>>
>>> I agree.
>>
>> In the case of LuaTex, exactly, it can be fixed.
>> In the case of XeTeX, unfortunately,
>> it cannot be fixed if I understand correctly.
>
> I think it could be done by changing the active definitions of bytes
> 128-256 when writing to an auxiliary file to read a single Unicode
> character and write out an ASCII sequence that represents that
> character, probably involving the @U command. Do you know how to do
> this?
If I understand correctly, the active definitions are unrelated.

When native Unicode is enabled, take "Für" in a UTF-8 ".tex" file:

  letter -> ".tex"
  F      -> 0x46
  ü      -> 0xC3, 0xBC
  r      -> 0x72

XeTeX reads ".tex" files as native Unicode:

  letter -> ".tex"     -> inner XeTeX
  F      -> 0x46       -> U+0046
  ü      -> 0xC3, 0xBC -> U+00FC
  r      -> 0x72       -> U+0072

XeTeX writes ".toc" files in UTF-8:

  letter -> ".tex"     -> inner XeTeX -> ".toc"
  F      -> 0x46       -> U+0046      -> 0x46
  ü      -> 0xC3, 0xBC -> U+00FC      -> 0xC3, 0xBC
  r      -> 0x72       -> U+0072      -> 0x72

As a result, the bytes in ".tex" and ".toc" are the same.
Therefore, the table of contents is not broken.
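
The native-Unicode case can be checked with a small Python model. This is only a sketch of the byte-level behavior described above, not XeTeX itself; the function name native_roundtrip is made up for illustration.

```python
def native_roundtrip(tex_bytes: bytes) -> bytes:
    """Model of native-Unicode mode: UTF-8 in, code points inside, UTF-8 out."""
    # Reading ".tex": UTF-8 byte sequences become Unicode code points.
    code_points = tex_bytes.decode("utf-8")
    # Writing ".toc": code points are encoded back to UTF-8.
    return code_points.encode("utf-8")


tex_bytes = "Für".encode("utf-8")
toc_bytes = native_roundtrip(tex_bytes)

print(tex_bytes.hex(" "))      # bytes stored in ".tex"
print(toc_bytes.hex(" "))      # bytes written to ".toc"
print(toc_bytes == tex_bytes)  # identical when nothing is re-encoded
```

In this model the ".toc" bytes are identical to the ".tex" bytes, which is why reading the ".toc" back in causes no trouble.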

On the other hand, with the "bytes" encoding,
XeTeX reads the file as follows:

  letter -> ".tex"     -> inner XeTeX
  F      -> 0x46       -> U+0046
  ü      -> 0xC3, 0xBC -> U+00C3, U+00BC
  r      -> 0x72       -> U+0072

XeTeX *always* writes ".toc" files in UTF-8.
This cannot be changed without something like a \XeTeXoutputencoding primitive:

  letter -> ".tex"     -> inner XeTeX    -> ".toc"
  F      -> 0x46       -> U+0046         -> 0x46
  ü      -> 0xC3, 0xBC -> U+00C3, U+00BC -> 0xC3, 0x83, 0xC2, 0xBC
  r      -> 0x72       -> U+0072         -> 0x72

As a result, ".tex" and ".toc" are different.
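
The "bytes" case can be modeled the same way in Python. This is a sketch under the assumption that reading in "bytes" mode maps each input byte to the code point with the same value (which is what Latin-1 decoding does); bytes_mode_roundtrip is a hypothetical name, not a XeTeX facility.

```python
def bytes_mode_roundtrip(tex_bytes: bytes) -> bytes:
    """Model of "bytes" input followed by the fixed UTF-8 output."""
    # Reading ".tex": every byte becomes one code point of the same value,
    # so 0xC3 0xBC turns into U+00C3 U+00BC instead of U+00FC.
    code_points = tex_bytes.decode("latin-1")
    # Writing ".toc": each code point is encoded to UTF-8 on its own.
    return code_points.encode("utf-8")


tex_bytes = "Für".encode("utf-8")
toc_bytes = bytes_mode_roundtrip(tex_bytes)

print(tex_bytes.hex(" "))      # bytes stored in ".tex"
print(toc_bytes.hex(" "))      # bytes written to ".toc"
print(toc_bytes == tex_bytes)  # the two files no longer agree
```

The two bytes of "ü" grow to four bytes in the output, which is the double encoding shown in the table.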
Moreover, ".toc" is broken, and it cannot be repaired.

"0xC3, 0xBC" is replaced with \"u by \DeclareUnicodeCharacter etc.
That is correctly "ü".
However, "0xC3, 0x83" is replaced with \~A and
"0xC2, 0xBC" is replaced with $1\over4$.
That is not "ü".
Therefore, the table of contents is broken.
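
The replacement step can be illustrated with a small Python lookup. This is only a sketch: the MACROS table holds just the three sequences mentioned above, and expand is a hypothetical helper, not how texinfo.tex is actually implemented.

```python
# Hypothetical excerpt of a "UTF-8 byte sequence -> TeX macro" table,
# limited to the three sequences discussed in this message.
MACROS = {
    b"\xc3\xbc": r'\"u',        # U+00FC, u with diaeresis
    b"\xc3\x83": r"\~A",        # U+00C3, A with tilde
    b"\xc2\xbc": r"$1\over4$",  # U+00BC, one quarter
}


def expand(data: bytes) -> str:
    """Replace known two-byte sequences by macros; copy other bytes as-is."""
    out = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in MACROS:
            out.append(MACROS[pair])
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


tex_bytes = bytes([0x46, 0xC3, 0xBC, 0x72])              # "Für" as stored in ".tex"
toc_bytes = bytes([0x46, 0xC3, 0x83, 0xC2, 0xBC, 0x72])  # double-encoded ".toc"

print(expand(tex_bytes))  # the source text expands to the intended accent macro
print(expand(toc_bytes))  # the ".toc" text expands to two unrelated macros
```

The double-encoded bytes are themselves valid UTF-8, so the lookup succeeds, just with the wrong characters.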

I've posted a feature request for \XeTeXoutputencoding etc.
http://sourceforge.net/p/xetex/feature-requests/22/
>> Yes, CJK fonts are required.
>> For example, if you want to use Japanese characters,
>> I think that it is possible to set the Japanese font in txi-ja.tex.
>> However, if the native Unicode support is disabled,
>> the Japanese characters cannot be used in this way.
>
> Good idea to put the font loading in the translation files.
Thank you.
Alternatively, it might be good to have font configuration files
such as txi-font-latinmodern.tex, txi-font-computermodern.tex, etc.