bug-texinfo

Re: XeTeX encoding problem


From: Masamichi HOSODA
Subject: Re: XeTeX encoding problem
Date: Mon, 18 Jan 2016 00:27:22 +0900 (JST)

> Instead, I would like to have the ucharclasses style file (for XeTeX)
> ported to texinfo (also part of TeXLive, BTW).
> 
>   https://github.com/Pomax/ucharclasses
> 
> It should also be ported to luatex so that Unicode blocks
> automatically access associated fonts.
> 
> But this is the future.  Right now, I favor a simple solution, namely
> native UTF8 support using the CM super fonts, even if there are
> missing characters (which ones, BTW?).  I guess this covers 99% of the
> current need.

I have another solution.
The sample patch is attached to this mail.

Unicode fonts are not required (the default Computer Modern fonts are used).
Byte-wise input is *NOT* used.
Unicode characters (U+00FC etc.) can be used directly in the input.

How about this?
--- texinfo.tex.org     2016-01-15 07:41:42.861186100 +0900
+++ texinfo.tex 2016-01-18 00:11:11.797800800 +0900
@@ -9428,45 +9428,18 @@
   \global\righthyphenmin = #3\relax
 }
 
-% Get input by bytes instead of by UTF-8 codepoints for XeTeX and LuaTeX, 
-% otherwise the encoding support is completely broken.
-\ifx\XeTeXrevision\thisisundefined
-\else
-\XeTeXdefaultencoding "bytes"  % For subsequent files to be read
-\XeTeXinputencoding "bytes"  % Effective in texinfo.tex only
-% Unfortunately, there seems to be no corresponding XeTeX command for
-% output encoding.  This is a problem for auxiliary index and TOC files.
-% The only solution would be perhaps to write out @U{...} sequences in
-% place of UTF-8 characters.
-\fi
+\newif\iftxinativeunicodecapable
 
-\ifx\luatexversion\thisisundefined
+\ifx\XeTeXrevision\thisisundefined
+  \ifx\luatexversion\thisisundefined
+    \txinativeunicodecapablefalse
+  \else
+    \txinativeunicodecapabletrue
+  \fi
 \else
-\directlua{
-local utf8_char, byte, gsub = unicode.utf8.char, string.byte, string.gsub
-local function convert_char (char)
-  return utf8_char(byte(char))
-end
-
-local function convert_line (line)
-  return gsub(line, ".", convert_char)
-end
-
-callback.register("process_input_buffer", convert_line)
-
-local function convert_line_out (line)
-  local line_out = ""
-  for c in string.utfvalues(line) do
-     line_out = line_out .. string.char(c)
-  end
-  return line_out
-end
-
-callback.register("process_output_buffer", convert_line_out)
-}
+  \txinativeunicodecapabletrue
 \fi
 
-
 % Helpers for encodings.
 % Set the catcode of characters 128 through 255 to the specified number.
 %
@@ -9491,13 +9464,6 @@
 %
 \def\documentencoding{\parseargusing\filenamecatcodes\documentencodingzzz}
 \def\documentencodingzzz#1{%
-  % Get input by bytes instead of by UTF-8 codepoints for XeTeX,
-  % otherwise the encoding support is completely broken.
-  % This settings is for the document root file.
-  \ifx\XeTeXrevision\thisisundefined
-  \else
-    \XeTeXinputencoding "bytes"
-  \fi
   %
   % Encoding being declared for the document.
   \def\declaredencoding{\csname #1.enc\endcsname}%
@@ -9526,10 +9492,12 @@
      \latninechardefs
   %
   \else \ifx \declaredencoding \utfeight
-     \setnonasciicharscatcode\active
-     % since we already invoked \utfeightchardefs at the top level
-     % (below), do not re-invoke it, then our check for duplicated
-     % definitions triggers.  Making non-ascii chars active is enough.
+     \iftxinativeunicodecapable
+       \nativeunicodechardefs
+     \else
+       \setnonasciicharscatcode\active
+       \utfeightchardefs
+     \fi
   %
   \else
     \message{Ignoring unknown document encoding: #1.}%
@@ -9859,7 +9827,7 @@
   \catcode`\;=12
   \catcode`\!=12
   \catcode`\~=13
-  \gdef\DeclareUnicodeCharacter#1#2{%
+  \gdef\DeclareUnicodeCharacterUTFviii#1#2{%
     \countUTFz = "#1\relax
     %\wlog{\space\space defining Unicode char U+#1 (decimal \the\countUTFz)}%
     \begingroup
@@ -9917,6 +9885,21 @@
     \uppercase{\gdef\UTFviiiTmp{#2#3#4}}}
 \endgroup
 
+\def\DeclareUnicodeCharacterNative#1#2{%
+  \catcode"#1=\active
+  \begingroup
+    \uccode`\~="#1\relax
+    \uppercase{\gdef~}{#2}%
+  \endgroup}
+
+\def\DeclareUnicodeCharacterNativeCatcodeActive#1#2{%
+  \catcode"#1=\active
+}
+
+\def\DeclareUnicodeCharacterNativeCatcodeOther#1#2{%
+  \catcode"#1=\other
+}
+
 % https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_M
 % U+0000..U+007F = https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)
 % U+0080..U+00FF = https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
@@ -9931,7 +9914,7 @@
 % We won't be doing that here in this simple file.  But we can try to at
 % least make most of the characters not bomb out.
 %
-\def\utfeightchardefs{%
+\def\unicodechardefs{%
   \DeclareUnicodeCharacter{00A0}{\tie}
   \DeclareUnicodeCharacter{00A1}{\exclamdown}
   \DeclareUnicodeCharacter{00A2}{{\tcfont \char162}}% 0242=cent
@@ -10601,7 +10584,33 @@
 
   \global\mathchardef\checkmark="1370 % actually the square root sign
   \DeclareUnicodeCharacter{2713}{\ensuremath\checkmark}
-}% end of \utfeightchardefs
+}% end of \unicodechardefs
+
+\def\utfeightchardefs{
+  \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterUTFviii
+  \unicodechardefs
+}
+
+\def\nativeunicodechardefs{
+  \iftxinativeunicodecapable
+    \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNative
+    \unicodechardefs
+  \fi
+}
+
+\def\setnativeunicodecharscatcodeactive{
+  \iftxinativeunicodecapable
+    \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNativeCatcodeActive
+    \unicodechardefs
+  \fi
+}
+
+\def\setnativeunicodecharscatcodeother{
+  \iftxinativeunicodecapable
+    \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNativeCatcodeOther
+    \unicodechardefs
+  \fi
+}
 
 % US-ASCII character definitions.
 \def\asciichardefs{% nothing need be done
@@ -10610,6 +10619,7 @@
 
 % Latin1 (ISO-8859-1) character definitions.
 \def\nonasciistringdefs{%
+  \setnativeunicodecharscatcodeother
   \setnonasciicharscatcode\active
   \def\defstringchar##1{\def##1{\string##1}}%
   %
@@ -10654,11 +10664,6 @@
   \defstringchar^^fc\defstringchar^^fd\defstringchar^^fe\defstringchar^^ff%
 }
 
-
-% define all the unicode characters we know about, for the sake of @U.
-\utfeightchardefs
-
-
 % Make non-ASCII characters printable again for compatibility with
 % existing Texinfo documents that may use them, even without declaring a
 % document encoding.
\input texinfo.tex

@documentencoding UTF-8

@contents

@chapter für

für

@bye
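
The core of the patch is the \uccode/\uppercase idiom in
\DeclareUnicodeCharacterNative, guarded by an engine check. As a rough
standalone illustration (not part of the patch), the plain TeX sketch
below shows the same idea outside texinfo.tex. The names
\ifsketchnativeunicode and \sketchdeclarechar are hypothetical, and
\undefined stands in for texinfo's \thisisundefined. It assumes the file
is saved as UTF-8 and run with xetex or luatex, using a plain format
that exposes \XeTeXrevision or \luatexversion.

```tex
% Hypothetical flag, analogous to \iftxinativeunicodecapable in the patch.
\newif\ifsketchnativeunicode

% Assumption: \undefined is not defined, so \ifx is true exactly when
% the engine-specific control sequence is missing.
\ifx\XeTeXrevision\undefined
  \ifx\luatexversion\undefined
    \sketchnativeunicodefalse
  \else
    \sketchnativeunicodetrue
  \fi
\else
  \sketchnativeunicodetrue
\fi

% Hypothetical helper, modelled on \DeclareUnicodeCharacterNative.
% #1 = code point as four uppercase hex digits, #2 = replacement text.
\def\sketchdeclarechar#1#2{%
  \catcode"#1=\active        % make the code point an active character
  \begingroup
    \uccode`\~="#1\relax     % map ~ to the target code point
    \uppercase{\gdef~}{#2}%  % ~ becomes the active target character
  \endgroup}

\ifsketchnativeunicode
  % U+00FC is typeset with the ordinary accent macro.
  \sketchdeclarechar{00FC}{\"u}
\fi

für

\bye
```

Because the active character expands to the usual accent macro, the
default Computer Modern font is enough for this character, which is the
point made in the mail above.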
