Re: Several serious problems
From: Kenichi Handa
Subject: Re: Several serious problems
Date: Mon, 2 Sep 2002 10:28:25 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <address@hidden>, Richard Stallman <address@hidden> writes:
> That depends on whether you include code in utf-8.el that encodes
> those charsets. If not, you need that change.
> In that case, I will install that change presently, and then we can
> study the question of whether to include the code in utf-8.el instead.
> What does that code in utf-8.el do, and how safe a change is it?
It defines two CCL programs, one to decode and one to encode
UTF-8 byte sequences, and defines the coding system
mule-utf-8 in terms of those CCL programs.
I'll attach the change needed to let RC's utf-8 encode
latin-X and a few more charsets (e.g. thai). The docstring
of mule-utf-8 may need improvement.
As the change is very small and that code has been in HEAD
for more than a month, I think the change is quite safe.
I recommend installing it in RC.
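
To illustrate what the change is meant to do, here is a minimal sketch, assuming an Emacs in which mule-utf-8 lists the given charset among its safe charsets. It encodes one character of the charset and decodes it back, then reports the charset before and after. With the change, the second charset is expected to be one of the mule-unicode charsets rather than the original one. The function name `my-utf-8-round-trip-charsets` is hypothetical and not part of utf-8.el.

```elisp
;; Hypothetical helper, for illustration only (not part of utf-8.el).
(defun my-utf-8-round-trip-charsets (charset)
  "Encode one character of CHARSET with mule-utf-8 and decode it back.
CHARSET must be a one-dimensional charset.  Return a list of the
charset of the original character and of the decoded character."
  (let* ((str (string (make-char charset 33)))
         (bytes (encode-coding-string str 'mule-utf-8))
         (back (decode-coding-string bytes 'mule-utf-8)))
    (list (char-charset (aref str 0))
          (char-charset (aref back 0)))))

;; Example call; the result depends on the Emacs version in use.
(my-utf-8-round-trip-charsets 'latin-iso8859-2)
```
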
I also checked the code to some extent with this test:
(dolist (charset (delq 'ascii
                       (delq 'eight-bit-control
                             (delq 'eight-bit-graphic
                                   ;; Copy the list: delq is destructive.
                                   (copy-sequence
                                    (coding-system-get 'mule-utf-8
                                                       'safe-charsets))))))
  (let ((dimension (charset-dimension charset))
        str)
    ;; Build a two-character sample string in CHARSET.
    (if (= dimension 1)
        (setq str (string (make-char charset 33) (make-char charset 34)))
      (setq str (string (make-char charset 33 33) (make-char charset 33 34))))
    ;; Signal an error only if mule-utf-8 is not reported as usable
    ;; and the encoded text contains the replacement character.
    (or (memq 'mule-utf-8 (find-coding-systems-string str))
        (not (string-match "\357\277\275" ; UTF-8 form of U+FFFD
                           (encode-coding-string str 'mule-utf-8)))
        (error "%s is not supported" charset))))
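
As a cross-check of the octal escapes used in the test above, here is a minimal sketch that computes the three-byte UTF-8 form of a code point by plain arithmetic. The helper name `my-utf-8-3byte` is hypothetical and not part of utf-8.el. It only handles code points in the range U+0800..U+FFFF.

```elisp
;; Hypothetical helper (not part of utf-8.el): compute the three-byte
;; UTF-8 form of a code point in the range U+0800..U+FFFF.
(defun my-utf-8-3byte (code)
  "Return the three UTF-8 bytes of CODE as a list of integers."
  (list (logior #xE0 (ash code -12))                ; 1110xxxx
        (logior #x80 (logand (ash code -6) #x3F))   ; 10xxxxxx
        (logior #x80 (logand code #x3F))))          ; 10xxxxxx

;; U+FFFD (REPLACEMENT CHARACTER) gives (239 191 189),
;; i.e. octal 357 277 275, the bytes matched in the test.
(my-utf-8-3byte #xFFFD)
```
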
---
Ken'ichi HANDA
address@hidden
*** utf-8.el.~1.9.4.2.~ Tue Jul 23 13:54:13 2002
--- utf-8.el Mon Sep 2 10:28:26 2002
***************
*** 269,275 ****
(loop
(if (r5 < 0)
((r1 = -1)
! (read-multibyte-character r0 r1))
(;; We have already done read-multibyte-character.
(r0 = r5)
(r1 = r6)
--- 269,277 ----
(loop
(if (r5 < 0)
((r1 = -1)
! (read-multibyte-character r0 r1)
! (translate-character ucs-mule-to-mule-unicode r0 r1))
!
(;; We have already done read-multibyte-character.
(r0 = r5)
(r1 = r6)
***************
*** 392,397 ****
--- 394,423 ----
mule-unicode-0100-24ff
mule-unicode-2500-33ff
mule-unicode-e000-ffff
+ latin-iso8859-2 (*)
+ latin-iso8859-3 (*)
+ latin-iso8859-4 (*)
+ cyrillic-iso8859-5 (*)
+ arabic-iso8859-6 (*)
+ greek-iso8859-7 (*)
+ hebrew-iso8859-8 (*)
+ latin-iso8859-9 (*)
+ latin-iso8859-14 (*)
+ latin-iso8859-15 (*)
+ chinese-sisheng (*)
+ ethiopic (*)
+ ipa (*)
+ lao (*)
+ katakana-jisx0201 (*)
+ thai-tis620 (*)
+ tibetan (*)
+ vietnamese-viscii-lower (*)
+ vietnamese-viscii-upper (*)
+
+ Among them, the charsets labeled \"(*)\" are supported only on
+ encoding. That means, they are correctly encoded to UTF-8, but are
+ decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
+ mule-unicode-2500-33ff, not to the original charsets.
Unicode characters out of the ranges U+0000-U+33FF and U+E200-U+FFFF
are decoded into sequences of eight-bit-control and eight-bit-graphic
***************
*** 409,415 ****
latin-iso8859-1
mule-unicode-0100-24ff
mule-unicode-2500-33ff
! mule-unicode-e000-ffff)
(mime-charset . utf-8)
(coding-category . coding-category-utf-8)
(valid-codes (0 . 255))))
--- 435,460 ----
latin-iso8859-1
mule-unicode-0100-24ff
mule-unicode-2500-33ff
! mule-unicode-e000-ffff
! latin-iso8859-2
! latin-iso8859-3
! latin-iso8859-4
! cyrillic-iso8859-5
! arabic-iso8859-6
! greek-iso8859-7
! hebrew-iso8859-8
! latin-iso8859-9
! latin-iso8859-14
! latin-iso8859-15
! chinese-sisheng
! ethiopic
! ipa
! lao
! katakana-jisx0201
! thai-tis620
! tibetan
! vietnamese-viscii-lower
! vietnamese-viscii-upper)
(mime-charset . utf-8)
(coding-category . coding-category-utf-8)
(valid-codes (0 . 255))))