Re: UCS-2BE
From: Kenichi Handa
Subject: Re: UCS-2BE
Date: Fri, 01 Sep 2006 21:26:59 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

Thank you for the info!
In article <address@hidden>, YAMAMOTO Mitsuharu <address@hidden> writes:
> "Unicode Technical Report #17, Character Encoding Model"
> (http://www.unicode.org/reports/tr17/index.html) says:
[...]
> Examples of Unicode Character Encoding Schemes:
[...]
> Unicode 1.1 had three character encoding schemes: UTF-8, UCS-2BE,
> and UCS-2LE, although the latter two were not named that way at
> the time.
Ah! So here we can see the term "UCS-2BE" used as a CES. But
how was it defined? (I don't have Unicode 1.1.)
> I suspect "UCS-2BE" is just a customary name and not explicitly
> defined even in ISO/IEC 10646.
> "UTF-8 and Unicode FAQ" (http://www.cl.cam.ac.uk/~mgk25/unicode.html)
> says:
> No endianess is implied by the encoding names UCS-2, UCS-4, UTF-16,
> and UTF-32, though ISO 10646-1 says that Bigendian should be
> preferred unless otherwise agreed. It has become customary to
> append the letters "BE" (Bigendian, high-byte first) and "LE"
> (Littleendian, low-byte first) to the encoding names in order to
> explicitly specify a byte order.
I don't know how authoritative this page is, but it also
says:
    A full featured character encoding converter will have
    to provide the following 13 encoding variants of Unicode
    and UCS:
    UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4LE, UCS-4BE,
    UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE,
    UTF-32LE
So it seems that UCS-2BE is not a mislabel of UTF-16BE, and
that treating it as a subset of UTF-16BE (one that does not
use surrogate pairs), as done in iconv, is the right thing.
I'll try to implement it (and others) in emacs-unicode-2.
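
To make the "subset of UTF-16BE without surrogate pairs" reading concrete, here is a minimal sketch in Python. It is only an illustration of that interpretation, not the emacs-unicode-2 code and not iconv's; the function names encode_ucs2be and decode_ucs2be are hypothetical, and rejecting out-of-range input with ValueError is an assumed policy.

```python
def encode_ucs2be(text: str) -> bytes:
    """Encode text as UCS-2BE, read here as UTF-16BE restricted to the BMP.

    Hypothetical helper: raising on characters beyond U+FFFF is an
    assumed policy, not something taken from any specification.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            # UTF-16 would need a surrogate pair here; UCS-2 has no such thing.
            raise ValueError(f"U+{ord(ch):04X} is not representable in UCS-2")
    # For BMP-only text the UTF-16BE bytes are exactly two bytes per
    # character, high byte first. Lone surrogates are rejected by the codec.
    return text.encode("utf-16-be")


def decode_ucs2be(data: bytes) -> str:
    """Decode UCS-2BE bytes, treating any surrogate code unit as an error."""
    if len(data) % 2 != 0:
        raise ValueError("UCS-2BE data must have an even number of bytes")
    chars = []
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]  # big-endian 16-bit code unit
        if 0xD800 <= unit <= 0xDFFF:
            # A surrogate would only make sense as part of a UTF-16 pair.
            raise ValueError(f"surrogate code unit 0x{unit:04X} in UCS-2BE data")
        chars.append(chr(unit))
    return "".join(chars)
```

Under this reading, any valid UCS-2BE byte sequence is also valid UTF-16BE with the same meaning; the converse fails only for text outside the BMP.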
By the way, why do people want so many variants... sigh...
---
Kenichi Handa
address@hidden