Re: eight-bit char handling in emacs-unicode

From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 15:27:37 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvzneqwbo3.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:
>>  Yes, but it doesn't mean it is conceptually the same as
>>  encode-coding-string.  The result of string-make-unibyte
>>  should still be regarded as a sequence of character, but the
>>  result of encode-coding-string is a sequence of byte.

> Why/when is the distinction meaningful (given the fact that it
> can only be used meaningfully with 8bit coding-systems where the
> distinction seems more philosophical than anything else) ?

It is perfectly possible to have an environment in which the
only charset used is iso-8859-1 and the only coding system
used is utf-8.  In this environment, the results of
encode-coding-string and string-make-unibyte are of course
not the same, but still both operations are
>>  Here exists an ambiguity of a unibyte string.

>>  The number 192 can be regarded as:
>>  (1) just a number, a byte
>>  (2) a code point of some character set.
>>  (3) a character code

> But the second case is only possible for 8bit character sets, right?

Yes.  But, as I wrote above, it doesn't mean that we are
restricted to simple 8bit-oriented coding-systems.
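
To make the iso-8859-1 charset / utf-8 coding system case concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is only an analogy: Python's str and bytes types are not the Emacs unibyte/multibyte model, and the variable names are invented for the illustration. Encoding to iso-8859-1 stands in for "the code point of the character in its charset" (case 2), and encoding to utf-8 stands in for the byte sequence produced by the coding system (case 1).

```python
# Analogy only: Python str/bytes, not Emacs unibyte/multibyte strings.
text = "\u00c0"  # LATIN CAPITAL LETTER A WITH GRAVE

# Case (2): the code point of the character in the iso-8859-1 charset.
as_code_points = text.encode("iso-8859-1")

# Case (1): the bytes produced by the utf-8 coding system.
as_utf8_bytes = text.encode("utf-8")

print(list(as_code_points))  # [192]
print(list(as_utf8_bytes))   # [195, 128]
```

The same character yields the single value 192 in one view and the two bytes 195 128 in the other, so the two results differ even though only an 8-bit charset is involved.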

> Until now, I always thought that Emacs only dealt with
> - byte streams representing encoded sequences of code points: case 1.
> - sequences of internal character codes (internally encoded in emacs-mule
>   or unicode depending on the branch you use): case 3.
> Is there any place where we deal with sequences of code points of external
> charsets really (other than in the degenerate case where such a sequence
> is indistinguishable from case 1, maybe).

I'd like to repeat that although we don't have such an
environment now, that doesn't mean it is impossible to
imagine such an environment.

>>  A unibyte string can contain (1) and (2) without
>>  distinguishing them, but a multibyte string can contain (1)
>>  and (3) while distinguishing them.

> Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
> character `a' ?

Good point.  Of course not.  I deliberately left that out to
keep the discussion simpler.
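
The asymmetry between 192 and 97 can be sketched with a rough analogy in Python. This is an assumption-laden illustration, not how Emacs stores eight-bit characters: Python's "surrogateescape" error handler maps undecodable bytes 128..255 to special code points, so a raw byte 192 stays distinct from the character with code 192, while byte 97 simply becomes the character `a'.

```python
# Analogy only: Python's surrogateescape, not the Emacs eight-bit charset.
raw = bytes([192, 97])

# Bytes >= 128 become escape code points; bytes < 128 become plain ASCII.
mixed = raw.decode("ascii", errors="surrogateescape")

byte_192_is_distinct = mixed[0] != "\u00c0"  # raw byte 192 vs. character 192
byte_97_is_distinct = mixed[1] != "a"        # raw byte 97 vs. character `a'

print(byte_192_is_distinct)  # True
print(byte_97_is_distinct)   # False

# The original bytes are still recoverable from the mixed string.
round_trip = mixed.encode("ascii", errors="surrogateescape")
```

In this sketch the string can tell byte 192 from the character with code 192, but it has no way to tell byte 97 from `a', which is the point conceded above.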

Ken'ichi HANDA
