Re: eight-bit char handling in emacs-unicode

From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Tue, 25 Nov 2003 10:07:18 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvr7zybqvr.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:
>>  But, the concept of unibyte<->multibyte conversion itself is
>>  not ad-hoc.  Don't you think their meaning is very clear
>>  when you grasp them as my way?  Do you see any inconsistency
>>  in my explanation about them?

> No, as a matter of fact I don't see why in a utf-8 environment,
> it makes any sense to have a function that turns a multibyte string
> into a unibyte string encoded in latin-1

It seems that you keep saying that "A does B, thus it's
nonsense".  But I'm arguing that "A does C".

It doesn't make sense because you treat the result as "a
unibyte string encoded in Latin-1".

It makes sense if you treat the result as "a unibyte string
in which each byte represents one code point of a sequence
of Unicode code-points", doesn't it?

> (without even complaining when it encounters other
> characters).

I think it's ok (or better) that string-make-unibyte
complains in such a case.
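
To illustrate the reading above, where each byte of the unibyte
string stands for one code point, here is a minimal sketch in Python
rather than Emacs Lisp.  The function names are hypothetical, and
raising an error for characters above U+00FF is the behavior being
proposed in this thread, not necessarily what string-make-unibyte
does today.

```python
def to_unibyte(text: str) -> bytes:
    """Map each character to one byte whose value is its code point.

    Hypothetical stand-in for the conversion discussed in the thread;
    characters above U+00FF cannot be represented and raise an error.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            raise ValueError(f"U+{cp:04X} does not fit in one byte")
        out.append(cp)
    return bytes(out)


def to_multibyte(data: bytes) -> str:
    """Inverse mapping: each byte value becomes that code point."""
    return "".join(chr(b) for b in data)
```

Under this sketch, to_unibyte followed by to_multibyte round-trips any
string restricted to U+0000..U+00FF, and anything else is rejected
rather than silently mangled.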

> It'd make sense if the environment said "latin-1 when you can,
> utf-8 otherwise" or something like that, but then we would use
> encode-coding-string anyway.

Having such a coding system is itself nonsense.  Do you
agree with having string-make-unibyte if it signals an error
on non-Latin-1 characters?

> Besides, if any non-latin-1 char is encountered by string-make-unibyte, then
> we end up with a unibyte string that has an unknown meaning because some
> chars might have been encoded in latin-1, and others in some other encoding.

> I just don't know of a concrete case where it makes sense to use
> string-make-unibyte.

I'll paraphrase my previous example as follows:

  It is perfectly possible to live in an environment where
  only the characters U+0000..U+00FF of Unicode are used
  and the only coding system used is utf-8.

But, I don't claim that the above is a realistic case.
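
A small sketch of that environment, again in Python and purely as an
illustration (the variable names are made up): the text uses only
characters up to U+00FF, the external encoding is utf-8, and the
one-byte-per-code-point form is a separate thing from the encoded form.

```python
text = "caf\u00e9"  # every character is within U+0000..U+00FF

# External representation: what the utf-8 coding system produces.
encoded = text.encode("utf-8")

# One byte per code point: the "unibyte" reading discussed above.
per_code_point = bytes(ord(c) for c in text)

# The two byte sequences differ for any character above U+007F.
print(encoded != per_code_point)
```

Here the encoded form needs two bytes for U+00E9 while the
per-code-point form needs one, so the latter is not "the text encoded
in the environment's coding system".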

Another non-realistic but concrete case is:

  Use only the charset iso-8859-5 and the encoding CTEXT.

Ken'ichi HANDA
