emacs-devel

Re: eight-bit char handling in emacs-unicode


From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Sun, 23 Nov 2003 16:30:49 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvoev4ufqd.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:

>>>>  It is perfectly possible to live in such an environment
>>>>  where only the charset iso-8859-1 is used but only the
>>>>  coding system utf-8 is used.  In this environment, the
>>>>  results of encode-coding-string and string-make-unibyte are
>>>>  of course not the same, but still both operations are
>>>>  meaningful.

>>>  I see that encode-coding-string does the utf-8 encoding, but what
>>>  does string-make-unibyte do in such a case and what is it used for ?

>>  It gets the iso-8859-1 code-points of all characters in a
>>  multibyte string and concatenates them (the same as what it
>>  does in latin-1 lang. env.).

> You mean it does the same as (encode-coding-string str 'latin-1) ?

Not exactly the same when STR contains, for instance,
Cyrillic characters.  How unsupported characters are
handled differs between the two operations.
Encode-coding-string may behave leniently so that the result
can be decoded back correctly (perhaps by adding some escape
sequence).  But string-make-unibyte should never change the
number of characters.  And,

> Then why use string-make-unibyte ?

There's no way to know that we should use the coding-system
latin-1 in this situation.  All we know is that the default
coding-system is utf-8, and the default character set is
iso-8859-1.
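
To make the distinction concrete, here is a rough model in Python (not
Emacs code) of the two operations in an environment where the default
coding system is utf-8 and the default charset is iso-8859-1.  The
function names and the "?" placeholder for unsupported characters are
assumptions made only for illustration; what Emacs really does with
unsupported characters is not specified here.  The only property the
sketch relies on is that the unibyte conversion keeps one byte per
character, while encoding may change the length.

```python
def encode_utf8(text: str) -> bytes:
    """Model of encoding with the default coding system (utf-8).

    The byte length may differ from the number of characters.
    """
    return text.encode("utf-8")


def make_unibyte_model(text: str,
                       charset: str = "iso-8859-1",
                       placeholder: bytes = b"?") -> bytes:
    """Hypothetical model of a per-character unibyte conversion.

    Each character is replaced by its single-byte code point in
    `charset`.  A character outside the charset becomes `placeholder`
    (an assumption of this sketch), so the output always has exactly
    one byte per input character.
    """
    if len(placeholder) != 1:
        raise ValueError("placeholder must be exactly one byte")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out += placeholder
    return bytes(out)


if __name__ == "__main__":
    latin = "caf\u00e9"          # four characters, all in iso-8859-1
    cyrillic = "\u0434a"         # Cyrillic letter followed by "a"
    for s in (latin, cyrillic):
        print(len(s), len(encode_utf8(s)), len(make_unibyte_model(s)))
```

For the Latin-1 string the two results differ only in length (five
bytes versus four); for the Cyrillic string the encoded form stays
decodable, while the unibyte model keeps the character count and loses
the unsupported character.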

>>  Please try C-x C-m L utf-8 RET and see how
>>  string-make-unibyte and string-make-multibyte work.

> I'll try that, but I'd like to understand the motivation for making it work
> the way it works.  I've always understood those two as "trying to DTRT" in
> a very ad-hoc way such that people that used to work in an 8bit non-ASCII
> environment don't need to worry about coding-systems and still have
> things working mostly correctly.

Doing unibyte<->multibyte conversion automatically
may be ad-hoc.  The way they handle unsupported
characters may also be ad-hoc.

But the concept of unibyte<->multibyte conversion itself is
not ad-hoc.  Don't you think their meaning is very clear
when you understand them my way?  Do you see any inconsistency
in my explanation of them?

---
Ken'ichi HANDA
address@hidden




