Re: eight-bit char handling in emacs-unicode

From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Mon, 1 Dec 2003 09:43:23 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvad6hlwu1.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:
>>>  I can't answer this question without knowing the answer to my question:
>>>  what is string-make-unibyte used for.

>>  It is used for converting a multibyte string to unibyte
>>  before it is inserted in a unibyte buffer.

> I meant `what is "converting from multibyte to unibyte" used for'.
> I.e. it can be used for different things in different contexts and I can't
> answer in general, so I need a concrete case.

It is used so that no information about the text is lost even
if you kill text in a multibyte buffer and paste it into a
unibyte buffer.  When you then kill the just-pasted text in the
unibyte buffer and paste it back into the original multibyte
buffer, you recover the same character sequence.

Anyway, I already showed you this example:

  In a Latin-2 environment where the default encoding is CTEXT.

In that case too, inserting a multibyte latin-2 string into a
unibyte buffer works the same way as in this case:

  In a Latin-2 environment where the default encoding is iso-latin-2.

That's because string-make-unibyte doesn't have to know
about the coding system.  All it has to know is which
character set to use.
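
The round trip and its dependence on the character set alone can be sketched as follows. This is only an analogy written in Python, not Emacs Lisp and not how Emacs implements it: Python `str` and `bytes` stand in for multibyte and unibyte text, and the names `CHARSET`, `to_unibyte` and `to_multibyte` are hypothetical, introduced purely for illustration.

```python
# Illustrative analogy only: str/bytes stand in for Emacs
# multibyte/unibyte text. Nothing here is Emacs code.

# Assumed primary character set of the Latin-2 environment.
# No default encoding (CTEXT, iso-latin-2, ...) appears anywhere.
CHARSET = "iso8859_2"


def to_unibyte(text: str) -> bytes:
    """Map each character to its one-byte code in CHARSET."""
    return text.encode(CHARSET)


def to_multibyte(raw: bytes) -> str:
    """Map each byte back to the character it denotes in CHARSET."""
    return raw.decode(CHARSET)


killed = "Łódź"                  # text killed in the multibyte buffer
pasted = to_unibyte(killed)      # what the unibyte buffer would hold
restored = to_multibyte(pasted)  # yanked back into the multibyte buffer
```

In this sketch `restored` equals `killed`, and the result would be the same whatever the default encoding for files or processes happened to be, since only `CHARSET` is consulted.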

If you can't answer in general, please answer this
concrete question.

  In a Latin-2 environment where one's primary character set
  is latin-iso8859-2 but the default encoding is CTEXT, how
  do you make insertion of a multibyte string (containing only
  latin-iso8859-2 characters) into a unibyte buffer work with
  your method?  Such an insertion may happen when a user
  kills text in a multibyte buffer and yanks it into a unibyte
  buffer.

>>  It's an ambiguous statement.  Which are you saying?

>>  Replace string-make-unibyte by:
>>  (1) encode-coding-string or make-string-unibyte.

>>  (2) a code that applies encode-coding-string or
>>  make-string-unibyte to the whole string depending on
>>  something (perhaps on the input string?).

>>  (3) a code that applies encode-coding-string to substrings
>>  where that is appropriate, and applies make-string-unibyte
>>  to the remaining substrings.

>>  (4) something that I still don't understand.

> I'm saying that each *call* to string-make-unibyte can be replaced
> by a call to either encode-coding-string or make-string-unibyte.

> But the decision of which to use and which coding-system to use
> depends on the context.

Are you talking about the actual Emacs Lisp code that
explicitly calls make-string-unibyte?  I've been talking
about the functionality of make-string-unibyte itself,
especially about the implicit call to the C function
copy_text that does the same thing as make-string-unibyte.
Is that the reason why it seems that we are talking at cross
purposes?

> Now why would we want to do the work of changing all those calls?
> Because all those that would use encode-coding-string are incorrect
> in using string-make-unibyte because they won't do the right thing
> in some language environments.

What is the right thing to do when a multibyte Japanese text
is being pasted into a unibyte buffer?

I think signalling an error is the only right thing, and
I've never objected to making copy_text and
Fstring_make_unibyte signal an error in such a case.
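
As a rough illustration of that behavior, again as a Python analogy rather than the actual copy_text or Fstring_make_unibyte code: a conversion keyed on the primary character set can refuse any text that contains characters outside that set. The names `CHARSET` and `to_unibyte_strict` are hypothetical, and Latin-2 is assumed as the primary character set.

```python
# Illustrative analogy only: str/bytes stand in for Emacs
# multibyte/unibyte text. Nothing here is Emacs code.

CHARSET = "iso8859_2"  # assumed primary character set (Latin-2)


def to_unibyte_strict(text: str) -> bytes:
    """Convert text to one byte per character, or raise an error.

    Raises ValueError if any character has no code in CHARSET,
    instead of silently dropping or mangling it.
    """
    try:
        return text.encode(CHARSET)
    except UnicodeEncodeError as err:
        raise ValueError(
            f"cannot represent {text[err.start:err.end]!r} in {CHARSET}"
        ) from err
```

With this sketch, `to_unibyte_strict("Łódź")` returns four bytes, while `to_unibyte_strict("日本語")` raises `ValueError`, so the paste fails loudly rather than losing information.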

Ken'ichi HANDA
