Re: setenv -> locale-coding-system cannot handle ASCII?!

From: Kenichi Handa
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 14:32:16 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.2.92 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, "Stefan Monnier" <monnier+gnu/address@hidden> writes:
> I consider this context-dependent meaning of unibyte strings
> to be a problem.  I understand why text in a unibyte buffer
> has such an ambiguous meaning and agree that it's difficult
> to avoid, but it's not a reason to carry over this difficulty
> to strings where it is not needed.

Why is it not needed?  Strings and buffers are not that
different; both are containers of characters.  If we get a
unibyte string from a unibyte buffer by buffer-substring,
how should we treat that string?

>>  In the former case, as it is given to encode-coding-string,
>>  it is a multibyte form by which emacs represents
>>  character(s), not a sequence of characters representing raw
>>  bytes.

> The problem is that the multibyteness of strings is not
> always as easy to guess/control.

I agree.

> For example: what is the multibyteness of

>       (concat "\201" (format "%s" "hello"))
> and
>       (concat "\201" (format "%s" 1))

The latter yields multibyte, but I think it's a bug.  I found
that "(format "%s" 1)" is implemented by using
prin1-to-string, and prin1-to-string prints an object to a
temporary buffer and gets that buffer string.  So, in a
multibyte session "(format "%s" 1)" yields a multibyte
string.  :-(

>>  In the latter case, as it is given to string-to-multibyte,
>>  it should be regarded as a sequence of characters representing
>>  raw bytes, thus the result of (string-to-multibyte
>>  "\201\300") is still a sequence of raw-bytes.  Encoding
>>  raw-bytes should yield the same raw-bytes.

> Indeed, that's what I and `setenv' would want.

>>  And, this behaviour of encode-coding-string on a unibyte
>>  string is a natural consequence of encode-coding-region in a
>>  unibyte buffer.

> As mentioned above, I understand why it works that way in buffers,
> but I don't think it has to work the same way for strings.

So, do you mean that you want this?

    If a unibyte buffer has \201\300 in the region between FROM and TO,

    (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
        => "\201\300"

    (encode-coding-region FROM TO 'iso-latin-1) changes the
    region to \300.

Isn't it more confusing?

By the way, I also really really hate this unibyte/multibyte
problem.  Sometimes I think I should have opposed the
introduction of such a concept more strongly.

    imagine there's no unibyte 
    it's easy if you try
    no bytes below us
    above us only chars
    imagine all the people living in multibyte


Ken'ichi HANDA
