Re: setenv -> locale-coding-system cannot handle ASCII?!

From: Stefan Monnier
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Tue, 25 Feb 2003 21:52:18 -0500

> >>    (if (multibyte-string-p variable)
> >>        (setq variable (encode-coding-string variable
> >>                                             locale-coding-system)))
> >>  
> >>  multibyte-string-p is mandatory because encode-coding-string
> >>  will change the byte-sequence of `variable' even if it is
> >>  unibyte.
> >>  Ex. (encode-coding-string "\201\300" 'iso-latin-1) => "\300"
> > I find this behavior annoying because it makes the emacs-mule
> > encoding appear in a situation where it is not mentioned.
> > I wish that
> >     (encode-coding-string "\201\300" 'iso-latin-1)
> > and
> >     (encode-coding-string (string-to-multibyte "\201\300") 'iso-latin-1)
> > returned the same value.
> Why?  As I wrote before, what the bytes of a unibyte string
> mean depends on the context.

I consider this context-dependent meaning of unibyte strings
to be a problem.  I understand why text in a unibyte buffer
has such an ambiguous meaning and agree that it's difficult
to avoid, but it's not a reason to carry over this difficulty
to strings where it is not needed.

> In the former case, as it is given to encode-coding-string,
> it is a multibyte form by which emacs represents
> character(s), not a sequence of characters representing raw
> bytes.

The problem is that the multibyteness of a string is not
always easy to guess or control.  For example, what is the
multibyteness of each of the following?

        (concat "\201" (format "%s" "hello"))
        (concat "\201" (format "%s" 1))
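
One way to check this on a given Emacs is to evaluate each form and ask
`multibyte-string-p' about the result.  The following is only a sketch:
`my-report-multibyteness' is a made-up helper name, not an Emacs function,
and the answers may well differ between Emacs versions, so no expected
output is shown.

```elisp
;; Hypothetical helper, not part of Emacs: evaluate each form and
;; pair it with the multibyteness of the string it returns.
(defun my-report-multibyteness (forms)
  "Return an alist of (FORM . MULTIBYTE-P) for each form in FORMS."
  (mapcar (lambda (form)
            (cons form (multibyte-string-p (eval form))))
          forms))

;; The two forms from the message above.
(my-report-multibyteness
 '((concat "\201" (format "%s" "hello"))
   (concat "\201" (format "%s" 1))))
```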

> In the latter case, as it is given to string-to-multibyte,
> it should be regarded as a sequence of characters representing
> raw bytes, thus the result of (string-to-multibyte
> "\201\300") is still a sequence of raw-bytes.  Encoding
> raw-bytes should yield the same raw-bytes.

Indeed, that's what I and `setenv' would want.
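
To see whether the two forms discussed above agree on a particular Emacs,
something like the following could be used.  It is a rough sketch only:
`my-compare-encodings' is a hypothetical name introduced here for
illustration, and whether `:same' comes out non-nil depends on the Emacs
version in use.

```elisp
;; Hypothetical helper, not part of Emacs: encode BYTES directly and
;; after passing it through `string-to-multibyte', then compare.
(defun my-compare-encodings (bytes coding-system)
  "Encode BYTES as-is and via `string-to-multibyte'; return both results."
  (let ((direct (encode-coding-string bytes coding-system))
        (via-multibyte (encode-coding-string (string-to-multibyte bytes)
                                             coding-system)))
    (list :direct direct
          :via-multibyte via-multibyte
          :same (equal direct via-multibyte))))

;; The example string and coding system from the quoted discussion.
(my-compare-encodings "\201\300" 'iso-latin-1)
```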

> And, this behaviour of encode-coding-string on a unibyte
> string is a natural consequence of encode-coding-region in a
> unibyte buffer.

As mentioned above, I understand why it works that way in buffers,
but I don't think it has to work the same way for strings.

