Re: setenv -> locale-coding-system cannot handle ASCII?!

From: Stefan Monnier
Subject: Re: setenv -> locale-coding-system cannot handle ASCII?!
Date: Wed, 26 Feb 2003 00:50:27 -0500

> > I consider this context-dependent meaning of unibyte strings
> > to be a problem.  I understand why text in a unibyte buffer
> > has such an ambiguous meaning and agree that it's difficult
> > to avoid, but it's not a reason to carry over this difficulty
> > to strings where it is not needed.
> Why is it not needed?  Strings and buffers are not that
> different, both are containers of characters.

They are used differently.  Operations on strings generally apply to the
whole string: you can only encode/decode a whole string at a time.

> If we get a unibyte string from a unibyte buffer by buffer-substring,
> how should we treat that string?

Like any other unibyte string: as a sequence of raw bytes.
If you want to treat it as a sequence of characters, then
you need to pass it through `string-as-multibyte'.

In buffers, there is sometimes a need to represent multibyte chars
inside a unibyte buffer because only part of the buffer is
decoded.  For a string, that can be avoided.  You can make sure
that if it is decoded it's a multibyte string and if it's not
then it's a unibyte string.

> > For example: what is the multibyteness of
> >     (concat "\201" (format "%s" "hello"))
> > and
> >     (concat "\201" (format "%s" 1))
> The latter yields multibyte, but I think it's a bug.  I found
> that "(format "%s" 1)" is implemented by using
> prin1-to-string, and prin1-to-string prints an object to a
> temporary buffer and returns that buffer's contents.  So, in a
> multibyte session "(format "%s" 1)" yields a multibyte
> string.  :-(
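The following toy C model makes the multibyteness question concrete by tracking only the multibyte flag. It assumes, as a simplification, that `concat` returns a multibyte string whenever any argument is multibyte. The hypothetical `format_number_via_buffer` stands in for the unpatched prin1-to-string path, which copies its result out of a temporary buffer and so inherits that buffer's multibyteness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: only the multibyte flag matters here. */
typedef struct {
  const char *text;
  bool multibyte;
} lstr;

/* Models the rule that concat returns a multibyte string if any of its
   arguments is multibyte. */
static bool concat_is_multibyte(const lstr *args, size_t n)
{
  assert(args != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (args[i].multibyte)
      return true;
  return false;
}

/* Models (format "%s" 1) before the patch: the digits are printed into a
   temporary buffer and copied out, so the result takes the buffer's
   multibyteness even though it holds only ASCII. */
static lstr format_number_via_buffer(const char *digits, bool buffer_multibyte)
{
  lstr s = { digits, buffer_multibyte };
  return s;
}
```

In a multibyte session, the buffer flag is true, so the second concatenation below comes out multibyte while the first does not.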

I know: I bumped into it yesterday while playing around with tar-mode.
How about the attached patch?

> So, do you mean that you want this?
>     If a unibyte buffer has \201\300 in the region FROM and TO,
>     (encode-coding-string (buffer-substring FROM TO) 'iso-latin-1)
>       => "\201\300"
>     (encode-coding-region FROM TO 'iso-latin-1) changes the
>     region to \300.

Yes, I guess I'd be happy with it.
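The two results in Handa's example come from two readings of the same bytes, and the C sketch below shows both. It assumes, as in Emacs 21's internal representation, that `\201` (0x81) is the leading byte for Latin-1 characters, followed by the character's byte value. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: in the internal multibyte representation, a Latin-1
   character is stored as the leading byte 0x81 (\201) followed by the
   character's byte value (0xA0..0xFF). */
#define LEADING_BYTE_LATIN1 0x81

/* String view: a unibyte string is raw bytes, so encoding it leaves the
   bytes unchanged.  Copies n bytes and returns n. */
static size_t encode_unibyte_string(const unsigned char *in, size_t n,
                                    unsigned char *out)
{
  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = in[i];
  return n;
}

/* Region view: the bytes of a unibyte buffer are read as the internal
   multibyte form, so \201\300 is one Latin-1 character and encodes to
   the single byte \300.  Other bytes are copied as-is.  Returns the
   number of bytes written. */
static size_t encode_region_latin1(const unsigned char *in, size_t n,
                                   unsigned char *out)
{
  assert(in != NULL && out != NULL);
  size_t o = 0;
  for (size_t i = 0; i < n; i++) {
    if (in[i] == LEADING_BYTE_LATIN1 && i + 1 < n && in[i + 1] >= 0xA0) {
      out[o++] = in[i + 1];   /* keep the character's Latin-1 byte */
      i++;                    /* skip the byte just consumed */
    } else {
      out[o++] = in[i];
    }
  }
  return o;
}
```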

> Isn't it more confusing?

Not to me.

> By the way, I also really really hate this unibyte/multibyte
> problem.  Sometimes I think I should have opposed the
> introduction of such a concept more strongly.

But it's pretty damn handy for binary data.


PS: I wish there was a way to swap two buffers' contents so that
    tar-mode could swap the (potentially very large) data to
    a helper buffer (without needing to copy this large data)
    and then use multibyte for the display and unibyte for
    the helper buffer.
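Here is a minimal sketch of the swap wished for above, using a hypothetical simplified `buffer` struct rather than Emacs's real one. If the text lives in a separately allocated block, exchanging two buffers' contents only means swapping pointers and sizes. That costs the same however large the tar file is. Later Emacs versions did add `buffer-swap-text`, but its details differ from this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer: the text lives in a separately
   allocated block that the buffer only points to. */
typedef struct {
  unsigned char *text;  /* start of the text block */
  size_t nbytes;        /* size of the text in bytes */
  bool multibyte;       /* how this buffer interprets its text */
} buffer;

/* Exchange the text of two buffers in O(1) by swapping the pointers and
   sizes; the (possibly huge) data itself is never copied.  The multibyte
   flag is deliberately left with each buffer. */
static void swap_buffer_text(buffer *a, buffer *b)
{
  assert(a != NULL && b != NULL);
  unsigned char *t = a->text;
  size_t n = a->nbytes;
  a->text = b->text;
  a->nbytes = b->nbytes;
  b->text = t;
  b->nbytes = n;
}
```

Leaving the multibyte flag with each buffer is a choice made for this sketch. It lets the helper buffer stay unibyte for the raw archive while the display buffer is made multibyte.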

Index: print.c
RCS file: /cvsroot/emacs/emacs/src/print.c,v
retrieving revision 1.184
diff -u -r1.184 print.c
--- print.c     4 Feb 2003 14:03:13 -0000       1.184
+++ print.c     26 Feb 2003 05:43:26 -0000
@@ -774,9 +774,12 @@
   /* Make Vprin1_to_string_buffer be the default buffer after PRINTFINSH */
   set_buffer_internal (XBUFFER (Vprin1_to_string_buffer));
+  if (ZV == ZV_BYTE)
+    Fset_buffer_multibyte (Qnil);
   object = Fbuffer_string ();
   Ferase_buffer ();
+  Fset_buffer_multibyte (Qt);
   set_buffer_internal (old);
   Vdeactivate_mark = tem;

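The test added by the patch can be read as follows. This is a simplified C model with a hypothetical `print_buffer` struct standing in for the temporary buffer, where `nchars` and `nbytes` play the roles of `ZV` and `ZV_BYTE`. When the counts match, no character needed a multi-byte sequence, so the printed text can be returned as a unibyte string. The patch then erases the buffer and makes it multibyte again for the next call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the temporary print buffer: how many
   characters it holds and how many bytes they occupy. */
typedef struct {
  size_t nchars;  /* plays the role of ZV (ignoring the 1-based offset) */
  size_t nbytes;  /* plays the role of ZV_BYTE */
} print_buffer;

/* Equal character and byte counts mean no character occupies a
   multi-byte sequence, so the bytes can be taken as a unibyte string
   without losing information. */
static bool result_can_be_unibyte(const print_buffer *b)
{
  assert(b != NULL && b->nbytes >= b->nchars);
  return b->nchars == b->nbytes;
}
```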