emacs-devel

Re: Compiling Elisp to a native code with a GCC plugin


From: Eli Zaretskii
Subject: Re: Compiling Elisp to a native code with a GCC plugin
Date: Fri, 17 Sep 2010 22:57:13 +0200

> From: "Stephen J. Turnbull" <address@hidden>
> Date: Sat, 18 Sep 2010 03:53:27 +0900
> Cc: address@hidden
> 
> Actually, there's an exceptional case: if both strings are pure ASCII.
> In that case it might be possible that one string is multibyte and the
> other unibyte, while the numbers of characters and of bytes are equal.

A unibyte string in Emacs has its `size_byte' member set to a negative
value:

    /* Mark STR as a unibyte string.  */
    #define STRING_SET_UNIBYTE(STR)  \
      do { if (EQ (STR, empty_multibyte_string))  \
          (STR) = empty_unibyte_string;  \
        else XSTRING (STR)->size_byte = -1; } while (0)

By contrast, a multibyte string stores in that member the number of
bytes in its internal representation.  So a pure ASCII string could be
unibyte or multibyte, and the `size_byte' member will be negative in
the former case and non-negative in the latter case.
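
To make the convention concrete, here is a minimal sketch in C.  It is
a toy model only: `struct toy_string' and both helper functions are
hypothetical names invented for illustration, not the definitions in
Emacs's lisp.h.  It assumes nothing beyond what is described above: a
negative `size_byte' marks a unibyte string, whose byte length then
equals its character count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a Lisp string header; NOT the real struct from
   lisp.h.  All names here are hypothetical.  */
struct toy_string
{
  ptrdiff_t size;       /* number of characters */
  ptrdiff_t size_byte;  /* number of bytes, or negative if unibyte */
};

/* A string counts as multibyte when size_byte is non-negative.  */
static bool
toy_string_multibyte_p (const struct toy_string *s)
{
  return s->size_byte >= 0;
}

/* Byte length: a unibyte string has one byte per character, so
   fall back to the character count when size_byte is negative.  */
static ptrdiff_t
toy_string_bytes (const struct toy_string *s)
{
  return s->size_byte < 0 ? s->size : s->size_byte;
}
```

In this model a unibyte "abc" would be { 3, -1 } and a multibyte "abc"
would be { 3, 3 }.  Both report 3 bytes, and only the sign of
`size_byte' tells them apart.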

However, AFAIK Emacs always makes a unibyte string if all the
characters are pure ASCII.  So this does not matter in practice.

> The example you gave proves nothing, however.  In fact, when that
> string is presented by `string-as-multibyte', ?\351 will be converted
> to a private space character in Unicode and therefore will have more
> than one byte in its representation.  Thus the length in bytes of the
> string (as multibyte) will be 7 (or maybe more, I forget which private
> space naked bytes live in).  Here's one way to get byte length of a
> string:
> 
> (defun string-byte-count (s)
>   (length (if (string-multibyte-p s) (encode-coding-string s 'utf-8) s)))

See above: this is not accurate.


