emacs-devel

Re: Fwd: Re: Inadequate documentation of silly characters on screen.


From: Alan Mackenzie
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 14:18:52 +0000
User-agent: Mutt/1.5.9i

On Thu, Nov 19, 2009 at 09:21:41PM +0800, Jason Rumney wrote:
> Andreas Schwab <address@hidden> writes:

> > Nothing gets truncated.  In Emacs 23 ?ñ is simply the number 241,
> > whereas in Emacs 22 it is the number 2289.  You can put 2289 in a
> > string in Emacs 23, but there is no defined Unicode character with
> > that value.

> The bug here is likely that setting a character in a unibyte string to
> a value between 160 and 255 does not result in an automatic conversion
> to multibyte.  That was correct in 22.3, since values in that range
> were raw binary bytes outside of any character set, but in 23.1 they
> correspond to valid Latin-1 codepoints.

Putting point over the \361 and doing C-x = shows the character is 

    Char: \361 (4194289, #o17777761, #x3ffff1, raw-byte)

The actual character in the string is ñ (#xf1).
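
For reference, the number C-x = reports matches the way Emacs 23 is
understood to represent raw bytes: a byte in the range 0x80..0xFF is given
the character code byte + 0x3FFF00 (the BYTE8_TO_CHAR macro in
src/character.h, as far as I know).  A minimal standalone sketch of that
arithmetic follows; the helper name raw_byte_to_char is made up for
illustration and is not an Emacs identifier.

```c
#include <assert.h>

/* Hypothetical helper, not part of Emacs: mirrors the arithmetic of the
   BYTE8_TO_CHAR macro, which offsets a raw 8-bit byte by 0x3FFF00.  */
int
raw_byte_to_char (unsigned char byte)
{
  /* Only bytes 0x80..0xFF are represented as raw-byte characters.  */
  assert (byte >= 0x80);
  return byte + 0x3FFF00;
}
```

With the byte 0xf1 (octal 361) this gives 0x3ffff1, i.e. 4194289, which is
the value shown above.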

Going through all the motions, here is what I think is happening: the
\361 is put there by `insert'.

insert calls
  general_insert_function, calls
    insert_from_string (via a function pointer), calls
      insert_from_string_1, calls
        copy_text

        At this stage, I'm assuming to_multibyte (the screen buffer, in
        some form) is TRUE, and from_multibyte (a string holding the
        single character #xf1) is FALSE.  We thus execute this code in
        copy_text:

  else
    {
      unsigned char *initial_to_addr = to_addr;

      /* Convert single-byte to multibyte.  */
      while (nbytes > 0)
        {
          int c = *from_addr++;        /* <== the indicated line */

          if (c >= 0200)
            {
              c = unibyte_char_to_multibyte (c);
              to_addr += CHAR_STRING (c, to_addr);
              nbytes--;
            }
          else
            /* Special case for speed.  */
            *to_addr++ = c, nbytes--;
        }
      return to_addr - initial_to_addr;
    }

        At the indicated line, c is a SIGNED integer, so it will get
        the value 0xfffffff1 (that is, -15), not 0xf1.

        copy_text then invokes the macro
          unibyte_char_to_multibyte (-15),

          at which point there's no point going any further.
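
Whether the sign extension really happens depends on the element type of
from_addr, which the excerpt does not show.  The standalone sketch below is
not Emacs code; both function names are hypothetical.  It contrasts the two
cases for the same statement "int c = *from_addr;".

```c
/* Hypothetical stand-ins for "int c = *from_addr++;" in copy_text.
   Only the element type of from_addr differs between the two.  */

int
load_via_signed (const signed char *from_addr)
{
  int c = *from_addr;   /* a negative byte value is sign-extended */
  return c;
}

int
load_via_unsigned (const unsigned char *from_addr)
{
  int c = *from_addr;   /* the byte is zero-extended, so 0xf1 stays 241 */
  return c;
}
```

If the pointer is to signed char, the byte with bit pattern 0xf1 is loaded
as -15, which is the case described above.  If it is to unsigned char, c
is 241 and the c >= 0200 test behaves as intended.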

At least, that's my guess as to what's happening.  A fix would be to
change the declaration of "int c" to "unsigned int c".  I'm going to try
that now.
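
One caveat about that fix, sketched in standalone C (hypothetical function
names, not Emacs code, and assuming the byte is read through a signed char
pointer): changing only the declaration of c converts the already
sign-extended value to unsigned, so the high bits stay set.  A cast at the
load site keeps just the low 8 bits.

```c
/* Hypothetical helpers illustrating two variants of the proposed fix,
   under the assumption that from_addr points to signed char.  */

unsigned int
load_unsigned_decl (const signed char *from_addr)
{
  /* *from_addr is promoted to int (-15) first, then converted to
     unsigned int, so the result is a large value, not 0xf1.  */
  unsigned int c = *from_addr;
  return c;
}

unsigned int
load_with_cast (const signed char *from_addr)
{
  /* Converting to unsigned char first reduces the value modulo 256.  */
  unsigned int c = (unsigned char) *from_addr;
  return c;
}
```

If from_addr already points to unsigned char, neither change makes a
difference and the cause would have to be somewhere else.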

-- 
Alan Mackenzie (Nuremberg, Germany).



