
From: Rich Felker
Subject: builtin printf behaves incorrectly with "c and 'c character-value arguments
Date: Thu, 1 Nov 2007 05:25:53 -0400
User-agent: Mutt/

$ printf %d\\n \'À
(expected 192)

This should be 192 regardless of locale on any system where wchar_t
values are ISO-10646/Unicode. Bash instead reads only the first byte
of the UTF-8 encoding (0xC3), which is -61 when interpreted as a signed
char; in a Latin-1 based locale, where À is the single byte 0xC0, it
will probably give -64 instead.

Both POSIX and common sense are clear that the numeric value
resulting from 'c should be the wchar_t value of c and not the value
of the first byte of the multibyte character. From the SUSv3 printf(1)
specification:

     Note that in a locale with multi-byte characters, the value of a
     character is intended to be the value of the equivalent of the
     wchar_t representation of the character as described in the
     System Interfaces volume of IEEE Std 1003.1-2001.

Language lawyers could argue that on 'single-byte' locales perhaps the
byte value should be used; however, strictly speaking a single-byte
locale is simply a special case of a multi-byte one, and sanity should
win in any case.

Fixing the issue should be easy: asciicode() in builtins/printf.def
simply needs to be changed to decode the character with mbrtowc rather
than reading the byte (and should perhaps also be renamed, since the
value it returns would no longer be an ASCII code).

