bug-gnulib

Re: INT_STRLEN_BOUND and locales with *printf


From: Eric Blake
Subject: Re: INT_STRLEN_BOUND and locales with *printf
Date: Tue, 08 Feb 2011 14:11:03 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7

On 02/08/2011 01:52 PM, Ben Pfaff wrote:
> Paul Eggert <address@hidden> writes:
> 
>> On 02/08/11 12:34, Ben Pfaff wrote:
>>> The INT_STRLEN_BOUND macro in Gnulib's intprops.h calculates the
>>> maximum number of bytes in a formatted integer, on the basis that
>>> the minus sign and each digit will occupy one byte.  If *printf
>>> is used for formatting integers, is this a good assumption
>>> outside of the C locale?
>>
>> Yes and no.  It's safe for %d, but it's not safe for arbitrary
>> formats.  This is true even in the C locale; for example, %1000d
>> is not safe for INT_STRLEN_BOUND.  Any code that uses
>> INT_STRLEN_BOUND with weird formats like %Id or %'d or %1000d
>> is busted and should get fixed.
> 
> Thanks.  I was assuming a plain format such as %d.
> 
> Does your answer come from experience with many implementations,
> or is it based on knowledge of some document or standard?  I'd
> like to be able to know why the answer is true, if I can.

POSIX requires %d to produce only digits from the portable character set
in all locales, and to use only the '-', '+', and ' ' characters from the
portable character set when a sign or padding is involved.  (7-bit ASCII
is a superset of the portable character set; basically, the portable
character set includes all ASCII letters, digits, and printable symbols,
plus NUL, newline, and a handful of other control characters such as
tab, but omits the rest, like ESC.)

POSIX also requires that characters in the portable character set occupy
only one byte across all locales supported by an implementation (even if
not all supported locales have the same encoding); and only requires
that a subset of the portable character set must have the same encoding
across all locales (NUL and '/' are in this subset thanks to path name
resolution, but digits are not).  POSIX also requires that the ten
digits will be contiguous starting at '0' in all locales (even if they
are not encoded to the same values across all locales).
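To make the contiguous-digits guarantee concrete, here is a rough
sketch that walks the bytes produced by a plain %d and recovers each
digit's value as (byte - '0').  The name demo_digit_sum is made up for
illustration, and the buffer size is just a deliberately generous bound
(one byte per bit of int, plus sign and NUL), not anything taken from
Gnulib.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: format N with plain "%d", then sum its decimal
   digits.  Relies on each digit occupying one byte and on the digits
   being contiguous from '0'.  Returns -1 if any other byte shows up. */
static int
demo_digit_sum (int n)
{
  /* One byte per bit is far more than "%d" needs; +2 for '-' and NUL. */
  char buf[sizeof (int) * CHAR_BIT + 2];
  int sum = 0;
  const char *p;

  snprintf (buf, sizeof buf, "%d", n);
  for (p = buf; *p != '\0'; p++)
    {
      if (*p == '-')
        continue;               /* sign, also a single byte */
      if (*p < '0' || *p > '9')
        return -1;              /* not expected for a plain "%d" */
      sum += *p - '0';          /* contiguous digits: value is the offset */
    }
  return sum;
}
```

The (byte - '0') step works even if '0' itself is encoded differently
from one locale to another, since only the offsets between digits
matter.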

Therefore, you can only run into width problems with an explicit width
or precision, with %Id, or with %'d; and even then, the latter two can
only happen outside the C locale, and an explicit width or precision is
something that you can easily deal with.
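As a rough sketch of what "easily deal with" can look like: the macro
DEMO_INT_STRLEN_BOUND below is a simplified stand-in I made up for this
message, not Gnulib's actual INT_STRLEN_BOUND definition, and the helper
names are likewise hypothetical.  It sizes a buffer for a plain %d, and
separately computes a size when an explicit width is in play.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for a strlen bound, not Gnulib's definition:
   146/485 slightly overestimates log10(2), the +484 rounds up, and the
   final +1 leaves room for a minus sign. */
#define DEMO_INT_STRLEN_BOUND(t) \
  ((sizeof (t) * CHAR_BIT * 146 + 484) / 485 + 1)

/* Buffer size for a plain "%d": the bound plus the terminating NUL. */
#define DEMO_INT_BUFSIZE (DEMO_INT_STRLEN_BOUND (int) + 1)

/* Format N with plain "%d" into BUF, which must hold DEMO_INT_BUFSIZE
   bytes.  Returns the number of bytes written, excluding the NUL. */
static int
demo_format_int (char *buf, int n)
{
  int len = snprintf (buf, DEMO_INT_BUFSIZE, "%d", n);
  assert (0 <= len && (size_t) len < DEMO_INT_BUFSIZE);
  return len;
}

/* With an explicit width W (as in "%*d"), the output needs at most
   max (W, bound) bytes, so take the larger of the two, plus the NUL. */
static size_t
demo_width_bufsize (size_t width)
{
  size_t bound = DEMO_INT_STRLEN_BOUND (int);
  return (width > bound ? width : bound) + 1;
}
```

For %1000d this gives a 1001-byte buffer, whereas the plain bound alone
would be far too small.  None of this helps with %Id or %'d, where the
byte count depends on the locale.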

Theoretically, you can run into encoding problems (where printf("%d",0)
gives a different byte depending on your locale), but I know of no
implementation that simultaneously supports multiple locales in which
the portable characters have different encodings.  That is, pretty much
all encodings are a superset of either ASCII or EBCDIC, and no machine
supports ASCII and EBCDIC simultaneously, because of the POSIX
restriction that '/' must have the same encoding across all supported
locales.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
