From: Eli Schwartz
Subject: Re: EPOCHREALTIME
Date: Thu, 19 Aug 2021 14:41:43 -0400

On 8/19/21 12:18 PM, Léa Gris wrote:
> On 19/08/2021 at 16:41, Eli Schwartz wrote:
>> On 8/19/21 9:41 AM, Léa Gris wrote:
> 
> 
>> The error occurs, one would imagine, during the "convert string to
>> float" stage, after parsing argv or forking to bc or whatever, but
>> *before* passing it as an argument to printf(3). Here, bash is just
>> doing good error checking -- if you used
>> strtof("3.14159265358979323844", NULL) under a fr_FR.UTF-8 locale, it
>> would silently drop everything after the ".", and you would
>> "successfully" print 3,0000, but bash reports an error message.
> 
> A programming language shall distinguish between display format and data
> format.
> 
> Locale settings are for display format and shall not interfere with
> argument parsing, which is data format, or they create data
> portability issues.


Whether you are right or wrong is a matter which I'm sublimely
indifferent to.

You seem to have missed the point of my statement, which is that this is
not about bash at all, and if you have an issue here, you should take it
up with the standards body.

The strtof() function of the C programming language is violating your
directive.


> This is exactly how I read the note from the POSIX documentation:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#tag_20_09_16
> 
> 
>>   The bc utility always uses the <period> ( '.' ) character to represent
>>   a radix point, regardless of any decimal-point character specified as
>>   part of the current locale. In languages like C or awk, the <period>
>>   character is used in program source, so it can be portable and
>>   unambiguous, while the locale-specific character is used in input and
>>   output. Because there is no distinction between source and input in
>>   bc, this arrangement would not be possible. Using the locale-specific
>>   character in bc's input would introduce ambiguities into the language
> 
> Especially:
> 
> 
>> In languages like C or awk, the <period> character is used in program
>> source, so it can be portable and unambiguous


I challenge your reading of the POSIX documentation. The important part
is this:

> Because of such ambiguities, the <period> character is used in input.
> Having input follow different conventions from output would be
> confusing in either pipeline usage or interactive usage, so the
> <period> is also used in output.

POSIX acknowledges that using a comma in the output would be "correct
and follow our own rules, but since it's confusing we won't do it".


POSIX does NOT have similar logic in the documentation for printf, which
*mandates*

> The argument operands shall be treated as strings if the corresponding
> conversion specifier is b, c, or s, and shall be evaluated as if by
> the strtod() function if the corresponding conversion specifier is a,
> A, e, E, f, F, g, or G. Otherwise, they shall be evaluated as
> unsuffixed C integer constants, as described by the ISO C standard,
> with the following extensions:

and, indeed, strtof / strtod("3.14159265358979323844", NULL) in a French
locale is, as I originally explained, going to inherently follow
LC_NUMERIC no matter how silly you might think that is, because that's
how strtod/strtof work.

And you cannot pass a preprocessed float/double to the printf builtin,
because you don't have types, and therefore POSIX specifies how to
interpret it starting from a string.
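
To check which radix character strtod() will expect in a given
environment, the POSIX `locale` utility can report it. This is only a
minimal sketch: `radix_char` and `uses_locale_radix` are hypothetical
helper names, not something any shell provides.

```shell
# Print the radix character of the current locale, as reported by the
# POSIX `locale` utility (keyword `decimal_point` from LC_NUMERIC).
radix_char() {
    locale decimal_point
}

# Hypothetical helper: succeed if the string either has no period or
# comma at all, or contains the radix character of the current locale.
uses_locale_radix() {
    case $1 in
        *[.,]*) ;;          # contains a radix-like character, check it
        *) return 0 ;;      # plain integer, nothing to check
    esac
    case $1 in
        *"$(radix_char)"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under LC_ALL=C, `uses_locale_radix 3.14` succeeds and
`uses_locale_radix 3,14` fails; under a locale whose decimal point is a
comma, the results would be the other way around.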


> printf arguments are program source even if argument comes from a variable.
> 
> All things considered, if you are using floating-point numbers in a
> shell script, you are clearly not using the right tool for the job, but
> sometimes options are limited or not under your control.
> 
> Having a feature implemented in such a way, *that it cannot be used
> reliably or requires heavy work-around* (especially if you both need to
> process floating-point data in a portable format, and display in locale
> format)… is just calling for frustration and sorry errors:
> 
> For the record:
> 
> ash -c 'LC_ALL=fr_FR.utf8; printf "Pi: %2.4f\\n" "$(echo "4*a(1)" | bc
> -l)"'
> Pi: 3.1416
> 
> bash -c 'LC_ALL=fr_FR.utf8; printf "Pi: %2.4f\\n" "$(echo "4*a(1)" | bc
> -l)"'
> bash: line 1: printf: 3.14159265358979323844: invalid number
> Pi: 3,0000
> 
> dash -c 'LC_ALL=fr_FR.utf8; printf "Pi: %2.4f\\n" "$(echo "4*a(1)" | bc
> -l)"'
> Pi: 3.1416
> 
> ksh -c 'LC_ALL=fr_FR.utf8; printf "Pi: %2.4f\\n" "$(echo "4*a(1)" | bc
> -l)"'
> Pi: ksh: printf: 3.14159265358979323844: arithmetic syntax error
> ksh: printf: 3.14159265358979323844: arithmetic syntax error
> ksh: printf: warning: invalid argument of type f
> 3,0000
> 
> mksh -c 'LC_ALL=fr_FR.utf8; printf "Pi: %2.4f\\n" "$(echo "4*a(1)" | bc
> -l)"'
> Pi: 3,1416
> 
> zsh -c 'LC_ALL=fr_FR.utf8; printf "Pi: %2.4f\\n" "$(echo "4*a(1)" | bc
> -l)"'
> Pi: 3,1416


So you are saying only bash and ksh are POSIX-conformant. Okay, I
believe you. Would you like to file POSIX conformance bug reports with
ash, dash, mksh, and zsh?

ash and dash have the additional intriguing quirk that they don't
respect LC_NUMERIC at all, incidentally.

mksh and zsh somehow accept both radix characters, apparently on
purpose? Admittedly, both document that they aren't actually trying to
be a POSIX shell, just mostly one (mksh refers to "objectionable
POSIX-mandated behaviour"), though I'd expect lksh, "legacy KSH for
posix users", to act like POSIX, and it doesn't either.

(You may feel free to use zsh / mksh for their "posix except when we
think our behavior is better" policy. Please understand this isn't a bug
in bash, and also bash is the wrong place to complain about the
behavior.)
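
For anyone who has to feed bc-style output to a printf that follows
LC_NUMERIC, one possible workaround is to rewrite the radix before the
call. The following is a sketch under stated assumptions, not a
recommendation from POSIX or from bash: `to_locale_radix` is a
hypothetical helper name, and its optional second argument is an
addition made here so that it can be tried without a French locale
installed.

```shell
# Hypothetical helper: rewrite a period-radix number (such as bc output)
# so that it uses the current locale's radix character.
# $1 is the number; $2, if given, overrides the radix character.
to_locale_radix() {
    num=$1
    dp=${2:-$(locale decimal_point)}
    case $num in
        # Split at the first period and rejoin with the locale radix.
        *.*) printf '%s%s%s\n' "${num%%.*}" "$dp" "${num#*.}" ;;
        # No period: pass the value through unchanged.
        *)   printf '%s\n' "$num" ;;
    esac
}
```

It would be used as
printf 'Pi: %2.4f\n' "$(to_locale_radix "$(echo '4*a(1)' | bc -l)")".
This only helps with shells that honor LC_NUMERIC when parsing; a shell
that always expects a period would then reject the comma instead.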



$ export LC_ALL=fr_FR.utf8

$ busybox sh -c 'printf "Pi: %2.4f\n" 3,14159265358979323844'
Pi: sh: invalid number '3,14159265358979323844'
0.0000

(expected)
$ bash -c 'printf "Pi: %2.4f\n" 3,14159265358979323844'
Pi: 3,1416

$ dash -c 'printf "Pi: %2.4f\n" 3,14159265358979323844'
dash: 1: printf: 3,14159265358979323844: not completely converted
Pi: 3.0000

(expected)
$ ksh -c 'printf "Pi: %2.4f\n" 3,14159265358979323844'
Pi: 3,1416

$ mksh -c 'printf "Pi: %2.4f\n" 3,14159265358979323844'
Pi: 3,1416

$ zsh -c 'printf "Pi: %2.4f\n" 3,14159265358979323844'
Pi: 3,1416
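
The per-shell comparison above can be scripted so that it is easy to
repeat on another system. This is a rough sketch: `try_shells` is a
hypothetical helper name, and which shells are installed is of course
an assumption about the machine it runs on.

```shell
# Hypothetical helper: run the same printf call in each named shell.
# $1 is the number to format; the remaining arguments are shell names.
try_shells() {
    arg=$1
    shift
    for sh in "$@"; do
        # Skip shells that are not installed instead of failing.
        if ! command -v "$sh" >/dev/null 2>&1; then
            echo "$sh: not installed"
            continue
        fi
        printf '%s: ' "$sh"
        # Pass the number as "$1" so no quoting games are needed;
        # diagnostics go to the same stream as the formatted result.
        "$sh" -c 'printf "Pi: %2.4f\n" "$1"' sh "$arg" 2>&1
    done
}
```

For example, after `export LC_ALL=fr_FR.utf8`, one could run
`try_shells 3,14159265358979323844 bash dash ksh mksh zsh`.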


And, for bonus points,

$ /bin/printf --version
printf (GNU coreutils) 8.32
$ export LC_ALL=fr_FR.utf8 POSIXLY_CORRECT=1
$ /bin/printf 'Pi: %2.4f\n' 3.14159265358979323844
Pi: 3,1416


So that's fun too.


-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
