From: Eric Blake
Subject: Re: locale-dependent token separator handling doesn't work in multi-byte locales
Date: Wed, 08 Oct 2014 09:17:18 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 10/08/2014 08:52 AM, Stephane Chazelas wrote:
> When bash parses code it honours the "blank" character class in
> the current locale as token separator.
> For instance, if "x" is a blank character in the current locale,

Such a locale is invalid per POSIX, but being invalid doesn't stop it
from being a potential attack vector :)

> In any case, I think the feature should either be fixed (make it
> also work in multi-byte locales), or the limitation (that it
> only works in single-byte locales) documented, or the feature
> (make the parsing locale-dependent) removed.

I would argue that locale-dependent parsing is probably a bug waiting to
happen, and would be in favor of removing the feature and forcing the
use of the C locale for the duration of parsing a script.  Yes, that
means you can't write a variable name with non-ASCII characters, but as
you've demonstrated, running such a script in a different locale than
where it was written raises too many issues about what should happen.
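
To make the C-locale idea concrete, here is a minimal shell sketch. It
is not how bash tokenizes internally; it only shows that pinning the
locale to C makes the "blank" class predictable (space and tab only).
It assumes a POSIX tr and grep, and count_tokens_c is a hypothetical
helper name chosen for illustration:

```shell
# Hypothetical helper (not part of bash): count blank-separated tokens
# in a line, with the locale pinned to C for the tr and grep calls.
count_tokens_c() {
    # In the C locale [:blank:] matches only space and tab, so a locale
    # that declares other characters blank cannot change the split.
    # '[\n*]' repeats newline as often as needed to pair with string1.
    printf '%s\n' "$1" |
        LC_ALL=C tr -s '[:blank:]' '[\n*]' |
        LC_ALL=C grep -c .
}

count_tokens_c 'echo  hello world'   # prints 3 (repeated blanks squeezed)
count_tokens_c 'axb'                 # prints 1 ("x" is never a separator)
```

Note that this pins the locale at run time for two external commands;
the suggestion above is for bash itself to do the equivalent
internally, and only while it parses.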

Java explicitly decided to allow Unicode characters in identifiers, but
it still has locale issues unless you write the file in plain ASCII
using \u escapes.  I also find it awkward that the set of valid Java
identifiers grows every time a new version of Unicode is released: code
that won't compile today might compile tomorrow, if the only reason it
failed today was a Unicode codepoint that a newer version of the
standard turns into an alphanumeric.  I see no reason to have that
complexity in bash.

This may also be the sort of question worth asking the Austin Group
about, to see if POSIX should be tightened on this front.

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
