Re: locale-dependent token separator handling doesn't work in multi-byte

From: Eric Blake
Subject: Re: locale-dependent token separator handling doesn't work in multi-byte locales
Date: Wed, 08 Oct 2014 12:07:38 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 10/08/2014 11:53 AM, Ángel González wrote:
> Eric Blake wrote:
>> On 10/08/2014 08:52 AM, Stephane Chazelas wrote:
>>> When bash parses code it honours the "blank" character class in
>>> the current locale as token separator.
>>> For instance, if "x" is a blank character in the current locale,
>> Such a locale is invalid per POSIX; but the invalidity of the locale
>> doesn't stop it from being a potential attack vector :)
> Is it? I looked at locale definition [1] but it only seems to define
> what the POSIX/C locale must be, not any restriction on what a locale
> could impose. It seems to me that a Klingon locale where everything
> outside U+F8D0 - U+F8FF [2] were considered a blank would be conformant
> (although an Earth application using such locale would hit a lot of
> undefined cases ☺).
> 1- http://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html
> 2- http://www.evertype.com/standards/csur/klingon.html


POSIX requires that ALL locales support the portable filename character
set as single-byte seven-bit characters.  (Many of those byte values can
ALSO occur as a byte within a multibyte character, as in the Big5
encoding; UTF-8, at least, is a sane encoding in which no single-byte
character occurs embedded in a multibyte character.)  For some
characters ('.', '/', '\n', '\r') the encoding MUST be invariant across
all supported locales.  POSIX leaves a little wiggle room by stating
that if two locales choose different encodings for the other characters,
then results are unspecified when crossing those locale boundaries; but
in reality, the ONLY widely used encoding that does not share ASCII's
byte values is EBCDIC, and it already picks different values for '.' and
'/', so it is impossible to have a POSIX-compliant system that
simultaneously supports ASCII and EBCDIC locales.
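
To make the encoding claims above concrete, here is a minimal Python
sketch.  It is only an illustration: the helper name
ascii_bytes_inside_multibyte is hypothetical, the sample characters are
my own choice, and Python's "cp037" codec is used as a stand-in for
"an EBCDIC locale".

```python
def ascii_bytes_inside_multibyte(text, encoding):
    """Return the characters of `text` whose multibyte encoding in
    `encoding` contains at least one byte in the 7-bit range (< 0x80)."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        # Only multibyte sequences are interesting here; a single-byte
        # ASCII character trivially "contains" an ASCII byte.
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            hits.append(ch)
    return hits


# Sample characters chosen for illustration (an assumption of this sketch,
# not taken from the thread).
sample = "許功蓋"

# Big5: multibyte characters may carry a trail byte in the ASCII range.
print("big5 :", ascii_bytes_inside_multibyte(sample, "big5"))

# UTF-8: every byte of a multibyte sequence has the high bit set.
print("utf-8:", ascii_bytes_inside_multibyte(sample, "utf-8"))

# '.' and '/' get different byte values in ASCII and in one EBCDIC code page.
for ch in "./":
    print(repr(ch), "ascii:", ch.encode("ascii").hex(),
          "cp037:", ch.encode("cp037").hex())
```

A byte-oriented parser that scans Big5 text for an ASCII delimiter can
therefore match in the middle of a character, which cannot happen with
UTF-8.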

At any rate, I read the requirements on the portable filename set as
requiring that ALL locales define 'x' as a 7-bit character, and I don't
see enough flexibility there to define a locale that puts 'x' in the
blank class.  On the other hand, an 8-bit character that is a blank in
some locales and a letter in others is indeed quite possible and
compliant; so while using 'x' as a blank is questionable, the whole idea
of abusing locales to cause parse differences is entirely plausible.
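
As a rough illustration of the 8-bit case, the Python sketch below
decodes the same byte under two different character sets and classifies
the result.  Note the assumptions: the helper name classify is
hypothetical, the code pages are my own picks, and Python's
str.isspace()/str.isalpha() consult Unicode properties, which only
approximate the "blank" and "alpha" classes a C locale would actually
define.

```python
def classify(byte_value, encoding):
    """Decode a single byte under `encoding` and report whether Python
    considers the resulting character whitespace, a letter, or neither.

    This uses Unicode character properties as a stand-in for a locale's
    LC_CTYPE classes; a real locale definition may differ.
    """
    ch = bytes([byte_value]).decode(encoding)
    if ch.isspace():
        return "space"
    if ch.isalpha():
        return "letter"
    return "other"


# The same 8-bit byte, interpreted under two single-byte character sets.
for enc in ("latin-1", "cp437"):
    print(f"0xA0 in {enc}: {classify(0xA0, enc)}")

# A 7-bit character such as 'x' keeps its meaning in both.
for enc in ("latin-1", "cp437"):
    print(f"0x78 in {enc}: {classify(0x78, enc)}")
```

A script containing byte 0xA0 could thus tokenize differently depending
on which character set the parser assumes, while 'x' stays a letter in
both.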

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

