Re: locale-dependent token separator handling doesn't work in multi-byte locales

bug-bash

From:	Eric Blake
Subject:	Re: locale-dependent token separator handling doesn't work in multi-byte locales
Date:	Wed, 08 Oct 2014 12:07:38 -0600
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 10/08/2014 11:53 AM, Ángel González wrote:
> Eric Blake wrote:
>> On 10/08/2014 08:52 AM, Stephane Chazelas wrote:
>>> When bash parses code it honours the "blank" character class in
>>> the current locale as token separator.
>>>
>>> For instance, if "x" is a blank character in the current locale,
>>
>> Such a locale is invalid per POSIX; but the invalidity of the locale
>> doesn't stop it from being a potential attack vector :)
> 
> 
> Is it? I looked at locale definition [1] but it only seems to define
> what the POSIX/C locale must be, not any restriction on what a locale
> could impose. It seems to me that a Klingon locale where everything
> outside U+F8D0 - U+F8FF [2] were considered a blank would be conformant
> (although an Earth application using such locale would hit a lot of
> undefined cases ☺).
> 
> 1- http://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html
> 2- http://www.evertype.com/standards/csur/klingon.html

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html

POSIX requires that ALL locales support the portable filename set of
characters as single-byte seven-bit characters (many of those characters
can ALSO occur as a byte within a multibyte character, as in Big5
encoding; at least UTF-8 is a sane encoding where no single-byte
character occurs embedded in a multibyte character).  For some
characters ('.', '/', '\n', '\r') the encoding MUST be invariant across
all supported locales.  It has a little bit of fuzz room by stating that
if two locales choose a different encoding for other characters, then
results are unspecified when crossing those locale boundaries; but in
all reality, the ONLY widely-used encoding that does not have the same
bytes as ASCII is EBCDIC, and it already picks different values for
encoding '.' and '/', so it is impossible to have a POSIX-compliant
system that simultaneously supports ASCII and EBCDIC locales.
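To make the encoding points above concrete, here is a minimal sketch in Python (not part of the original exchange). It assumes Python's bundled "big5" and "cp037" codecs are reasonable stand-ins for a Big5 locale and an EBCDIC locale; the helper name embedded_ascii_bytes is made up for illustration. It looks for ASCII-range bytes hiding inside multibyte characters, then compares the encodings of '.' and '/' in ASCII and EBCDIC.

```python
# Sketch only: codec names and the helper below are illustrative assumptions.

def embedded_ascii_bytes(text, encoding):
    """Return ASCII-range bytes (< 0x80) found inside multibyte characters."""
    found = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:  # only inspect multibyte characters
            found.extend(b for b in encoded if b < 0x80)
    return found

# Three CJK characters commonly cited because their Big5 trail byte is 0x5C,
# the same byte as a single-byte backslash.
sample = "\u8a31\u529f\u84cb"

big5_hits = embedded_ascii_bytes(sample, "big5")
utf8_hits = embedded_ascii_bytes(sample, "utf-8")

print("Big5 embedded ASCII bytes: ", [hex(b) for b in big5_hits])
print("UTF-8 embedded ASCII bytes:", [hex(b) for b in utf8_hits])

# '.' and '/' must be invariant across all locales on one system, yet
# ASCII and EBCDIC (code page 037 here) disagree on both.
ascii_codes = {ch: ch.encode("ascii")[0] for ch in "./"}
ebcdic_codes = {ch: ch.encode("cp037")[0] for ch in "./"}

for ch in "./":
    print(repr(ch), "ASCII:", hex(ascii_codes[ch]), "EBCDIC:", hex(ebcdic_codes[ch]))
```

In UTF-8 every byte of a multibyte sequence has its high bit set, so the second list comes back empty; a byte-oriented parser looking for '\' or a blank cannot be fooled by the inside of a character there, whereas in Big5 it can.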

At any rate, I read the requirements on the portable filename set as
requiring that ALL locales define 'x' as a 7-bit character, and I'm not
seeing enough flexibility in that to define a locale that puts 'x' in
the blank class.  On the other hand, a locale that abuses 8-bit
characters to be blanks in some locales and letters in others is indeed
quite possible and compliant; so while using 'x' as a blank is
questionable, the whole idea of abusing locales to cause parse
differences is not.
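As a rough illustration of the 8-bit case (again a sketch, not something from the thread): the single byte 0xA0 decodes to a space-like character under one 8-bit encoding and to a letter under another. The code uses Unicode general categories from Python's unicodedata module as a stand-in for a locale's LC_CTYPE classes, which is an assumption; whether a given locale actually places NO-BREAK SPACE in its "blank" class is up to that locale's definition.

```python
import unicodedata

# The same 8-bit byte, interpreted under two different single-byte encodings.
BYTE = b"\xa0"

# Map encoding name -> (decoded character, Unicode name, general category).
# Unicode categories only approximate a locale's character classes.
interpretations = {}
for encoding in ("latin-1", "cp437"):
    ch = BYTE.decode(encoding)
    interpretations[encoding] = (ch, unicodedata.name(ch), unicodedata.category(ch))

for encoding, (ch, name, category) in interpretations.items():
    print(f"{encoding:8} 0xA0 -> U+{ord(ch):04X} {name} (category {category})")
```

Under latin-1 the byte is NO-BREAK SPACE (category Zs, a space separator); under cp437 it is a lowercase letter. A parser that asks the locale whether a byte is a blank can therefore split the same script bytes differently depending on the locale it runs in.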

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
