locale-dependent token separator handling doesn't work in multi-byte loc

bug-bash

From:	Stephane Chazelas
Subject:	locale-dependent token separator handling doesn't work in multi-byte locales
Date:	Wed, 8 Oct 2014 15:52:24 +0100
User-agent:	Mutt/1.5.21 (2010-09-15)

When bash parses code it honours the "blank" character class in
the current locale as token separator.

For instance, if "x" is a blank character in the current locale,

echoxbar

would output bar. "yash" is the only other shell that I know
that does the same.

With bash, that only works in single-byte locales though,
probably because bash calls isblank() on individual bytes
instead of on decoded characters.
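
To make the byte-versus-character point concrete, here is a minimal
sketch. It assumes, for illustration only, a UTF-8 locale that classifies
U+2003 EM SPACE as blank; bytes_of is a hypothetical helper, not part of
any shell.

```sh
#!/bin/sh
# Hypothetical helper: print the bytes of its argument in hex, one per line.
bytes_of() {
  printf '%s' "$1" | LC_ALL=C od -An -v -tx1 | tr -s ' ' '\n' | sed '/^$/d'
}

# U+2003 EM SPACE encoded in UTF-8, written with octal escapes for portability.
em_space=$(printf '\342\200\203')

# One character, three bytes; none of them is an ASCII space (20) or tab (09).
bytes_of "$em_space"
```

A test applied to each of those three bytes in turn has no way of
recognising the character they encode, which would explain the behaviour
described above.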

I would also question the usefulness of such a feature.

That's what aggravated CVE-2014-0475 (a glibc vulnerability). By
creating a locale where every character except "s" "h" and a few
others were blanks, one could do LC_ALL=../../my/evil/locale ssh
git@git.server... and interpreting the /etc/bash.bashrc as
shipped with some GNU/Linux distributions was enough to get a
shell on the git server (provided you were able to upload the
locale to the server).
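
As an illustration of the attack vector only, the sketch below shows the
kind of check a wrapper could apply to a client-supplied locale name
before exporting it. locale_name_is_plain is a hypothetical helper, and
nothing here is meant to describe what glibc or sshd actually do.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if a locale name contains no "/",
# so that it cannot be interpreted as a path such as ../../my/evil/locale.
locale_name_is_plain() {
  case $1 in
    */*) return 1 ;;
  esac
  return 0
}

for name in fr_FR.UTF-8 ../../my/evil/locale; do
  if locale_name_is_plain "$name"; then
    printf 'accept: %s\n' "$name"
  else
    printf 'reject: %s\n' "$name"
  fi
done
```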

That also means that the syntax of a script depends on the
locale.

From a review of the available locales on my GNU system, I
couldn't find a single locale where "blank" is anything but
space and tab.

The only locales where more blank characters are defined are the
multi-byte ones.
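
A rough way to probe a single character, as a sketch: is_blank_here is a
hypothetical helper that asks the shell's pattern matching whether a
character is in [[:blank:]] under the current locale. This tests the
pattern-matching view of the class, which is not necessarily what the
parser uses when splitting tokens.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the single character $1 matches
# [[:blank:]] in the locale the shell is currently running under.
is_blank_here() {
  case $1 in
    [[:blank:]]) return 0 ;;
  esac
  return 1
}

tab=$(printf '\t')
for c in ' ' "$tab" x; do
  if is_blank_here "$c"; then
    printf 'blank:     [%s]\n' "$c"
  else
    printf 'not blank: [%s]\n' "$c"
  fi
done
```

Running it under different LC_ALL values, and with other candidate
characters added to the loop, gives an idea of what a given locale
treats as blank.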

So removing that feature would not break anything.

There's a similar issue with what is allowed in variable names:

$ LC_ALL=fr_FR.iso885915@euro bash -c $'declare St\xe9phane=1'
$ LC_ALL=fr_FR.UTF-8   bash -c $'declare St\u00e9phane=1'
bash: ligne 0 : declare: « Stéphane=1 » : identifiant non valable

Here, removing the feature might break scripts written for
single-byte non-ASCII locales, but given that most of the world
is switching to UTF-8, that seems unlikely: if such scripts were
common, we'd have seen reports of the problem before.

yash, zsh and ksh93 do support

LC_ALL=fr_FR.UTF-8  zsh -c $'St\u00e9phane=1'

in multi-byte locales. But again, I'd say it's not necessarily useful.

It's nice to be able to use my first name as a variable name,
but making the parsing of a script depend on its *user*'s (as
opposed to its *author*'s) locale is not ideal.
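
For script authors who want names that parse the same way for every
user, one option is to stick to ASCII identifiers. The sketch below
checks a candidate name against that rule; is_portable_name is a
hypothetical helper, and forcing the C locale inside it is an assumption
made so that the bracket ranges cover ASCII only.

```sh
#!/bin/sh
# Hypothetical helper: succeed if $1 is made only of ASCII letters,
# digits and underscores, and does not start with a digit.
# The body runs in a subshell so the LC_ALL change does not leak out.
is_portable_name() (
  LC_ALL=C
  case $1 in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) exit 1 ;;
  esac
  exit 0
)

for name in Stephane "St$(printf '\303\251')phane" 1abc; do
  if is_portable_name "$name"; then
    printf 'portable:     %s\n' "$name"
  else
    printf 'not portable: %s\n' "$name"
  fi
done
```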

Maybe there's a better way to address that.

In any case, I think the feature should either be fixed (made to
work in multi-byte locales as well), or the limitation (that it
only works in single-byte locales) documented, or the feature
(locale-dependent parsing) removed.

-- 
Stephane
