
From: Stephane Chazelas
Subject: locale-dependent token separator handling doesn't work in multi-byte locales
Date: Wed, 8 Oct 2014 15:52:24 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

When bash parses code, it treats characters of the current
locale's "blank" character class as token separators.

For instance, if "x" is a blank character in the current locale,

echoxbar

would output "bar". yash is the only other shell I know of that
does the same.

With bash, that only works in single-byte locales, though,
presumably because bash calls isblank() on individual bytes
instead of on (possibly multi-byte) characters.
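To make the byte-versus-character hypothesis concrete, here is a minimal C sketch. It is not taken from bash's source. It assumes UTF-8 input and uses a hypothetical is_blank_cp() predicate that treats space, tab and U+3000 (IDEOGRAPHIC SPACE) as blanks, standing in for a multi-byte locale's "blank" class. Feeding bytes to the predicate one at a time can never match a blank encoded in more than one byte; decoding characters first can.

```c
#include <stddef.h>

/* Hypothetical "blank" predicate over Unicode code points: space, tab
   and U+3000 (IDEOGRAPHIC SPACE). It stands in for a multi-byte
   locale's blank class and does not consult any real locale. */
static int is_blank_cp(unsigned long cp) {
    return cp == 0x20 || cp == 0x09 || cp == 0x3000;
}

/* Decode one UTF-8 sequence of 2 or 3 bytes starting at s, store the
   code point in *cp and return its length. Anything else (ASCII,
   4-byte sequences, stray bytes) is passed through as a single byte.
   A NUL byte fails the continuation check, so we never read past it. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp) {
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = s[0];
    return 1;
}

/* Count blank-separated words. With bytewise != 0 every byte is given
   to the predicate on its own (the behaviour suspected in bash);
   otherwise UTF-8 characters are decoded before being classified. */
static int count_words(const char *str, int bytewise) {
    const unsigned char *p = (const unsigned char *)str;
    int words = 0, in_word = 0;
    while (*p) {
        unsigned long cp;
        size_t len;
        if (bytewise) {
            cp = *p;
            len = 1;
        } else {
            len = utf8_decode(p, &cp);
        }
        if (is_blank_cp(cp))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            words++;
        }
        p += len;
    }
    return words;
}
```

With that predicate, count_words("echo\xE3\x80\x80" "bar", 1) returns 1, while the same call with bytewise set to 0 returns 2. In a single-byte locale every character is one byte, which would explain why the byte-wise approach appears to work there.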

I would also question the usefulness of such a feature.

That's what aggravated CVE-2014-0475 (a glibc vulnerability). By
creating a locale where every character except "s", "h" and a few
others was a blank, one could run LC_ALL=../../my/evil/locale ssh
address@hidden, and having the server's bash interpret
/etc/bash.bashrc (as shipped with some GNU/Linux distributions)
under that locale was enough to get a shell on the git server
(provided you were able to upload the locale to the server).

That also means that the script syntax depends on the locale.

From a review of the locales available on my GNU system, I
couldn't find any single-byte locale where "blank" contains
anything other than space and tab.

The only locales where more blank characters are defined are the
multi-byte ones, where bash doesn't honour them anyway.

So removing that feature would not break anything.

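There's a similar issue with what is allowed in variable names:

$ address@hidden bash -c $'declare St\xe9phane=1'
$ LC_ALL=fr_FR.UTF-8   bash -c $'declare St\u00e9phane=1'
bash: ligne 0 : declare: « Stéphane=1 » : identifiant non valable

In the first (single-byte) locale the declaration is accepted
silently; in the UTF-8 locale the same name is rejected ("line 0:
declare: `Stéphane=1': not a valid identifier").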
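The variable-name case can be modelled the same way. This hypothetical C sketch (again not bash's code) checks a name byte by byte, with an isalpha()-like classifier passed in. latin1_alpha() roughly approximates a single-byte ISO-8859-1 locale, and utf8_byte_alpha() approximates a UTF-8 locale, where no byte >= 0x80 is a letter on its own; both are assumptions, not real locale queries.

```c
#include <stddef.h>

/* ASCII letters, common to both hypothetical classifiers. */
static int ascii_alpha(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Approximation of isalpha() in an ISO-8859-1 locale: ASCII letters
   plus 0xC0-0xFF, except 0xD7 and 0xF7 (multiplication and division
   signs). */
static int latin1_alpha(unsigned char c) {
    return ascii_alpha(c) || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

/* Approximation of isalpha() on single bytes in a UTF-8 locale: bytes
   >= 0x80 are only fragments of characters, so none is a letter. */
static int utf8_byte_alpha(unsigned char c) {
    return ascii_alpha(c);
}

/* Byte-by-byte "legal variable name" check: a letter or '_' first,
   then letters, digits or '_'. */
static int valid_name(const char *name, int (*alpha)(unsigned char)) {
    const unsigned char *p = (const unsigned char *)name;
    if (!(*p == '_' || alpha(*p)))
        return 0;
    for (p++; *p; p++)
        if (!(*p == '_' || (*p >= '0' && *p <= '9') || alpha(*p)))
            return 0;
    return 1;
}
```

valid_name("St\xe9phane", latin1_alpha) returns 1, but valid_name("St\xc3\xa9phane", utf8_byte_alpha), the same name encoded in UTF-8, returns 0, which matches the two transcripts.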

Here, removing the feature might break scripts written for
single-byte non-ASCII locales, but that seems unlikely: most of
the world is switching to UTF-8, where the feature already doesn't
work, and we'd probably have seen reports of that problem before.

yash, zsh and ksh93 do support

LC_ALL=fr_FR.UTF-8  zsh -c $'St\u00e9phane=1'

in multi-byte locales. But again, I'd say it's not necessarily useful.

It's nice to be able to use my first name as a variable name,
but making the parsing of a script depend on its *user*'s (as
opposed to its *author*'s) locale is not ideal.

Maybe there's a better way to address that.

In any case, I think the feature should either be fixed (made to
also work in multi-byte locales), have its limitation (that it
only works in single-byte locales) documented, or be removed (so
that parsing is no longer locale-dependent).
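Fixing it for multi-byte locales would mean decoding characters before classifying them. A minimal C sketch of that, using the standard mbrtowc() and iswblank() functions against whatever LC_CTYPE locale is in effect (the function name count_words_locale() is hypothetical, and this is not a patch against bash):

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Count blank-separated words, decoding characters with mbrtowc()
   according to the current LC_CTYPE locale and classifying them with
   iswblank(). The caller is expected to have called setlocale(). */
static int count_words_locale(const char *s) {
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t left = strlen(s);
    int words = 0, in_word = 0;
    while (left > 0) {
        wchar_t wc;
        int blank;
        size_t n = mbrtowc(&wc, s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated sequence: treat this byte as one
               non-blank character and reset the conversion state. */
            memset(&st, 0, sizeof st);
            n = 1;
            blank = 0;
        } else {
            if (n == 0) /* decoded L'\0'; cannot occur before strlen() */
                n = 1;
            blank = iswblank((wint_t)wc);
        }
        if (blank)
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            words++;
        }
        s += n;
        left -= n;
    }
    return words;
}
```

A caller would normally do setlocale(LC_CTYPE, "") first; in a UTF-8 locale whose blank class includes a multi-byte character, that character would then separate words. Treating invalid sequences as single non-blank bytes is only one possible policy.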

