regexps and locales
From: Olivier Wittenberg
Subject: regexps and locales
Date: Wed, 26 Jan 2005 22:21:43 +0100
User-agent: Mutt/1.4.2.1i
Hello,
I have noticed the following behaviour in bash; it seems to me that it
is a bug, but I'm not 100% sure.
Configuration Information:
Machine: i386
OS: linux-gnu
Compiler: i386-redhat-linux-gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i386'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i386-redhat-linux-gnu'
-DCONF_VENDOR='redhat' -DLOCALEDIR='/usr/share/locale'
-DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib
-D_FILE_OFFSET_BITS=64 -O2 -g -pipe -m32 -march=i386 -mtune=pentium4
uname output: Linux foobar 2.6.10-1.741_FC3smp #1 SMP Thu Jan 13
16:53:16 EST 2005 i686 i686 i386 GNU/Linux
Machine Type: i386-redhat-linux-gnu
Bash Version: 3.0
Patch Level: 14
Release Status: release
Description:
When bash does pathname expansion, the pattern * matches every file
name, even if POSIXLY_CORRECT is set. I think the POSIX specification
says it should only match those files whose names are valid sequences
of characters according to the current locale. For instance, when
using a UTF-8 locale, it should not match the file whose name is the
single byte 0xE9 (octal 351), i.e. the output of "echo -e \\351".
Note that GNU find version 4.1.20 works as expected:
find -name \*
does not print, e.g., the non-UTF-8 filenames when using a UTF-8
locale.
The POSIX-compliant behaviour may well be deemed braindead, but that's
unfortunately another story...
Repeat-By:
LC_ALL=en_US.UTF-8 bash
mkdir /tmp/foo ; cd /tmp/foo ; touch dummy
nargs() { echo $#; }
nargs *
touch $(echo -e \\351)
nargs *
It prints 1 and then 2; it should print 1 and then 1.
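For anyone reproducing this, a rough way to see which names in the directory are not valid UTF-8 (the names the report argues * should skip) is to pass each one through iconv. This is only a sketch: it assumes an iconv that exits non-zero on an invalid input sequence, and the function names is_valid_utf8 and count_invalid_names are made up for illustration; they are not part of bash.

```shell
# Hypothetical helper: succeeds only if its argument decodes as UTF-8.
# Assumes iconv exits non-zero when the input is not valid UTF-8.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical helper: print how many names in the current directory
# are not valid UTF-8 byte sequences.
count_invalid_names() {
    count=0
    for name in *; do
        # Skip the literal pattern when the directory is empty.
        [ -e "$name" ] || continue
        is_valid_utf8 "$name" || count=$((count + 1))
    done
    echo "$count"
}
```

Run in /tmp/foo after the steps above, count_invalid_names should report the one file created by the second touch, provided the filesystem accepted that name.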
Best,
--Olivier