regexps and locales

From: Olivier Wittenberg
Subject: regexps and locales
Date: Wed, 26 Jan 2005 22:21:43 +0100
User-agent: Mutt/


I have noticed the following behaviour in bash; it seems to me that it
is a bug, but I'm not 100% sure.

Configuration Information:
Machine: i386
OS: linux-gnu
Compiler: i386-redhat-linux-gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i386'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i386-redhat-linux-gnu'
-DCONF_VENDOR='redhat' -DLOCALEDIR='/usr/share/locale'
-DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H  -I.  -I. -I./include -I./lib
-D_FILE_OFFSET_BITS=64 -O2 -g -pipe -m32 -march=i386 -mtune=pentium4
uname output: Linux foobar 2.6.10-1.741_FC3smp #1 SMP Thu Jan 13
16:53:16 EST 2005 i686 i686 i386 GNU/Linux
Machine Type: i386-redhat-linux-gnu

Bash Version: 3.0
Patch Level: 14
Release Status: release


When bash does pathname expansion, it finds that the pattern * matches
any file, even if POSIXLY_CORRECT is set.  I think the POSIX
specification says it should only match those files whose names are
valid sequences of characters according to the current locale.  For
instance, when using a UTF-8 locale, it should not match the file
whose name is the output of "echo -e \\351".
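As a rough check of the claim that this byte is not a valid character
sequence in UTF-8, one can feed it to iconv.  This is only a sketch: it
assumes an iconv command is installed and that it rejects malformed
input with a non-zero exit status, and is_valid_utf8 is a helper name
made up for this illustration.

```shell
# Hypothetical helper: succeeds only if standard input decodes as UTF-8.
# Assumes an iconv(1) that exits non-zero on malformed or truncated input.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Octal 351 is the byte 0xE9: a UTF-8 lead byte with no continuation bytes.
if printf '\351' | is_valid_utf8; then
  echo "valid UTF-8"
else
  echo "not valid UTF-8"
fi
```

The same byte is a perfectly good character (e with acute accent) in
ISO-8859-1, which is why such filenames tend to be left over from
older, non-UTF-8 setups.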

Note that GNU find version 4.1.20 works as expected:
  find -name \*
does not print, e.g., the non-UTF-8 filenames when using a UTF-8
locale.

The POSIX-compliant behaviour may well be deemed braindead, but that's
unfortunately another story...

  LC_ALL=en_US.UTF-8 bash
  mkdir /tmp/foo ; cd /tmp/foo ; touch dummy
  nargs() { echo $#; }
  nargs *
  touch $(echo -e \\351)
  nargs *

It prints 1 and then 2; it should print 1 and then 1.
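
Not every echo implementation interprets \351 as an octal escape, so
the steps above can also be sketched with printf and a scratch
directory.  This is a variant for convenience, not the exact commands
from the report; count_glob_matches is a made-up helper name, and the
sketch assumes mktemp is available and that the filesystem accepts a
filename consisting of the single byte 0xE9.

```shell
# Hypothetical helper: prints how many names the pattern * expands to
# before and after creating a file named with the single byte 0xE9.
# The body runs in a subshell, so the caller's directory is unchanged.
count_glob_matches() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  nargs() { echo $#; }

  touch dummy
  before=$(nargs *)

  # printf interprets the octal escape itself, unlike some echo builtins.
  touch "$(printf '\351')"
  after=$(nargs *)

  # Clean up the scratch directory before reporting.
  cd / && rm -rf "$dir"
  echo "$before $after"
)

count_glob_matches
```

To exercise the behaviour described here, source this in a bash started
with LC_ALL=en_US.UTF-8 (or another UTF-8 locale that is installed).
The two numbers printed correspond to the two "nargs *" calls above.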

