bug-bash

Re: bug on [A-Z] and [a-z]


From: Greg Wooledge
Subject: Re: bug on [A-Z] and [a-z]
Date: Mon, 2 May 2011 08:41:23 -0400
User-agent: Mutt/1.4.2.3i

On Sun, May 01, 2011 at 09:17:49PM -0500, Jonathan Nieder wrote:
> Hi,
> 
> ribas@inf.ufpr.br wrote:
> 
> >     When  running "echo [A-Z]*" , it shows all files/dirs of current
> >     directory, not only those starting with capital letters. I tried
> >     different locales such as: POSIX, C, en_US, pt_BR
> >
> > Repeat-By:
> >     $ mkdir a && cd a
> >     $ touch a b c; mkdir D E F
> >     $ echo [A-Z]*
> >     b c D E F
> >     $ echo [a-z]*
> >     a b c D E F
> 
> See http://bugs.debian.org/301717 ("fnmatch("[a-z]", ...) matches
> capital letters in most locales") for some details.

See also http://mywiki.wooledge.org/locale

> I'm puzzled by your comment on trying different locales, though:
> I tried
> 
>       mkdir a && cd a
>       touch a b c; mkdir D E F
>       echo [A-Z]*
> 
> and got output
> 
>       b c D E F
> 
> as expected.  Then I tried
> 
>       LANG=C
>       export LANG
>       echo [A-Z]*
> 
> and got output
> 
>       D E F
> 
> Does your experience differ?  I'm using 4.1.5(1)-release fwiw.

Presumably, "ribas" did not correctly set the locale variables during
his or her testing.
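
For anyone repeating the test: LANG has the lowest priority of the locale
variables.  LC_ALL, if set and non-empty, overrides everything, and
LC_COLLATE overrides LANG for collation, so setting LANG=C alone changes
nothing when either of those is already set.  A minimal sketch for checking
which variable is actually in effect (effective_collate is a hypothetical
helper name of mine, not a bash builtin):

```shell
# effective_collate: hypothetical helper, not part of bash.
# Prints the variable that decides the collation locale, following the
# POSIX precedence: LC_ALL, then LC_COLLATE, then LANG.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf 'LC_ALL=%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf 'LANG=%s\n' "$LANG"
    else
        printf 'default (implementation-defined)\n'
    fi
}

effective_collate
```

If this prints an LC_ALL or LC_COLLATE line, then changing LANG will not
affect how [A-Z] is interpreted.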

> >     No Fix yet, looking on the source code.

There's nothing to fix.  This is in the realm of a new feature request.

> In the long run, a good fix might be to teach fnmatch a new
> FNM_STRICTCASE flag and optionally use it.

If by "strict case" you mean "force POSIX locale" or "force US-ASCII
ordering", then the option ought to be called something less confusing.

> The hardest part would
> seem to be making tables so the system can know what "this range,
> using the same case" means.

It already knows this, because it's what the POSIX (C) locale does.
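
A quick way to see this, as a sketch that assumes mktemp is available (the
scratch directory and the variable names are mine, not from the original
report):

```shell
# Recreate the files from the report in a scratch directory.
tmp=$(mktemp -d) || exit 1
touch "$tmp/a" "$tmp/b" "$tmp/c"
mkdir "$tmp/D" "$tmp/E" "$tmp/F"

# LC_ALL=C forces the POSIX locale for the child shell, so the range
# A-Z covers only the ASCII capital letters.
matches=$(cd "$tmp" && LC_ALL=C sh -c 'echo [A-Z]*')
echo "$matches"    # D E F

rm -rf "$tmp"
```

LC_ALL is used here rather than LANG because it overrides any LC_COLLATE
setting that may already be present in the environment.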

> A separate aspect is documentation.  I imagine Chet wouldn't mind
> a patch to bash.1 and bash.info to explain this pitfall under
> "Pattern Matching" or even under "BUGS" (aka LIMITATIONS).

This is not a bug, so it does not belong in BUGS.

The first place I found in the man page that makes mention of this is
the Pathname Expansion section.  This, I agree, should be changed.
Perhaps this would be an acceptable wording:


--- doc/bash.1.orig     Mon May  2 08:31:26 2011
+++ doc/bash.1  Mon May  2 08:35:51 2011
@@ -3121,8 +3121,8 @@
 If one of these characters appears, then the word is
 regarded as a
 .IR pattern ,
-and replaced with an alphabetically sorted list of
-file names matching the pattern.
+and replaced with a list of file names matching the pattern,
+sorted alphabetically by the current locale's collating sequence.
 If no matching file names are found,
 and the shell option
 .B nullglob


Under Pattern Matching, there is already an explanation of how it uses the
LC_COLLATE variable, the current locale, etc.  It's all there.  In fact,
since Pattern Matching is a subsection of Pathname Expansion, one could
argue that my patch is redundant, but since the pathname expansion stuff
appears first, someone may stop reading before encountering the more
verbose description, so IMHO it doesn't hurt to correct the introduction.
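
To illustrate what "sorted by the current locale's collating sequence" means
in practice, here is a small sketch, again assuming mktemp is available and
using names of my own choosing.  In the C locale the capital letters sort
before the lowercase ones:

```shell
# Same files as in the original report, in a scratch directory.
tmp=$(mktemp -d) || exit 1
touch "$tmp/a" "$tmp/b" "$tmp/c"
mkdir "$tmp/D" "$tmp/E" "$tmp/F"

# The glob result is sorted by the collating sequence of the locale
# in effect; under LC_ALL=C that is plain byte order.
sorted=$(cd "$tmp" && LC_ALL=C sh -c 'echo *')
echo "$sorted"    # D E F a b c

rm -rf "$tmp"
```

In other locales the order is typically different (the original report shows
"a b c D E F"), but the exact result depends on the locale data installed on
the system.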


