Re: bug on [A-Z] and [a-z]
From: DJ Mills
Subject: Re: bug on [A-Z] and [a-z]
Date: Mon, 2 May 2011 12:22:43 -0400
On Mon, May 2, 2011 at 9:23 AM, Bruno Cesar Ribas <ribas@inf.ufpr.br> wrote:
> On Mon, May 02, 2011 at 08:41:23AM -0400, Greg Wooledge wrote:
> > On Sun, May 01, 2011 at 09:17:49PM -0500, Jonathan Nieder wrote:
> > > Hi,
> > >
> > > ribas@inf.ufpr.br wrote:
> > >
> > > > When running "echo [A-Z]*", it shows all files/dirs of current
> > > > directory, not only those starting with capital letters. I tried
> > > > different locales such as: POSIX, C, en_US, pt_BR
> > > >
> > > > Repeat-By:
> > > > $ mkdir a && cd a
> > > > $ touch a b c; mkdir D E F
> > > > $ echo [A-Z]*
> > > > b c D E F
> > > > $ echo [a-z]*
> > > > a b c D E F
> > >
> > > See http://bugs.debian.org/301717 ("fnmatch("[a-z]", ...) matches
> > > capital letters in most locales") for some details.
> >
> > See also http://mywiki.wooledge.org/locale
>
> Thanks for the explanations; now I understand what is happening.
>
> >
> > > I'm puzzled by your comment on trying different locales, though:
> > > I tried
> > >
> > > mkdir a && cd a
> > > touch a b c; mkdir D E F
> > > echo [A-Z]*
> > >
> > > and got output
> > >
> > > b c D E F
> > >
> > > as expected. Then I tried
> > >
> > > LANG=C
> > > export LANG
> > > echo [A-Z]*
> > >
> > > and got output
> > >
> > > D E F
> > >
> > > Does your experience differ? I'm using 4.1.5(1)-release fwiw.
> >
> > Presumably, "ribas" did not correctly set the locale variables during
> > his or her testing.
>
> Indeed, I did not export the variable; I just ran LANG=C echo [A-Z]*.
> Exporting works.
>
> >
> > > > No Fix yet, looking on the source code.
> >
> > There's nothing to fix. This is in the realm of a new feature request.
> >
> > > In the long run, a good fix might be to teach fnmatch a new
> > > FNM_STRICTCASE flag and optionally use it.
> >
> > If by "strict case" you mean "force POSIX locale" or "force US-ASCII
> > ordering", then the option ought to be called something less confusing.
> >
> > > The hardest part would
> > > seem to be making tables so the system can know what "this range,
> > > using the same case" means.
> >
> > It already knows this, because it's what the POSIX (C) locale does.
> >
> > > A separate aspect is documentation. I imagine Chet wouldn't mind
> > > a patch to bash.1 and bash.info to explain this pitfall under
> > > "Pattern Matching" or even under "BUGS" (aka LIMITATIONS).
> >
> > This is not a bug, so it does not belong in BUGS.
> >
> > The first place I found in the man page that makes mention of this is
> > the Pathname Expansion section. This, I agree, should be changed.
> > Perhaps this would be an acceptable wording:
> >
> >
> > --- doc/bash.1.orig Mon May 2 08:31:26 2011
> > +++ doc/bash.1 Mon May 2 08:35:51 2011
> > @@ -3121,8 +3121,8 @@
> > If one of these characters appears, then the word is
> > regarded as a
> > .IR pattern ,
> > -and replaced with an alphabetically sorted list of
> > -file names matching the pattern.
> > +and replaced with a list of file names matching the pattern,
> > +sorted alphabetically by the current locale's collating sequence.
> > If no matching file names are found,
> > and the shell option
> > .B nullglob
> >
> >
> > Under Pattern Matching, there is already an explanation of how it uses the
> > LC_COLLATE variable, the current locale, etc. It's all there. In fact,
> > since Pattern Matching is a subsection of Pathname Expansion, one could
> > argue that my patch is redundant, but since the pathname expansion stuff
> > appears first, someone may stop reading before encountering the more
> > verbose description, so IMHO it doesn't hurt to correct the introduction.
>
> --
> Bruno Ribas - ribas@inf.ufpr.br
> http://www.inf.ufpr.br/ribas
>
>
Alternatively, just use [[:upper:]], [[:lower:]], etc. They match by
character class rather than by collation order, so they are considered
locale-safe.
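
For example, here is a minimal sketch using the same files and directories
as the original report. The scratch directory and the variable names
(demo_dir, upper, lower) are made up for this illustration, and I am assuming
a mktemp that supports -d:

```shell
#!/bin/sh
# Recreate the test case from the original report in a scratch directory.
# demo_dir is a hypothetical name used only for this example.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch a b c
mkdir D E F

# Character classes select by class membership (LC_CTYPE), so the result
# does not depend on where letters fall in the collating sequence.
upper=$(echo [[:upper:]]*)
lower=$(echo [[:lower:]]*)

echo "upper: $upper"   # upper: D E F
echo "lower: $lower"   # lower: a b c

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Unlike [A-Z]*, this should give the same answer whether or not LANG or
LC_COLLATE has been exported.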