bug-bash

Re: regex confusion -- not matching; think it should?


From: Greg Wooledge
Subject: Re: regex confusion -- not matching; think it should?
Date: Thu, 20 Jun 2013 08:09:38 -0400
User-agent: Mutt/1.4.2.3i

On Wed, Jun 19, 2013 at 06:12:57PM -0500, Dan Douglas wrote:
> Thanks to mksh, posh, etc not supporting POSIX character classes at all, I'm
> not so sure it's actually better in practice. (talking about standard shell
> pattern matching of course)

I'm fairly sure nobody on the entire planet uses those shells except
their authors and you.

Now, since this is a bash mailing list, it's reasonable to talk about
bash.  If you're writing a script in bash, you MUST NOT use the [a-z]
or [A-Z] ranges, or any other alphabetic ranges, unless you are
working in the POSIX locale.  If you use an alphabetic range in any
other locale, you invite disaster.

Here is disaster:

imadev:~$ echo Hello World | tr A-Z a-z
hÉMMÓ wÓSMÐ

That is why you MUST NOT use alphabetic ranges in non-POSIX locales.
Here's how you SHOULD do it:

imadev:~$ echo Hello World | LANG=C tr A-Z a-z
hello world
imadev:~$ echo Hello World | tr '[:upper:]' '[:lower:]'
hello world

The latter is preferred if there is any chance you are working with
non-ASCII letters, as it will handle them:

imadev:~$ echo Ábc | tr '[:upper:]' '[:lower:]'
ábc

In the POSIX locale, Á isn't part of A-Z, so the range doesn't match it
and tr leaves it unchanged:

imadev:~$ echo Ábc | LANG=C tr A-Z a-z
Ábc
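
The same rule carries over to shell pattern matching, which is what this
thread started with.  Here is a minimal sketch, not a definitive
implementation.  The function names starts_upper, ascii_lower and
locale_lower are hypothetical, made up for illustration.  It also
assumes LC_ALL=C rather than the LANG=C used above, because LC_ALL
overrides both LANG and LC_CTYPE if either happens to be set in the
environment.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if $1 begins with an uppercase letter,
# using the current locale's definition of "upper" instead of A-Z.
starts_upper() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Hypothetical helper: lowercase ASCII letters only.  The range is safe
# here because the locale is pinned to C for this one tr invocation.
# (Assumption: LC_ALL instead of LANG, so other LC_* settings can't win.)
ascii_lower() {
    printf '%s\n' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

# Hypothetical helper: lowercase whatever the current locale considers
# an uppercase letter.  No alphabetic ranges involved.
locale_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

if starts_upper "Hello"; then
    echo "Hello: starts with an uppercase letter"
fi
if ! starts_upper "hello"; then
    echo "hello: does not"
fi
ascii_lower "Hello World"
locale_lower "Hello World"
```

What the character-class versions do with non-ASCII input such as Á
depends on the locale in effect and on whether the tr implementation
handles multibyte characters, so test that on the system you care about.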


