Re: regex confusion -- not matching; think it should?
From: Greg Wooledge
Subject: Re: regex confusion -- not matching; think it should?
Date: Thu, 20 Jun 2013 08:09:38 -0400
User-agent: Mutt/1.4.2.3i
On Wed, Jun 19, 2013 at 06:12:57PM -0500, Dan Douglas wrote:
> Thanks to mksh, posh, etc not supporting POSIX character classes at all, I'm
> not so sure it's actually better in practice. (talking about standard shell
> pattern matching of course)
I'm fairly sure nobody on the entire planet uses those shells except
their authors and you.
Now, since this is a bash mailing list, it's reasonable to talk about
bash. If you're writing a script in bash, you MUST NOT use the [a-z]
or [A-Z] ranges, or any other alphabetic ranges, unless you are
working in the POSIX locale. If you use an alphabetic range in any
other locale, you invite disaster.
Here is disaster:
imadev:~$ echo Hello World | tr A-Z a-z
hÉMMÓ wÓSMÐ
That is why you MUST NOT use alphabetic ranges in non-POSIX locales.
Here's how you SHOULD do it:
imadev:~$ echo Hello World | LANG=C tr A-Z a-z
hello world
imadev:~$ echo Hello World | tr '[:upper:]' '[:lower:]'
hello world
The latter is preferred if there is any chance you are working with
non-ASCII letters, as it will handle them:
imadev:~$ echo Ábc | tr '[:upper:]' '[:lower:]'
ábc
In the POSIX locale, Á isn't part of A-Z, so it is not matched:
imadev:~$ echo Ábc | LANG=C tr A-Z a-z
Ábc
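The same rule carries over to shell pattern matching. What follows is only a rough sketch, not something from the message above: the function names starts_upper, to_lower and to_lower_ascii are made up for illustration. It uses a case pattern with [[:upper:]] where one might be tempted to write [A-Z], and it assumes that setting LC_ALL=C is acceptable for the ASCII-only variant (LC_ALL takes precedence over LANG and the other LC_* variables if they happen to be set).

```shell
# Hypothetical helpers for illustration; none of these names come from
# the original message.

# Succeeds if $1 begins with an uppercase letter in the current locale.
# Uses the [:upper:] class instead of the [A-Z] range.
starts_upper() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Lowercases stdin using character classes, so it follows the current locale.
to_lower() {
    tr '[:upper:]' '[:lower:]'
}

# Lowercases only the ASCII letters. LC_ALL is set rather than LANG because
# LC_ALL overrides every other locale variable that may be set.
to_lower_ascii() {
    LC_ALL=C tr A-Z a-z
}

starts_upper "Hello" && echo "Hello starts with an uppercase letter"
starts_upper "hello" || echo "hello does not"
echo "Hello World" | to_lower
echo "Hello World" | to_lower_ascii
```

Whether to_lower handles non-ASCII letters such as Á depends on the locale in effect and on the tr implementation, so test it on the systems you care about.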