Re: regex confusion -- not matching; think it should?
From: Dan Douglas
Subject: Re: regex confusion -- not matching; think it should?
Date: Fri, 21 Jun 2013 02:39:14 -0500
On Thu, Jun 20, 2013 at 7:09 AM, Greg Wooledge <wooledg@eeg.ccf.org> wrote:
> On Wed, Jun 19, 2013 at 06:12:57PM -0500, Dan Douglas wrote:
>> Thanks to mksh, posh, etc not supporting POSIX character classes at all, I'm
>> not so sure it's actually better in practice. (talking about standard shell
>> pattern matching of course)
>
> I'm fairly sure nobody on the entire planet uses those shells except
> their authors and you.
I'm talking about the entire family of pdksh-derived shells. mksh
ships with Android, oksh with OpenBSD, and pdksh with SUA / Interix.
I'm sure some people use posh for testing. Collectively I'd say
they're at least as significant as dash, probably more.
> Now, since this is a bash mailing list, it's reasonable to talk about
> bash. If you're writing a script in bash, you MUST NOT use the [a-z]
> or [A-Z] ranges, or any other alphabetic ranges, unless you are
> working in the POSIX locale. If you use an alphabetic range in any
> other locale, you invite disaster.
I can't reproduce this on a GNU system using en_US.UTF-8.
Are you saying this because certain implementations tend to behave
this way, or because it's implied by the spec? I'd assume this has
more to do with your C library than with Bash specifically.
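
For anyone who wants to check their own system, here is a minimal
sketch of the kind of test I mean. It sticks to standard `case`
pattern matching so it should run in any POSIX-like shell. The helper
name `matches` is made up for illustration, and en_US.UTF-8 is only an
assumed locale name; if it isn't installed the shell may print a
warning and fall back. Only the C locale results are predictable (b
matches both patterns, B matches neither); what the other locale
prints depends on the shell and the C library.

```sh
# matches LOCALE STRING PATTERN
# Succeeds if STRING matches the glob PATTERN with LC_ALL set to LOCALE.
# The function body is a subshell, so the locale change does not leak.
matches() (
    LC_ALL=$1
    export LC_ALL
    case $2 in
        $3) exit 0 ;;   # unquoted, so $3 is interpreted as a pattern
        *)  exit 1 ;;
    esac
)

# en_US.UTF-8 is an assumption; substitute any locale `locale -a` lists.
for loc in C en_US.UTF-8; do
    for pat in '[a-z]' '[[:lower:]]'; do
        for ch in b B; do
            if matches "$loc" "$ch" "$pat"; then
                verdict='match'
            else
                verdict='no match'
            fi
            printf '%-12s %-12s %s: %s\n' "$loc" "$pat" "$ch" "$verdict"
        done
    done
done
```

If `B` shows up as a match for `[a-z]` in the second locale, that
setup has the problem Greg describes.
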
According to POSIX the character ranges look just as bad as the
character classes. There's even text which says implementations may
offer extensions that do not even include those characters required
for the C locale, and I don't see anything that says what should occur
for non-POSIX locales.