Re: built-in regex matches wrong character
From: Miguel Amat
Subject: Re: built-in regex matches wrong character
Date: Thu, 6 Sep 2018 00:48:35 +0200

Thanks for your response, Eric. Please find attached a screenshot of my
tests of both solutions. Setting LC_ALL=C in the environment works fine,
while 'shopt -s globasciiranges' does not (though I could be testing this
the wrong way, as this is my first time using shopt).
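
For anyone wanting to repeat the LC_ALL=C check without a screenshot, here is
a minimal sketch. The helper name matches_in_c_locale is made up for this
example and is not part of bash; it simply runs the [[ =~ ]] test in a child
bash that has LC_ALL=C in its environment, as suggested in the quoted reply.
It assumes bash is on the PATH and that the script is saved as UTF-8.

```bash
# Hypothetical helper (not from the thread): test a string against a regex
# inside a child bash started with LC_ALL=C in its environment.
# $1 = string to test, $2 = regular expression (left unquoted on the
# right-hand side of =~ so that bash treats it as a regex, not a literal).
matches_in_c_locale() {
    LC_ALL=C bash -c '[[ $1 =~ $2 ]]' _ "$1" "$2"
}

# In the C locale, [o-p] should cover only the ASCII characters o and p,
# so the multibyte character º is not expected to match.
for ch in o p 'º'; do
    if matches_in_c_locale "$ch" '[o-p]'; then
        echo "$ch matches [o-p] under LC_ALL=C"
    else
        echo "$ch does not match [o-p] under LC_ALL=C"
    fi
done
```

Running the test in a child process keeps the locale change from leaking
into the rest of a larger script.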
Regards,
Miguel
On 9/5/18, Eric Blake <eblake@redhat.com> wrote:
> On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:
>
>> Description:
>> It seems like bash's built-in regex matches some symbols that it
>> shouldn't. The following commands show this:
>> [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] &&
>>   echo 'º between o and p but none of them'
>> [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] &&
>>   echo 'ª between a and b but none of them'
>>
>> Repeat-By:
>> I found this while developing a larger bash script, but it can be
>> reproduced with the lines above. Could you reply to me at
>> amatbaeza@gmail.com to let me know whether this is in fact a bug? Thanks.
>
> Not a bug, but a property of your locale.
>
> POSIX says that range expressions in regular expressions are
> implementation-defined except in the C locale, which means [a-b] is
> free to match more than just the two ASCII characters 'a' and 'b': it
> may also match anything else that your current locale considers equivalent.
>
> If you run your script with LC_ALL=C in the environment, you won't have
> that problem (because there, [a-b] is well-defined to be exactly two
> characters). Or, you can use bash's 'shopt -s globasciiranges' which is
> supposed to enable Rational Range Interpretation, where even in non-C
> locales, a character range bounded by two ASCII characters takes on the
> C locale definition of only the ASCII characters in that range, rather
> than the locale's definition of whatever other characters might also be
> equivalent (actually, while I know that shopt affects globbing, I don't
> know if it also affects regex matching - but if it doesn't, that's
> probably a bug that should be fixed).
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc. +1-919-301-3266
> Virtualization: qemu.org | libvirt.org
>
bash_bug.png
Description: PNG image