bug-bash

Re: built-in regex matches wrong character


From: Miguel Amat
Subject: Re: built-in regex matches wrong character
Date: Thu, 6 Sep 2018 00:48:35 +0200

Thanks for your response, Eric. Please find attached a screenshot
testing both solutions. Setting LC_ALL=C in the environment works
fine, while 'shopt -s globasciiranges' does not (though I could be
testing this the wrong way; this is my first time using shopt).
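
In text form, a check of the LC_ALL=C workaround could look roughly
like the sketch below. It is not necessarily what the screenshot
shows. The helper name range_matches is made up for illustration, and
running the test in a child bash is just one way to make sure the
locale is in place before any matching happens. What the non-C line
prints depends on which locales are installed on the machine.

```bash
#!/usr/bin/env bash
# Sketch: evaluate [[ string =~ regex ]] in a child bash under a chosen locale.
# range_matches is a hypothetical helper name, not a bash builtin.
range_matches() {
    # $1 = value for LC_ALL, $2 = string to test, $3 = regex
    # stderr is discarded so a missing locale only produces a fallback, not noise.
    LC_ALL="$1" bash -c '[[ $1 =~ $2 ]]' _ "$2" "$3" 2>/dev/null
}

# Compare the C locale with whatever LANG is currently set to.
for loc in C "${LANG:-C}"; do
    if range_matches "$loc" 'º' '[o-p]'; then
        echo "LC_ALL=$loc: º matches [o-p]"
    else
        echo "LC_ALL=$loc: º does not match [o-p]"
    fi
done
```

With LC_ALL=C the range covers only the ASCII letters 'o' and 'p', so
the first line should report no match.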

Regards,
Miguel

On 9/5/18, Eric Blake <eblake@redhat.com> wrote:
> On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:
>
>> Description:
>>      It seems like bash's built-in regex matches some symbols that it
>> shouldn't. The following commands show this:
>>              [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
>>              [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'
>>
>> Repeat-By:
>>      I actually found this while developing a bigger bash script, but
>> it can be reproduced with the previous lines. Could you reply to me at
>> amatbaeza@gmail.com to let me know if this is in fact a bug? Thanks.
>
> Not a bug, but a property of your locale.
>
> POSIX says that range expressions in regular expressions are
> implementation-defined except in the C locale, which means [a-b] is
> free to match not just the two ASCII characters 'a' and 'b', but
> anything else that your current locale considers equivalent.
>
> If you run your script with LC_ALL=C in the environment, you won't have
> that problem (because there, [a-b] is well-defined to be exactly two
> characters).  Or, you can use bash's 'shopt -s globasciiranges', which
> is supposed to enable Rational Range Interpretation: even in non-C
> locales, a character range bounded by two ASCII characters takes on
> the C locale definition of only the ASCII characters in that range,
> rather than the locale's definition of whatever other characters might
> also be equivalent.  (Actually, while I know that shopt affects
> globbing, I don't know if it also affects regex matching; if it
> doesn't, that's probably a bug that should be fixed.)
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>
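
The open question in the quoted message, whether globasciiranges
changes regex matching as well as globbing, can be probed by running
the same range through both operators. This is only a sketch:
compare_glob_and_regex is a hypothetical helper name, the child bash
is an assumption made so the locale and the shopt setting are applied
before matching, and the result outside the C locale depends on the
bash version and the installed locales.

```bash
#!/usr/bin/env bash
# Sketch: test one character against [o-p] as a glob (==) and as a regex (=~)
# with globasciiranges enabled. compare_glob_and_regex is a hypothetical name.
compare_glob_and_regex() {
    # $1 = value for LC_ALL, $2 = character to test
    LC_ALL="$1" bash -c '
        shopt -s globasciiranges 2>/dev/null   # option exists in bash 4.3 and later
        range="[o-p]"
        glob=no; regex=no
        [[ $1 == $range ]] && glob=yes    # unquoted right side is a glob pattern
        [[ $1 =~ $range ]] && regex=yes   # unquoted right side is a regex
        echo "glob=$glob regex=$regex"
    ' _ "$2" 2>/dev/null
}

for loc in C "${LANG:-C}"; do
    echo "LC_ALL=$loc: $(compare_glob_and_regex "$loc" 'º')"
done
```

If the two columns differ in a non-C locale, the option is being
applied to globbing only.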

Attachment: bash_bug.png
Description: PNG image

