Re: built-in regex matches wrong character

From: Eric Blake
Subject: Re: built-in regex matches wrong character
Date: Wed, 5 Sep 2018 15:39:01 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/05/2018 01:50 PM, address@hidden wrote:

        It seems like bash's built-in regex matching matches some characters that it shouldn't. The following commands show this:
                [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
                [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'

        I actually found this while developing a bigger bash script, but it can be reproduced with the lines above. Would you reply to me at address@hidden to let me know whether this is in fact a bug? Thanks.

Not a bug, but a property of your locale.

POSIX leaves range expressions in bracket expressions unspecified except in the C (POSIX) locale, which means [a-b] is free to match more than just the two ASCII characters 'a' and 'b'. It can match anything that your current locale sorts within that range, which in many locales includes characters such as 'ª' and 'º' that the locale treats as variants of 'a' and 'o'.
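A minimal sketch of the C-locale exception, assuming bash 4.x or later and a script file saved as UTF-8. Forcing LC_ALL=C makes [a-b] mean exactly 'a' or 'b', so the reporter's 'ª' no longer matches. The loop and its messages are illustrative only.

```bash
#!/usr/bin/env bash
# In the C locale, a range expression such as [a-b] is fully defined:
# it matches only 'a' or 'b'. Bash re-reads the locale when LC_ALL is
# assigned, so this affects every match later in the script.
export LC_ALL=C

for ch in a b 'ª' c; do
  # Anchor the regex so the whole string must be one character in the range.
  if [[ $ch =~ ^[a-b]$ ]]; then
    echo "$ch: in range [a-b]"
  else
    echo "$ch: not in range [a-b]"
  fi
done
```

To leave the script itself untouched, the same effect comes from setting the variable only in its environment, for example `LC_ALL=C bash ./yourscript`, where `yourscript` is a placeholder name.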

If you run your script with LC_ALL=C in the environment, you won't have that problem, because there [a-b] is well-defined to match exactly two characters.

Alternatively, you can use bash's 'shopt -s globasciiranges', which is supposed to enable Rational Range Interpretation: even in non-C locales, a range bounded by two ASCII characters takes on its C-locale meaning and covers only the ASCII characters in that range, rather than whatever other characters the locale considers equivalent. That said, while I know that this shopt affects globbing, I don't know whether it also affects regex matching; if it doesn't, that's probably a bug that should be fixed.
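The open question about globasciiranges can be probed on a given system with a small sketch like the one below. It assumes bash 4.3 or later (where the option exists; bash 5.0 and later enable it by default). Glob matching with `==` is what the option is documented to cover, so that column should never show a match for 'ª' or 'º'. For `=~`, the bash manual says the pattern goes to the POSIX regcomp/regexec interfaces, so the regex column is printed rather than assumed.

```bash
#!/usr/bin/env bash
# Compare one bracket range under glob matching (==) and regex
# matching (=~) with globasciiranges turned on.
shopt -s globasciiranges

for ch in a b 'ª' 'º'; do
  # Glob/pattern matching: globasciiranges applies here.
  if [[ $ch == [a-b] ]]; then glob=yes; else glob=no; fi
  # Regex matching: delegated to regcomp/regexec, so the option may
  # have no effect; the outcome depends on the locale and C library.
  if [[ $ch =~ ^[a-b]$ ]]; then regex=yes; else regex=no; fi
  printf '%s  glob [a-b]: %-3s  regex ^[a-b]$: %s\n' "$ch" "$glob" "$regex"
done
```

Running it once under your usual locale and once with `LC_ALL=C` shows whether the regex column changes while the glob column stays fixed.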

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
