Re: built-in regex matches wrong character

From: Eric Blake
Subject: Re: built-in regex matches wrong character
Date: Thu, 6 Sep 2018 09:23:33 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/06/2018 09:17 AM, Chet Ramey wrote:
> On 9/5/18 4:39 PM, Eric Blake wrote:
>
>> Or, you can use bash's 'shopt -s globasciiranges' which is
>> supposed to enable Rational Range Interpretation, where even in non-C
>> locales, a character range bounded by two ASCII characters takes on the C
>> locale definition of only the ASCII characters in that range, rather than
>> the locale's definition of whatever other characters might also be
>> equivalent (actually, while I know that shopt affects globbing, I don't
>> know if it also affects regex matching - but if it doesn't, that's probably
>> a bug that should be fixed).
>
> Since bash uses the C library's regexp engine, and most C libraries don't
> implement RRI, much less expose it as a flags option available via
> regcomp(), there's no reason to expect that globasciiranges would have
> any effect on regular expression matching.

But bash could be taught to rewrite any regex containing a range with both
endpoints in ASCII into an equivalent bracket expression before handing it
to regcomp(). That is, if the user is matching against [a-d], bash hands
[abcd] to regcomp() instead. You don't need a flag in regcomp() to get RRI,
just some pre-processing (and often memory allocation, since expanding a
range into an explicit list of characters tends to require more characters).
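
To make the pre-processing idea concrete, here is a rough sketch of the rewrite
step. It is not bash code (bash is written in C); Python is used only for
brevity, and the names expand_ascii_ranges, _bracket_end and _expand_bracket are
hypothetical. As a simplifying assumption, it only expands ranges whose
endpoints are both ASCII digits, both lowercase, or both uppercase, and leaves
everything else (such as [A-z], whose expansion would contain ']' and '^')
untouched.

```python
import string

# Assumption: only these three "safe" ASCII runs are expanded.
_RUNS = (string.digits, string.ascii_lowercase, string.ascii_uppercase)


def _bracket_end(p, start):
    """Return the index of the ']' closing the bracket at p[start], or None."""
    j = start + 1
    if p[j:j + 1] == "^":
        j += 1
    if p[j:j + 1] == "]":          # a leading ']' is a literal
        j += 1
    while j < len(p):
        if p[j] == "[" and p[j + 1:j + 2] in (":", ".", "="):
            close = p.find(p[j + 1] + "]", j + 2)   # skip [:alpha:], [.x.], [=x=]
            if close == -1:
                return None
            j = close + 2
        elif p[j] == "]":
            return j
        else:
            j += 1
    return None


def _expand_bracket(br):
    """Expand safe ASCII ranges inside one complete bracket expression."""
    body, prefix = br[1:-1], "["
    if body.startswith("^"):
        prefix, body = "[^", body[1:]
    out, k = [], 0
    while k < len(body):
        if body[k] == "[" and body[k + 1:k + 2] in (":", ".", "="):
            close = body.find(body[k + 1] + "]", k + 2)
            out.append(body[k:close + 2])           # copy the class verbatim
            k = close + 2
        elif k + 2 < len(body) and body[k + 1] == "-" and body[k + 2] != "[":
            lo, hi = body[k], body[k + 2]
            run = next((s for s in _RUNS if lo in s and hi in s and lo <= hi), None)
            # Expand a safe range; otherwise copy the range through unchanged.
            out.append(run[run.index(lo):run.index(hi) + 1] if run else body[k:k + 3])
            k += 3
        else:
            out.append(body[k])
            k += 1
    return prefix + "".join(out) + "]"


def expand_ascii_ranges(pattern):
    """Rewrite a POSIX-style regex so safe ASCII ranges become explicit lists."""
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])            # escaped char outside brackets
            i += 2
        elif pattern[i] == "[":
            end = _bracket_end(pattern, i)
            if end is None:                         # unterminated: leave the rest alone
                out.append(pattern[i:])
                break
            out.append(_expand_bracket(pattern[i:end + 1]))
            i = end + 1
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```

With this sketch, expand_ascii_ranges("[a-d]") gives "[abcd]", and the result
is what would be passed on to regcomp(). Note that the output is longer than the
input, which is the extra allocation mentioned above.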

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
