From: Eric Blake
Subject: Re: built-in regex matches wrong character
Date: Thu, 6 Sep 2018 09:23:33 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/06/2018 09:17 AM, Chet Ramey wrote:
> On 9/5/18 4:39 PM, Eric Blake wrote:
>> Or, you can use bash's 'shopt -s globasciiranges' which is supposed to enable Rational Range Interpretation, where even in non-C locales, a character range bounded by two ASCII characters takes on the C locale definition of only the ASCII characters in that range, rather than the locale's definition of whatever other characters might also be equivalent (actually, while I know that shopt affects globbing, I don't know if it also affects regex matching - but if it doesn't, that's probably a bug that should be fixed).
>
> Since bash uses the C library's regexp engine, and most C libraries don't implement RRI, much less expose it as a flags option available via regcomp(), there's no reason to expect that globasciiranges would have any effect on regular expression matching.

But bash could be taught to convert any regex that contains a range with both endpoints ASCII into a different bracket expression before handing things over to regcomp(). That is, if the user is matching against [a-d], bash hands [abcd] to regcomp() instead. You don't need a flag in regcomp() to get RRI, just some pre-processing (and often memory allocation, since expanding a range into an explicit list tends to require more characters).
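
To make the pre-processing idea concrete, here is a minimal sketch of such a rewrite. It is not bash code: bash would do this in C before calling regcomp(), and Python is used here only for brevity. The function names are hypothetical, and as a simplifying assumption the sketch only expands ranges that stay within digits, lowercase letters, or uppercase letters, so the expansion never contains a character that is special inside a bracket expression. Any other range (such as [A-z]) is passed through untouched.

```python
import string

# Groups within which a range can be expanded safely (assumption of this
# sketch: no punctuation, so no ']', '^' or '-' ends up in the expansion).
_GROUPS = (string.digits, string.ascii_lowercase, string.ascii_uppercase)


def _expand_body(body: str) -> str:
    """Expand simple ASCII ranges in the text between '[' and the closing ']'."""
    out = []
    k = 0
    while k < len(body):
        # Copy [:class:], [.coll.] and [=equiv=] elements verbatim.
        if body[k] == "[" and k + 1 < len(body) and body[k + 1] in ".:=":
            end = body.find(body[k + 1] + "]", k + 2)
            if end != -1:
                out.append(body[k:end + 2])
                k = end + 2
                continue
        # A range looks like lo-hi with both endpoints inside the body.
        if k + 2 < len(body) and body[k + 1] == "-":
            lo, hi = body[k], body[k + 2]
            group = next((g for g in _GROUPS if lo in g and hi in g), None)
            if group is not None and lo <= hi:
                out.append(group[group.index(lo):group.index(hi) + 1])
                k += 3
                continue
        out.append(body[k])
        k += 1
    return "".join(out)


def expand_ascii_ranges(pattern: str) -> str:
    """Rewrite e.g. [a-d] as [abcd] in a POSIX-style regex string."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:      # escaped character outside brackets
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if c != "[":
            out.append(c)
            i += 1
            continue
        j = i + 1
        if j < n and pattern[j] == "^":  # negation marker
            j += 1
        if j < n and pattern[j] == "]":  # a leading ']' is a literal
            j += 1
        while j < n and pattern[j] != "]":
            if pattern[j] == "[" and j + 1 < n and pattern[j + 1] in ".:=":
                end = pattern.find(pattern[j + 1] + "]", j + 2)
                if end == -1:
                    j = n                # malformed, give up on this bracket
                    break
                j = end + 2
            else:
                j += 1
        if j >= n:                       # unterminated bracket: copy as-is
            out.append(pattern[i:])
            break
        out.append("[" + _expand_body(pattern[i + 1:j]) + "]")
        i = j + 1
    return "".join(out)
```

For example, expand_ascii_ranges("x[^0-3a-c]y") gives "x[^0123abc]y", which is longer than the input; that growth is the extra memory allocation mentioned above.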

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org