bug-gnulib

Re: [bug #48055] Regex ranges and locales in gnu-awk regextype


From: James Youngman
Subject: Re: [bug #48055] Regex ranges and locales in gnu-awk regextype
Date: Sun, 27 Nov 2016 17:15:25 +0000

Findutils uses the regular expression implementation from gnulib.  So this problem likely also exists there, or perhaps has already been fixed there.

On Mon, May 30, 2016 at 7:12 AM, Piotr Jurkiewicz <address@hidden> wrote:
URL:
  <http://savannah.gnu.org/bugs/?48055>

                 Summary: Regex ranges and locales in gnu-awk regextype
                 Project: findutils
            Submitted by: piotrjurkiewicz
            Submitted on: Mon 30 May 2016 08:12:40 AM CEST
                Category: find
                Severity: 3 - Normal
              Item Group: Wrong result
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name:
        Originator Email:
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 4.6.0
           Fixed Release: None

    _______________________________________________________

Details:

Starting with gawk 4.0, the traditional behaviour of regex ranges has been
brought back. This means that [a-z] matches only lowercase letters and [A-Z]
matches only uppercase letters, regardless of the locale and collation settings.

See more:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

You can test this with the following commands:

$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk pre-4.0
ABC

$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk 4.0+
[nothing]

Findutils, however, still emulates the old behaviour of gawk in gnu-awk mode.
That is, in certain locales, the [a-z] and [A-Z] ranges each match both
lowercase and uppercase letters.

Test:

Prepare:

mkdir test
cd test
touch a.lower
touch b.UPPER

Then both commands:

LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[a-z]{5}$'
LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[A-Z]{5}$'

return:

./a.lower
./b.UPPER

instead of just the one file whose extension has the matching case.
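
The steps above can be bundled into one script that also runs a control
case under LC_COLLATE=C, where each pattern would be expected to match only
the file with the matching case. This is only a sketch: it assumes GNU find,
it assumes the pl_PL.utf8 locale is installed (if it is not, the second run
will most likely behave like the C locale), and run_case is a helper name
made up for this example, not part of findutils.

```shell
#!/bin/sh
# Reproduction sketch for the report above. Assumes GNU find.

# run_case DIR LOCALE REGEX
# Runs find in gnu-awk mode inside DIR with LC_COLLATE set to LOCALE and
# prints the sorted matches. (Hypothetical helper, for illustration only.)
run_case() {
    # LC_ALL is emptied so that it cannot override LC_COLLATE.
    (cd "$1" && LC_ALL= LC_COLLATE="$2" \
        find . -regextype gnu-awk -regex "$3" | sort)
}

# Same two test files as in the report, created in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/a.lower" "$workdir/b.UPPER"

# C is the control case; pl_PL.utf8 is the locale from the report.
for loc in C pl_PL.utf8; do
    echo "== LC_COLLATE=$loc, [a-z]{5}"
    run_case "$workdir" "$loc" '.*[a-z]{5}$'
    echo "== LC_COLLATE=$loc, [A-Z]{5}"
    run_case "$workdir" "$loc" '.*[A-Z]{5}$'
done

rm -rf "$workdir"
```

Comparing the two halves of the output shows whether the installed find is
affected: if the pl_PL.utf8 runs list both files where the C runs list one,
the range is being interpreted by collation order.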




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?48055>

--
This email is intended solely for the use of its addressee, sender, and any readers of a mailing list archive in which it happens to appear.   If you have received this email in error, please say or type three times, "I believe in the utility of email disclaimers," and then reply to the author correcting any spellings (and, optionally, any incorrect spellings), accompanying these with humorous jests about the author's parentage.   If you are not the addressee, you are nevertheless permitted to both copy and forward this email since without such permissions email systems are unable to transmit email to anybody, intended recipient or not.  To those still reading by this point, the author would like to apologise for being unable to maintain a consistent level of humour throughout this disclaimer.  Contents may settle during transit.  Do not feed the animals.