emacs-bug-tracker
bug#43577: closed (wrong result for grep -io in turkish locale)


From: GNU bug Tracking System
Subject: bug#43577: closed (wrong result for grep -io in turkish locale)
Date: Thu, 24 Sep 2020 02:58:01 +0000

Your message dated Wed, 23 Sep 2020 19:57:36 -0700
with message-id <566c67b3-062e-d648-2dff-15f8c4b08e36@cs.ucla.edu>
and subject line Re: bug#43577: wrong result for grep -io in turkish locale
has caused the debbugs.gnu.org bug report #43577,
regarding wrong result for grep -io in turkish locale
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
43577: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=43577
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: wrong result for grep -io in turkish locale
Date: Wed, 23 Sep 2020 22:23:09 +0900

In the Turkish locale, upper and lower case letters are mapped as follows.

  U0049 <-> U0131
  U0069 <-> U0130

Both of the following test cases are expected to output U0130, but the
latter outputs nothing.

$ printf '\304\260\n' >I  # U0130
$ env LC_ALL=tr_TR.utf8 grep -i i I
İ  # U0130
$ env LC_ALL=tr_TR.utf8 grep -oi i I
$ 

By the way, both of the following test cases work correctly.

$ printf '\304\261\n' >i  # U0131
$ env LC_ALL=tr_TR.utf8 grep -i I i
ı  # U0131
$ env LC_ALL=tr_TR.utf8 grep -oi I i
ı  # U0131
$




--- End Message ---
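
The mapping and the printf escapes in the report above can be checked with a short sketch. This is only an illustration: the code points come from the report, while the table name TURKISH_CASE_PAIRS and the helper utf8_octal are hypothetical names introduced here. It prints each pair with its UTF-8 bytes in the octal form that printf accepts; under UTF-8, U0130 encodes as \304\260 and U0131 as \304\261.

```python
# Upper/lower pairs as listed in the report (Turkish locale).
# The table and helper names are illustrative assumptions, not grep code.
TURKISH_CASE_PAIRS = {
    "\u0049": "\u0131",  # I <-> dotless i
    "\u0130": "\u0069",  # dotted capital I <-> i
}


def utf8_octal(ch):
    """Return the UTF-8 bytes of ch as printf-style octal escapes."""
    return "".join("\\%03o" % b for b in ch.encode("utf-8"))


for upper, lower in TURKISH_CASE_PAIRS.items():
    print("U%04X %s <-> U%04X %s"
          % (ord(upper), utf8_octal(upper), ord(lower), utf8_octal(lower)))
```
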
--- Begin Message ---
Subject: Re: bug#43577: wrong result for grep -io in turkish locale
Date: Wed, 23 Sep 2020 19:57:36 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 9/23/20 6:47 PM, Norihiro Tanaka wrote:
> I attach the fix for the bug.  Regex is fixed in Paul, thank you.


Thanks, I had written a similar patch, and your patch helped me find a bug in what I wrote. The patch I wrote uses an auxiliary ok_fold table that lets fgrep_icase_charlen avoid calling mbrtowc for single-byte characters in the pattern; this may help performance for long patterns. More important, fgrep_icase_charlen does not return -1 for a character like 'a' in an en_US.UTF-8 locale merely because 'a' has a case-folded counterpart 'A'; the idea is that we should be OK if the case-folded counterparts are single-byte.

I had added more-extensive tests than were in your patch, and some of them found a crash in kwsinit that indicated a similar change is needed there. I assume this was because the patch I wrote had a more-generous fgrep_icase_charlen. As this simplifies kwsinit, this patch does that too.

While looking into this I found a performance glitch I recently introduced (I double-counted some regular expressions, messing up later heuristics). Plus I checked this on our old Solaris 10 box and fixed a couple of porting glitches. I installed the attached patches into the master branch, to help make it easier for you to compare your changes to mine. Patch 0003 is the enhanced version of the patch that you wrote.

Thanks again for working on this.

Attachment: 0001-grep-fix-recently-introduced-performance-glitch.patch
Description: Text Data

Attachment: 0002-build-update-gnulib-submodule-to-latest.patch
Description: Text Data

Attachment: 0003-grep-fix-more-Turkish-eyes-bugs.patch
Description: Text Data

Attachment: 0004-grep-pacify-Sun-C-5.15.patch
Description: Text Data

Attachment: 0005-grep-don-t-assume-PCRE-in-tests.patch
Description: Text Data


--- End Message ---
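
The single-byte idea described in the reply above can be sketched roughly as follows. This is not grep's code: the names TURKISH_PAIRS, case_counterparts, and single_byte_fold_ok are hypothetical, the Turkish table is hand-written from the mapping in the original report, and the default branch uses Python's locale-independent case conversion as a stand-in for what a real locale would report. The sketch only shows the test itself: a pattern character is treated as safe when it and all of its case counterparts each occupy one UTF-8 byte.

```python
# Hand-written Turkish pairs taken from the bug report; an assumption that
# stands in for what a tr_TR locale would report.
TURKISH_PAIRS = {"I": "\u0131", "\u0131": "I", "i": "\u0130", "\u0130": "i"}


def case_counterparts(ch, turkish=False):
    """Return the other-case forms of ch under a simplified mapping."""
    if turkish and ch in TURKISH_PAIRS:
        return [TURKISH_PAIRS[ch]]
    # Default branch: Python's locale-independent conversion, keeping only
    # one-to-one mappings.
    others = {ch.upper(), ch.lower()} - {ch}
    return sorted(c for c in others if len(c) == 1)


def single_byte_fold_ok(ch, turkish=False):
    """True if ch and every counterpart encode to a single UTF-8 byte."""
    forms = [ch] + case_counterparts(ch, turkish)
    return all(len(f.encode("utf-8")) == 1 for f in forms)


for ch, turkish in [("a", False), ("i", False), ("i", True)]:
    label = "turkish" if turkish else "default"
    print(ch, label, single_byte_fold_ok(ch, turkish))
```

Under these assumptions 'a' passes because its only counterpart 'A' is a single byte, while 'i' fails with the Turkish table because its counterpart U0130 needs two bytes.
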
