Equivalence classes handled differently in mb vs non-mb patterns
From: Harald van Dijk
Subject: Equivalence classes handled differently in mb vs non-mb patterns
Date: Tue, 28 Jul 2020 09:17:57 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Thunderbird/79.0
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnux32
Compiler: gcc-10.1.0 -mx32
Compilation CFLAGS: -O2 -Wno-parentheses -Wno-format-security
uname output: Linux loucetios 5.7.9 #1 SMP @1590968955 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnux32
Bash Version: 5.0
Patch Level: 18
Release Status: release
Description:
In lib/glob/smatch.c, two functions are used to check
equivalence classes in patterns: collequiv and collequiv_wc.
The former is used if the pattern does not contain any
multi-byte characters, the latter otherwise, with exceptions
that are not relevant to this bug. The two functions do not
give the same results: collequiv does not implement the
fnmatch() fallback code that collequiv_wc implements, which
leads to inconsistent matching for ASCII-only equivalence
classes.
(This is not something I encountered in a real script. I am
implementing equivalence class support myself, using fnmatch()
as the main check rather than as a fallback, and comparing the
results to those of other shells.)
Repeat-By:
case a in [[=A=]]) echo match 1 ;; esac
case aá in [[=A=]]á) echo match 2 ;; esac
In locales where A and a are not in the same equivalence class,
this should print nothing. glibc's ja_JP.UTF-8 is such a locale.
The C locale is such a locale as well, but it does not support
the á character, so it may be unsuitable for testing.
In locales where A and a are in the same equivalence class, this
should print "match 1" and "match 2". glibc's en_US.UTF-8 is
such a locale.
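What actually happens in glibc's en_US.UTF-8 locale is that only
"match 2" is printed: only the second pattern contains a
multi-byte character, so only it goes through collequiv_wc and
its fnmatch() fallback.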
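As a reference point for the two cases above, the same checks can be
sketched in C with fnmatch() as the only matcher, along the lines of the
approach mentioned earlier. This is a minimal sketch, not bash code:
glob_matches and run_repro are hypothetical names, and the results depend
on the C library's fnmatch() and on which locales are installed.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Nonzero if STR matches the glob PATTERN according to fnmatch() in the
   current locale; flags 0 means no FNM_PATHNAME/FNM_PERIOD special cases. */
int glob_matches(const char *pattern, const char *str)
{
    return fnmatch(pattern, str, 0) == 0;
}

/* Runs the two checks from the case statements above, using fnmatch() as
   the reference matcher, and prints the same messages the shell would. */
void run_repro(void)
{
    /* 1: ASCII-only pattern (the collequiv path in bash, per the report). */
    if (glob_matches("[[=A=]]", "a"))
        puts("match 1");

    /* 2: the trailing U+00E1 (UTF-8 bytes C3 A1) makes the pattern
       multi-byte, so bash takes the collequiv_wc path. */
    if (glob_matches("[[=A=]]\xc3\xa1", "a\xc3\xa1"))
        puts("match 2");
}
```

For example, a main() that calls setlocale(LC_ALL, "en_US.UTF-8") (from
<locale.h>) and then run_repro() should print both lines if glibc's
fnmatch() treats A and a as equivalent in that locale, which is what the
fallback behavior described above suggests. In the C locale, [[=A=]]
matches only A itself, so nothing is printed.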
Fix:
Copy the FNMATCH_EQUIV_FALLBACK logic from collequiv_wc to
collequiv. _fnmatch_fallback_wc could be copied to create a
non-wc version of it, but it also works to have collequiv call
_fnmatch_fallback_wc after converting its characters to wide
characters.
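The second approach can be sketched as follows. This is a minimal
illustration under stated assumptions, not the actual bash source:
fallback_wc is a hypothetical stand-in for _fnmatch_fallback_wc, assumed
to build the pattern "[[=X=]]" from the equivalence character and match it
against the subject character with fnmatch(); fallback_byte and
collequiv_sketch are likewise hypothetical, and the c == equiv test is a
placeholder for whatever comparison collequiv already performs.

```c
#include <fnmatch.h>
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for _fnmatch_fallback_wc: match the one-character
   string C against the pattern "[[=EQUIV=]]" with fnmatch().
   Returns 0 on match, FNM_NOMATCH on no match, and 2 if either character
   cannot be represented in the current locale. */
int fallback_wc(wchar_t c, wchar_t equiv)
{
    char str[MB_LEN_MAX + 1];   /* C as a multibyte string */
    char pat[MB_LEN_MAX + 7];   /* "[[=" + EQUIV + "=]]" + NUL */
    int len;

    len = wctomb(str, c);
    if (len < 0)
        return 2;
    str[len] = '\0';

    pat[0] = '[';
    pat[1] = '[';
    pat[2] = '=';
    len = wctomb(pat + 3, equiv);
    if (len < 0)
        return 2;
    pat[len + 3] = '=';
    pat[len + 4] = ']';
    pat[len + 5] = ']';
    pat[len + 6] = '\0';

    return fnmatch(pat, str, 0);
}

/* Hypothetical byte-oriented fallback for the collequiv path: widen both
   single-byte characters with btowc() and reuse the wide-character check. */
int fallback_byte(unsigned char c, unsigned char equiv)
{
    wint_t wc = btowc(c);
    wint_t wequiv = btowc(equiv);

    /* btowc() returns WEOF for bytes that are not complete characters
       on their own in the current locale. */
    if (wc == WEOF || wequiv == WEOF)
        return 2;
    return fallback_wc((wchar_t)wc, (wchar_t)wequiv);
}

/* Sketch of collequiv with the fallback added: nonzero if C belongs to
   the equivalence class [=EQUIV=]. */
int collequiv_sketch(int c, int equiv)
{
    if (c == equiv)   /* placeholder for collequiv's existing check */
        return 1;
    return fallback_byte((unsigned char)c, (unsigned char)equiv) == 0;
}
```

When btowc() cannot widen a byte, the sketch returns 2 ("no decision")
instead of guessing, so collequiv_sketch treats it as a non-match. Copying
the logic into a byte-only function instead would avoid the conversion, at
the cost of duplicating the pattern construction.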