bug-bash


Equivalence classes handled differently in mb vs non-mb patterns


From: Harald van Dijk
Subject: Equivalence classes handled differently in mb vs non-mb patterns
Date: Tue, 28 Jul 2020 09:17:57 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Thunderbird/79.0

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnux32
Compiler: gcc-10.1.0 -mx32
Compilation CFLAGS: -O2 -Wno-parentheses -Wno-format-security
uname output: Linux loucetios 5.7.9 #1 SMP @1590968955 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnux32

Bash Version: 5.0
Patch Level: 18
Release Status: release

Description:
        In lib/glob/smatch.c, two functions check equivalence classes
        in patterns: collequiv and collequiv_wc. The former is used if
        the pattern does not contain any multi-byte characters, the
        latter otherwise, with exceptions that are not relevant to this
        bug. The two functions do not give the same results: collequiv
        does not implement the fnmatch() fallback code that
        collequiv_wc does implement, leading to inconsistent matching
        for ASCII-only equivalence classes.

        (This is not something I encountered in a real script. I am
        implementing equivalence class support myself, using fnmatch()
        as the main check rather than as a fallback, and comparing the
        results to those of other shells.)

Repeat-By:
        case a  in [[=A=]])  echo match 1 ;; esac
        case aá in [[=A=]]á) echo match 2 ;; esac

        In locales where A and a are not in the same equivalence class,
        this should print nothing. glibc's ja_JP.UTF-8 is such a locale.
        The C locale is such a locale as well, but it does not allow
        for the á character, so it is not suitable for this test.

        In locales where A and a are in the same equivalence class, this
        should print "match 1" and "match 2". glibc's en_US.UTF-8 is
        such a locale.

        What actually happens in glibc's en_US.UTF-8 locale is that only
        "match 2" is printed.
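
For comparison with the shell's results, the C library can be asked
directly. The following is a minimal sketch, assuming a POSIX fnmatch()
that supports equivalence classes (as glibc's does) and a UTF-8 locale.
The names libc_matches and report_cases are made up for illustration and
do not come from bash.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Return 1 if fnmatch() says string matches pattern in the current locale. */
int libc_matches(const char *pattern, const char *string)
{
    return fnmatch(pattern, string, 0) == 0;
}

/* Print the C library's verdict for the two cases from the report.
   "\xc3\xa1" is the UTF-8 encoding of á, so a UTF-8 locale is assumed. */
void report_cases(void)
{
    printf("case 1: %s\n",
           libc_matches("[[=A=]]", "a") ? "match" : "no match");
    printf("case 2: %s\n",
           libc_matches("[[=A=]]\xc3\xa1", "a\xc3\xa1") ? "match" : "no match");
}
```

To try it, call setlocale(LC_ALL, "") from <locale.h> in main before
report_cases(), then run the program with LC_ALL set to the locale under
test. Whatever the locale, the two cases should agree with each other,
since á is matched literally in case 2.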

Fix:
        Copy the FNMATCH_EQUIV_FALLBACK logic from collequiv_wc to
        collequiv. _fnmatch_fallback_wc may be copied to create a non-wc
        version of it, but it also works to have collequiv call
        _fnmatch_fallback_wc by converting characters to wide
        characters.
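
A non-wc fallback could look roughly like the sketch below. It is not
bash's code: equiv_fallback_byte is a hypothetical name, and the approach
(build the pattern "[[=X=]]" for the class character and hand a
one-character string to fnmatch()) is an assumption about what such a
helper might do, limited to single-byte characters.

```c
#include <fnmatch.h>

/* Hypothetical helper, not taken from bash: ask fnmatch() whether the
   single-byte character c is in the equivalence class named by equiv. */
int equiv_fallback_byte(unsigned char c, unsigned char equiv)
{
    char pattern[8];
    char string[2];

    /* A NUL byte cannot appear in a pattern or a string to be matched. */
    if (c == '\0' || equiv == '\0')
        return 0;

    /* Build the bracket expression "[[=X=]]", where X is equiv. */
    pattern[0] = '[';
    pattern[1] = '[';
    pattern[2] = '=';
    pattern[3] = (char)equiv;
    pattern[4] = '=';
    pattern[5] = ']';
    pattern[6] = ']';
    pattern[7] = '\0';

    /* The string to match is just the character c. */
    string[0] = (char)c;
    string[1] = '\0';

    return fnmatch(pattern, string, 0) == 0;
}
```

A caller would try the existing collation comparison first and use this
only when that comparison reports no match, so that fnmatch() stays a
fallback rather than the main check.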


