bug-bash

Re: Wildcard expansion can fail with nonprinting characters


From: Geoff Kuenning
Subject: Re: Wildcard expansion can fail with nonprinting characters
Date: Mon, 30 Sep 2019 17:39:18 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

$'\361' is a valid character in Latin-1, which is how it happened to arise in my case. Also, I tested with the C locale, which should be agnostic to character encodings, and got the same result.

The general Unix philosophy, which in this case says "I'm not going to pass judgment on the weird things you do even though I don't understand them", argues for being able to handle any arbitrary sequence of bytes, at least on Linux. That's one of the things that makes the Unix paradigm so powerful. So I appreciate your willingness to fix this.

On 9/27/19 7:52 PM, Geoff Kuenning wrote:
Version:

GNU bash, version 4.4.23(1)-release (x86_64-suse-linux-gnu)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

Behavior:

If a pathname contains nonprinting characters, and is expanded from a
variable name, wildcard expansion can sometimes fail.
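
For readers who want to try this, here is a minimal sketch of the kind of setup described above. The original reproduction recipe is not part of this message, so the directory and file names are assumptions, and the result will depend on the bash version and locale in use. It needs a file system that accepts arbitrary bytes in file names.

```shell
# Hypothetical reproduction sketch; the directory and file names are made up.
tmp=$(mktemp -d)

# Build a directory name ending in the single byte 0xF1 (written $'\361' in bash).
x="$tmp/dir$(printf '\361')"
mkdir -- "$x"
touch -- "$x/file1" "$x/file2"

# Expand a wildcard whose directory part comes from a quoted variable.
set -- "$x"/*
count=$#

# Two words means the glob expanded to both files; a single word that still
# ends in "/*" means the pattern was left unexpanded.
echo "matched words: $count"

# Remove "$tmp" by hand when finished.
```

The sketch avoids bash-only syntax so that it can also be run under other shells for comparison.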

This is an interesting report. The $'\361' is a Unicode combining character, which ends up making the entire sequence of characters an invalid wide character string in a bunch of different locales.

Some file systems (Mac OS X APFS) don't allow you to create files with invalid characters or character sequences in their names. Others (Linux) don't have a problem with it.

The code to dequote filenames that's needed for "$x" tries to fall back to single-byte character operations in the presence of invalid character or byte sequences, but that means you can't use any of the standard wide character functions to check for valid and invalid wide character strings.

The change between bash-4.4 and bash-5.0 is that the globbing code doesn't bother to try to convert to wide characters to do the dequoting if there aren't any valid multibyte characters in the pathname, but uses the single-byte character code instead. That works for this case, but doesn't work for pathnames that have both valid and invalid wide character sequences.
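
The distinction between these cases can be sketched from the shell. This is an illustration only: `is_valid_utf8` is a hypothetical helper that asks iconv(1) to decode the bytes as UTF-8, the sample names are made up, and none of this reflects how bash's globbing code actually performs the check.

```shell
# Sketch only: is_valid_utf8 is a hypothetical helper, not part of bash.
# It succeeds if iconv(1) can decode its argument as UTF-8, and fails otherwise.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

no_multibyte=$(printf 'dir\361')       # ASCII plus a lone 0xF1 byte
mixed=$(printf 'caf\303\251\361')      # valid two-byte character, then a lone 0xF1
all_valid=$(printf 'caf\303\251')      # valid UTF-8 throughout

# Report, for each sample name, whether the whole byte string decodes as UTF-8.
for name in "$no_multibyte" "$mixed" "$all_valid"; do
    if is_valid_utf8 "$name"; then
        echo "valid UTF-8"
    else
        echo "invalid UTF-8"
    fi
done
```

The first name corresponds to the case that the single-byte path handles, since it contains no valid multibyte character at all. The second is the mixed case, where a valid multibyte character and an invalid byte appear in the same pathname.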

A better fix is to write a symmetric function that will take the output of xdupmbstowcs2 (bash's replacement for mbstowcs that handles zero-length wide character strings that aren't null wide characters) and handle the invalid wide character strings that may result from it. I'll make that fix for the next release.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/


--
Geoff Kuenning geoff@cs.hmc.edu http://www.cs.hmc.edu/~geoff/

Orchestra retrospectively extremely satisfied with symphony [No. 1] as
result of barrel of free beer.
        -- Gustav Mahler, post-premiere letter to Arnold Berliner


