From: Zack Weinberg
Subject: Re: [1003.1(2016/18)/Issue7+TC2 0001558]: require [^...] in addition to [!...] for bracket expression negation
Date: Fri, 18 Feb 2022 10:10:21 -0500
User-agent: Cyrus-JMAP/3.5.0-alpha0-4778-g14fba9972e-fm-20220217.001-g14fba997

On Fri, Feb 18, 2022, at 9:38 AM, Eric Blake wrote:
>> typeset as_tr_cpp='eval sed
>> '\''y%*abcdefghijklmnopqrstuvwxyz%PABCDEFGHIJKLMNOPQRSTUVWXYZ%;s%[^_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]%_%g'\'
>> (via “typeset -p”), and without -o noglob, using it as simply
>> $as_tr_cpp does, in fact, glob on it.
>> Yes, clearly a bug in GNU autoconf… which I’m not personally going to
>> even try and report. The …[^… is passed to sed. But it is also
>> processed by the shell first, by accident. (This is from
>> OpenSSH-portable’s configure.)
> So we need to patch autoconf to properly shell-quote the sed script
> stored in as_tr_cpp.

(This also affects as_tr_sh.)

This is going to take some surgery.  In a generated autoconf script we will have

# Sed expression to map a string onto a valid CPP name.
as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"

# Sed expression to map a string onto a valid variable name.
as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'"

The RHS of these assignments is double-quoted so the shell variables 
($as_cr_letters, etc) are expanded first, leaving us with something like

# Sed expression to map a string onto a valid CPP name.
as_tr_cpp="eval sed 'y%*abcdef...%PABCDEF...%;s%[^_abcABC123...]%_%g'"

# Sed expression to map a string onto a valid variable name.
as_tr_sh="eval sed 'y%*+%pp%;s%[^_abcABC123...]%_%g'"

Typical usage for these is e.g.

cat >>confdefs.h <<_ACEOF
#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
_ACEOF

This is an unquoted expansion of $as_tr_cpp, so its value is first split into 
fields, then each field is subject to glob expansion, and only _then_ is the 
result eval'ed (which is what strips the single quotes).  At the point of the 
glob expansion, the single quotes are not special.  We're only getting away 
with this because of how unlikely it is that a file matching the glob 
'y%*abcdef...%PABCDEF...%;s%[^_abcABC123...]%_%g' (_including_ the single 
quotes) will exist, and how rare it is for anyone to turn on failglob or 
nullglob (M4sh probably breaks in many more ways if you do that).
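
To make the failure mode concrete, here is a minimal sketch of the same 
effect with a much shorter pattern.  The variable name pat, the file name, 
and the use of mktemp -d (widely available, but not in POSIX) are all 
illustrative assumptions, not anything autoconf generates; it also assumes a 
POSIX-style sh such as dash, bash, or ksh, with neither noglob nor 
failglob/nullglob in effect.

```shell
#!/bin/sh
# Hypothetical stand-in for $as_tr_cpp: a value containing literal single
# quotes and a bracket expression.
pat="'[ab]x'"

oldpwd=$PWD
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

# No matching file yet, so the pattern is left unchanged.
set -- $pat            # unquoted: field splitting, then pathname expansion
no_match=$1

# Create a file whose name matches the pattern *including* the quotes.
: > "'ax'"
set -- $pat
with_match=$1

cd "$oldpwd" && rm -rf "$tmpdir"

printf 'no match:   %s\n' "$no_match"
printf 'with match: %s\n' "$with_match"
```

The first expansion leaves '[ab]x' untouched; the second is replaced by the 
file name 'ax', because the quotes stored in the variable are ordinary 
characters as far as the glob is concerned.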

Note that if there were spaces inside the single quotes, the sed script would 
get split into multiple words!
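
A small sketch of that splitting, using a hypothetical command string (cmd is 
not an autoconf variable) and assuming the default IFS.  Since eval joins its 
arguments back together with single spaces, the visible symptom here is that 
a run of whitespace inside the quotes is collapsed.

```shell
#!/bin/sh
# Hypothetical command string with two spaces inside the single quotes.
cmd="eval echo 'a  b'"

# Unquoted expansion splits on whitespace; the quotes do not protect it.
set -- $cmd
word_count=$#          # eval, echo, 'a, b'

# eval rejoins the fields with one space each before re-parsing them.
via_variable=$($cmd)

# For comparison: the same text handed to eval as a single argument.
direct=$(eval "echo 'a  b'")

printf 'fields: %s\n' "$word_count"
printf 'via variable: [%s]\n' "$via_variable"
printf 'direct:       [%s]\n' "$direct"
```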

I think the only practical way to fix this is to convert $as_tr_* into shell 
functions, which will also mean that we don't need the eval anymore, which is 
nice.  Something like

as_fn_tr_cpp () {
  sed "y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g"
}

If we don't like having $as_cr_letters etc expanded when as_fn_tr_cpp is 
invoked, we can define the functions using the equivalent m4_cr_* macros 
instead and use single quotes around the sed script.
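
As a rough, untested-against-autoconf sketch of how the function form would 
be used: the as_cr_* assignments below are written out by hand on the 
assumption that they hold the plain ASCII letters and digits, and 
as_fn_tr_sh is a hypothetical counterpart for $as_tr_sh that follows the 
same pattern.

```shell
#!/bin/sh
# Assumed values; a generated configure script sets these itself.
as_cr_letters='abcdefghijklmnopqrstuvwxyz'
as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
as_cr_digits='0123456789'
as_cr_alnum=$as_cr_letters$as_cr_LETTERS$as_cr_digits

# Map a string onto a valid CPP name.  The sed script is an ordinary
# double-quoted word, so it is never field-split or globbed.
as_fn_tr_cpp () {
  sed "y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g"
}

# Map a string onto a valid shell variable name (hypothetical counterpart).
as_fn_tr_sh () {
  sed "y%*+%pp%;s%[^_$as_cr_alnum]%_%g"
}

cpp_name=$(printf '%s\n' 'HAVE_sys/stat.h' | as_fn_tr_cpp)
sh_name=$(printf '%s\n' 'ac_cv_header_sys/stat.h' | as_fn_tr_sh)

printf '%s\n' "$cpp_name" "$sh_name"
```

Call sites would then pipe into as_fn_tr_cpp instead of expanding 
$as_tr_cpp, so no eval and no unquoted expansion is involved.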

I don't know when I will have time to write a patch for this.


p.s. I'm sympathetic to mirabilos' position that POSIX should not mandate 
[^...] range complement in globs; but given that there exist several 
widely-used shells that already implement [^...], _none_ of the possible 
changes to POSIX actually make anything better.  There exist scripts that 
require [^...] to expand, and scripts that require it _not_ to expand.  
Defensively coded scripts have to avoid [^...] entirely, which as we see above 
can be a major headache.  Changes to POSIX take upwards of ten years to become 

Honestly, at this point in history I would be inclined to say "No further 
changes to the POSIX shell language, period. It is what it is. Use a less 
terrible scripting language if you have the option."
