Re: [PATCH] regex: Add extra escapes to regular expressions in m4


From: Eric Blake
Subject: Re: [PATCH] regex: Add extra escapes to regular expressions in m4
Date: Tue, 29 Jan 2019 14:46:42 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

[adding bug-autoconf]

On 1/29/19 2:18 PM, Eric Blake wrote:
> On 1/29/19 12:57 PM, Siddhesh Poyarekar wrote:
>> From: Siddhesh Poyarekar <address@hidden>
>>
>>      * m4/regex.m4 (gl_REGEX): Add extra escape characters to
>>      regular expressions.
>> ---
>>
>> The m4 preprocessor eats up half the escape characters, so give it twice
>> as much.  I ran into this when running tests for glibc 2.29 release and
>> verified that this patch fixes the problem.
> 
> Which versions of m4 and autoconf are you seeing this under? Can you
> show actual snippets from the generated configure file showing that \
> was eaten?  And why are you only touching some of the lines, rather than
> all places where \\ appears in the regex.m4 file?  This fix feels fishy,
> and I seriously doubt that escape characters are being eaten by m4, but
> I would like to make sure we have a real root cause understanding what
> prompted this patch.

Aha - it is NOT m4 that is doing it, but the shell's handling of \ in a
heredoc with an unquoted delimiter.  Compare:

$ bash -c 'cat <<ABC
a\b\\c\\\d\\\\e"f\g\\h\\\i\\\\j"
ABC'
a\b\c\\d\\e"f\g\h\\i\\j"

$ bash -c 'cat <<\ABC
a\b\\c\\\d\\\\e"f\g\\h\\\i\\\\j"
ABC'
a\b\\c\\\d\\\\e"f\g\\h\\\i\\\\j"

$ dash -c 'cat <<ABC
a\b\\c\\\d\\\\e"f\g\\h\\\i\\\\j"
ABC'
a\b\c\\d\\e"f\g\h\\i\\j"

$ dash -c 'cat <<\ABC
a\b\\c\\\d\\\\e"f\g\\h\\\i\\\\j"
ABC'
a\b\\c\\\d\\\\e"f\g\\h\\\i\\\\j"

> 
>> +++ b/m4/regex.m4
>> @@ -204,7 +204,7 @@ AC_DEFUN([gl_REGEX],
>>                             & ~RE_CONTEXT_INVALID_DUP
>>                             & ~RE_NO_EMPTY_RANGES);
>>              memset (&regex, 0, sizeof regex);
>> -            s = re_compile_pattern ("[[:alnum:]_-]\\\\+$", 16, &regex);
>> +            s = re_compile_pattern ("[[:alnum:]_-]\\\\\\\\+$", 16, &regex);

m4/regex.m4 uses the AC_LANG_PROGRAM() macro, which has an unfortunate,
longstanding behavior (present at least in autoconf 2.69, and apparently
going back to much older releases) of eventually expanding as:

m4_define([AC_LANG_CONFTEST(C)],
[cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h.  */
$1
_ACEOF])

which produces an unquoted heredoc, so the shell collapses every \\ pair
inside a program snippet into a single \.  We can't switch to quoted
heredocs (because users may have come to expect expansion of $shellvar
when writing their programs), and the problem is not apparent with the
most common usage of a single \.  We could teach autoconf 2.70 to double
up \\ automatically in a lang snippet.  That scales better than every .m4
file having to double up, and it lets you copy snippets back and forth
between .m4 and .c files without having to remember to add/subtract \.
But there's still the issue of catering to distros that ship older
autoconf (gnulib can force a new behavior borrowed from a patched
autoconf, but not everyone uses gnulib).

Looking at m4/fnmatch.m4, it looks like we are already used to the idea
of doubling up any \\ that will pass through AC_LANG_PROGRAM() and the
unquoted heredoc; on those grounds, your patch is correct.
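
To make the effect on a C snippet concrete, here is a minimal sketch of
what happens to the old and new forms of the pattern when they pass
through an unquoted heredoc.  The temporary file, the variable names and
the shortened pattern (trailing $ dropped) are my own stand-ins for
illustration, not text copied from a generated configure script:

```shell
# Hypothetical stand-in for the heredoc that configure emits; the file and
# both C lines are illustrative, not copied from a real configure script.
conftest=$(mktemp)

# Unquoted delimiter: the shell strips one level of \ escaping from the body,
# so every \\ pair written here reaches the file as a single \.
cat <<_ACEOF >"$conftest"
const char *unpatched = "[[:alnum:]_-]\\\\+";
const char *patched = "[[:alnum:]_-]\\\\\\\\+";
_ACEOF

# Show what the C compiler would actually be given.
cat "$conftest"
```

The first line lands in the file with two backslashes (a C string
containing one \), while the second lands with four (a C string
containing two), which is what the four-backslash form looks like it
should mean when read in the .m4 file.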

If nothing else, the Autoconf manual should document this behavior of \\
in source code snippets in m4 files (if Autoconf does not go one step
further and auto-patch them).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
