autoconf-patches

Re: config files substitution with awk


From: Ralf Wildenhues
Subject: Re: config files substitution with awk
Date: Mon, 20 Nov 2006 21:22:49 +0100
User-agent: Mutt/1.5.13 (2006-08-11)

Hello Paul,

Thanks for the review.

* Paul Eggert wrote on Mon, Nov 20, 2006 at 07:03:18PM CET:
> Ralf Wildenhues <address@hidden> writes:
> 
> > I did not see an easy way to write it portably to ancient awk[...]
> 
> What difficulties do you see with ancient awk?

Not being very experienced with it, mostly.

> For example, this non-ancient loop
[...]
> can easily be written in ancient awk using something like this:

Ah, good.

> This is arguably even more readable when written in the ancient style
> (though I admit I don't know what that 'skip' is doing there in the
> original :-).

Yeah, blame it on lack of concentration.  (The original idea was to have
a loop to allow both recursive and nonrecursive substitution, without
needing a marker.  But we don't need to pursue that.)

> > One drawback for AC_SUBST_FILE currently present causes a noticeable
> > regression due to the fact that awk's system function is used for each
> > such substitution.
> 
> Why can't we use a repeated getline/print loop here?

Because I was scared of:

> > The autoconf.texi note leaves me uncertain what we
> > can portably expect from awk's getline. 

I've used such a loop now.
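
For illustration, here is a minimal sketch of such a getline/print loop.
It is not the code from the patch: the @INCLUDE@ marker, the file names
and the use of 'awk -v' (POSIX, but not available in ancient awk) are
assumptions made only for this example.

```shell
# Sketch only: replace a marker line with the contents of a file by
# reading it with getline, rather than calling awk's system().
# demo_dir, snippet.txt and @INCLUDE@ are hypothetical names.
demo_dir=$(mktemp -d)
printf 'one\ntwo\n' >"$demo_dir/snippet.txt"
printf 'before\n@INCLUDE@\nafter\n' >"$demo_dir/input.txt"

getline_output=$(awk -v file="$demo_dir/snippet.txt" '
/^@INCLUDE@$/ {
  # getline returns 1 for each line read, 0 at end of file, -1 on error.
  while ((getline aline < file) > 0)
    print aline
  close(file)
  next
}
{ print }
' "$demo_dir/input.txt")

rm -rf "$demo_dir"
echo "$getline_output"
```

The close() call matters if the same file is substituted more than once;
without it, a second read would start at end of file.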

> (Maybe we should deprecate AC_SUBST_FILE?....)

I don't see a good reason for that.  I think it can be useful, given the
size restrictions on the value of AC_SUBST that are still in place.

> > Is it necessary to 'chmod +x' a file before sourcing it ('. ./file')?
> 
> No.

OK.  Updated patch below, not requiring AC_PROG_AWK.

If I read 'gawk.info(Gory Details)' correctly, then we do need a test
for the runs-of-backslashes-before-ampersand escaping rules, in order
to add the right number of backslashes.  :-/
I've amended the testsuite, but I still need to think more about the
code.
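
To make the ampersand issue concrete, here is a small sketch of the two
cases that behave the same in the awks I am aware of: a bare '&' in the
replacement of sub() stands for the matched text, while '\&' (written
"\\&" in an awk string literal) yields a literal ampersand.  The input
string and variable names are made up for this example; longer runs of
backslashes before '&' are deliberately left out, since those are where
implementations differ.

```shell
# Sketch only: how '&' behaves in the replacement argument of sub().
# amp_active and amp_literal are hypothetical names for this example.

# A bare & is replaced by the text that the regex matched.
amp_active=$(echo 'x@foo@y' | awk '{ sub("@foo@", "a&b"); print }')

# The awk literal "a\\&b" is the string a\&b; sub() then emits a literal &.
amp_literal=$(echo 'x@foo@y' | awk '{ sub("@foo@", "a\\&b"); print }')

echo "$amp_active"
echo "$amp_literal"
```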

Cheers,
Ralf

2006-11-20  Ralf Wildenhues  <address@hidden>

        Rewrite config files generation: replace quadratic growth in
        the number of substituted variables with loglinear growth by
        using awk instead of sed for the bulk of the substitutions.
        * lib/autoconf/status.m4 (_AC_AWK_LITERAL_LIMIT): New macro.
        (_AC_OUTPUT_FILES_PREPARE): Instead of several sed scripts,
        generate just one large awk script for substitutions,
        eliminating much of the earlier complexity, while adding some
        new complexity.  Only expand the substitution templates at
        configure time, for smaller configure script size.
        (_AC_SUBST_CMDS): Renamed from...
        (_AC_SED_CMDS): ...this.
        (_AC_DELIM_NUM): Renamed from...
        (_AC_SED_DELIM_NUM): ...this.
        (_AC_SED_CMD_NUM, _AC_SED_FRAG, _AC_SED_FRAG_NUM): Removed.
        (_AC_OUTPUT_FILE): Use _AC_SUBST_CMDS.
        * tests/torture.at (Substitute a 2000-byte string): Also
        substitute a line with 1000 words, and a variable with several
        long lines.
        (Substitute and define special characters): Also substitute
        ampersands, and put substitution input strings address@hidden@' in the
        output, to test that no recursion happens.
        * NEWS: Update.
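
As a rough illustration of the approach described above (one awk script
holding all substitutions in an array keyed by variable name), here is a
stripped-down sketch.  It omits the delimiter handling, literal
splitting, escaping and file substitutions of the real patch, and the
variable names and values are invented for the example.

```shell
# Sketch only: substitute @VAR@ markers from an awk array S, one lookup
# per '@'-separated field.  The values here contain no '&' or backslash,
# so no escaping is needed in this simplified form.
subst_output=$(printf 'prefix=@prefix@\nname=@name@ @unknown@\n' | awk '
BEGIN {
  S["prefix"] = "/usr/local"
  S["name"] = "demo"
}
/@[a-zA-Z_][a-zA-Z_0-9]*@/ {
  nfields = split($0, field, "@")
  for (i = 1; i <= nfields; i++) {
    key = field[i]
    # Only names present in S are replaced; others are left untouched.
    if (key in S)
      sub("@" key "@", S[key])
  }
}
{ print }
')

echo "$subst_output"
```

Each input line is split once and each field costs one array lookup,
instead of every line being run past one sed command per variable.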

--- NEWS        17 Nov 2006 20:01:04 -0000      1.413
+++ NEWS        20 Nov 2006 19:45:01 -0000
@@ -1,5 +1,8 @@
 * Major changes in Autoconf 2.61a (??)
 
+** config.status now uses awk for substitutions, for improved scaling
+  with the number of substituted variables.
+
 * Major changes in Autoconf 2.61 (2006-11-17)
 
 ** New macros AC_C_FLEXIBLE_ARRAY_MEMBER, AC_C_VARARRAYS.
--- lib/autoconf/status.m4      2006-11-18 04:04:15.000000000 +0100
+++ lib/autoconf/status.m4      2006-11-20 20:44:17.000000000 +0100
@@ -311,6 +311,16 @@
 [99])
 
 
+# _AC_AWK_LITERAL_LIMIT
+# ---------------------
+# Evaluate the maximum number of characters to put in an awk
+# string literal, not counting escape characters.
+#
+# Some awk's have small limits, such as Solaris and AIX awk.
+m4_define([_AC_AWK_LITERAL_LIMIT],
+[148])
+
+
 # _AC_OUTPUT_FILES_PREPARE
 # ------------------------
 # Create the sed scripts needed for CONFIG_FILES.
@@ -319,7 +329,7 @@
 # The intention is to have readable config.status and configure, even
 # though this m4 code might be scaring.
 #
-# This code was written by Dan Manthey.
+# This code was written by Dan Manthey and rewritten by Ralf Wildenhues.
 #
 # This macro is expanded inside a here document.  If the here document is
 # closed, it has to be reopened with "cat >>$CONFIG_STATUS <<\_ACEOF".
@@ -328,81 +338,42 @@
 [#
 # Set up the sed scripts for CONFIG_FILES section.
 #
-dnl ... and define _AC_SED_CMDS, the pipeline which executes them.
-m4_define([_AC_SED_CMDS], [])dnl
+dnl ... and define _AC_SUBST_CMDS, the pipeline which executes them.
+m4_define([_AC_SUBST_CMDS], [| awk -f "$tmp/subs.awk" ])dnl
 
 # No need to generate the scripts if there are no CONFIG_FILES.
 # This happens for instance when ./config.status config.h
 if test -n "$CONFIG_FILES"; then
 
+echo 'BEGIN {' >"$tmp/subs.awk"
 _ACEOF
 
-m4_pushdef([_AC_SED_FRAG_NUM], 0)dnl Fragment number.
-m4_pushdef([_AC_SED_CMD_NUM], 2)dnl Num of commands in current frag so far.
-m4_pushdef([_AC_SED_DELIM_NUM], 0)dnl Expected number of delimiters in file.
-m4_pushdef([_AC_SED_FRAG], [])dnl The constant part of the current fragment.
-dnl
 m4_ifdef([_AC_SUBST_FILES],
-[# Create sed commands to just substitute file output variables.
-
-m4_foreach_w([_AC_Var], m4_defn([_AC_SUBST_FILES]),
-[dnl End fragments at beginning of loop so that last fragment is not ended.
-m4_if(m4_eval(_AC_SED_CMD_NUM + 3 > _AC_SED_CMD_LIMIT), 1,
-[dnl Fragment is full and not the last one, so no need for the final un-escape.
-dnl Increment fragment number.
-m4_define([_AC_SED_FRAG_NUM], m4_incr(_AC_SED_FRAG_NUM))dnl
-dnl Record that this fragment will need to be used.
-m4_define([_AC_SED_CMDS],
-  m4_defn([_AC_SED_CMDS])[| sed -f "$tmp/subs-]_AC_SED_FRAG_NUM[.sed" ])dnl
-[cat >>$CONFIG_STATUS <<_ACEOF
-cat >"\$tmp/subs-]_AC_SED_FRAG_NUM[.sed" <<\CEOF
-/@[a-zA-Z_][a-zA-Z_0-9]*@/!b
-]m4_defn([_AC_SED_FRAG])dnl
-[CEOF
-
-_ACEOF
-]m4_define([_AC_SED_CMD_NUM], 2)m4_define([_AC_SED_FRAG])dnl
-])dnl Last fragment ended.
-m4_define([_AC_SED_CMD_NUM], m4_eval(_AC_SED_CMD_NUM + 3))dnl
-m4_define([_AC_SED_FRAG],
-m4_defn([_AC_SED_FRAG])dnl
-[/^[    address@hidden@[        ]*$/{
-r $]_AC_Var[
-d
-}
-])dnl
+[# Create commands to substitute file output variables.
+
+{
+  echo "cat >>$CONFIG_STATUS <<_ACEOF"
+  echo 'cat >>"\$tmp/subs.awk" <<\CEOF'
+  echo "$ac_subst_files" | sed 's/.*/F@<:@"&"@:>@ = "$&"/'
+  echo "CEOF"
+  echo "_ACEOF"
+} >conf$$files.sh
+. ./conf$$files.sh
+rm -f conf$$files.sh
 ])dnl
-# Remaining file output variables are in a fragment that also has non-file
-# output varibles.
 
-])
-dnl
-m4_define([_AC_SED_FRAG], [
-]m4_defn([_AC_SED_FRAG]))dnl
-m4_foreach_w([_AC_Var],
-m4_ifdef([_AC_SUBST_VARS], [m4_defn([_AC_SUBST_VARS]) ])address@hidden@],
-[m4_if(_AC_SED_DELIM_NUM, 0,
-[m4_if(_AC_Var, address@hidden@],
-[dnl The whole of the last fragment would be the final deletion of `|#_!!_#|'.
-m4_define([_AC_SED_CMDS], m4_defn([_AC_SED_CMDS])[| sed 's/|#_!!_#|//g' ])],
-[
-ac_delim='%!_!# '
-for ac_last_try in false false false false false :; do
-  cat >conf$$subs.sed <<_ACEOF
-])])dnl
-m4_if(_AC_Var, address@hidden@],
-      [m4_if(m4_eval(_AC_SED_CMD_NUM + 2 <= _AC_SED_CMD_LIMIT), 1,
-             [m4_define([_AC_SED_FRAG], [ end]m4_defn([_AC_SED_FRAG]))])],
-[m4_define([_AC_SED_CMD_NUM], m4_incr(_AC_SED_CMD_NUM))dnl
-m4_define([_AC_SED_DELIM_NUM], m4_incr(_AC_SED_DELIM_NUM))dnl
-_AC_Var!$_AC_Var$ac_delim
-])dnl
-m4_if(_AC_SED_CMD_LIMIT,
-      m4_if(_AC_Var, address@hidden@], m4_if(_AC_SED_CMD_NUM, 2, 2, _AC_SED_CMD_LIMIT), _AC_SED_CMD_NUM),
-[_ACEOF
-
-dnl Do not use grep on conf$$subs.sed, since AIX grep has a line length limit.
-  if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.sed | grep -c X` = _AC_SED_DELIM_NUM; then
+{
+  echo "cat >conf$$subs.awk <<_ACEOF"
+  echo "$ac_subst_vars" | sed 's/.*/&!$&$ac_delim/'
+  echo "_ACEOF"
+} >conf$$subs.sh
+ac_delim_num=`echo "$ac_subst_vars" | grep -c '$'`
+ac_delim='%!_!# '
+for ac_last_try in false false false false false :; do
+  . ./conf$$subs.sh
+
+dnl Do not use grep on conf$$subs.awk, since AIX grep has a line length limit.
+  if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.awk | grep -c X` = $ac_delim_num; then
     break
   elif $ac_last_try; then
     AC_MSG_ERROR([could not make $CONFIG_STATUS])
@@ -410,51 +381,89 @@
     ac_delim="$ac_delim!$ac_delim _$ac_delim!! "
   fi
 done
+rm -f conf$$subs.sh
 
 dnl Similarly, avoid grep here too.
-ac_eof=`sed -n '/^CEOF[[0-9]]*$/s/CEOF/0/p' conf$$subs.sed`
+ac_eof=`sed -n '/^CEOF[[0-9]]*$/s/CEOF/0/p' conf$$subs.awk`
 if test -n "$ac_eof"; then
   ac_eof=`echo "$ac_eof" | sort -nru | sed 1q`
   ac_eof=`expr $ac_eof + 1`
 fi
-
-dnl Increment fragment number.
-m4_define([_AC_SED_FRAG_NUM], m4_incr(_AC_SED_FRAG_NUM))dnl
-dnl Record that this fragment will need to be used.
-m4_define([_AC_SED_CMDS],
-m4_defn([_AC_SED_CMDS])[| sed -f "$tmp/subs-]_AC_SED_FRAG_NUM[.sed" ])dnl
-[cat >>$CONFIG_STATUS <<_ACEOF
-cat >"\$tmp/subs-]_AC_SED_FRAG_NUM[.sed" <<\CEOF$ac_eof
-/@[a-zA-Z_][a-zA-Z_0-9]*@/!b]m4_defn([_AC_SED_FRAG])dnl
-[_ACEOF
-sed '
-s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g
-s/^/s,@/; s/!/@,|#_!!_#|/
-:n
-t n
-s/'"$ac_delim"'$/,g/; t
-s/$/\\/; p
-N; s/^.*\n//; s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g; b n
-' >>$CONFIG_STATUS <conf$$subs.sed
-rm -f conf$$subs.sed
-cat >>$CONFIG_STATUS <<_ACEOF
-]m4_if(_AC_Var, address@hidden@],
-[m4_if(m4_eval(_AC_SED_CMD_NUM + 2 > _AC_SED_CMD_LIMIT), 1,
-[m4_define([_AC_SED_CMDS], m4_defn([_AC_SED_CMDS])[| sed 's/|#_!!_#|//g' ])],
-[[:end
-s/|#_!!_#|//g
-]])])dnl
-CEOF$ac_eof
-_ACEOF
-m4_define([_AC_SED_FRAG], [
-])m4_define([_AC_SED_DELIM_NUM], 0)m4_define([_AC_SED_CMD_NUM], 2)dnl
-
-])])dnl
-dnl
-m4_popdef([_AC_SED_FRAG_NUM])dnl
-m4_popdef([_AC_SED_CMD_NUM])dnl
-m4_popdef([_AC_SED_DELIM_NUM])dnl
-m4_popdef([_AC_SED_FRAG])dnl
+dnl Initialize an awk array of substitutions, keyed by variable name.
+dnl
+dnl First read a whole (potentially multi-line) substitution,
+dnl and construct `S["VAR"] ='.  Then, escape '@' in the value,
+dnl and split it into pieces that fit in an awk literal.
+dnl Each piece then gets active characters escaped:
+dnl    "       -> \"
+dnl    \       -> \\
+dnl    newline -> \n
+dnl    &       -> \\&  (otherwise & will be active in awk's sub)
+dnl
+dnl (if we escape earlier we risk splitting inside an escape sequence).
+dnl Output as separate string literals, joined with backslash-newline.
+dnl Eliminate the newline after `=' in a second script, for readability.
+dnl
+dnl m4-double-quote most of the scripting for readability.
+[cat >>$CONFIG_STATUS <<_ACEOF
+cat >>"\$tmp/subs.awk" <<\CEOF$ac_eof
+_ACEOF
+sed '
+t line
+:line
+s/'"$ac_delim"'$//; t gotline
+N; b line
+:gotline
+h
+s/^/S["/; s/!.*/"] = /; p
+g
+s/^.*!//; s/@/@|#_!!_#|/g
+:more
+t more
+h
+s/\(.\{]_AC_AWK_LITERAL_LIMIT[\}\).*/\1/
+t notlast
+s/["\\]/\\&/g; s/\n/\\n/g; s/&/\\\\&/g
+s/^/"/; s/$/"/
+b
+:notlast
+s/["\\]/\\&/g; s/\n/\\n/g; s/&/\\\\&/g
+s/^/"/; s/$/"\\/
+p
+g
+s/.\{]_AC_AWK_LITERAL_LIMIT[\}//
+b more
+' <conf$$subs.awk | sed '
+/^[^"]/{
+  N
+  s/\n//
+}
+' >>$CONFIG_STATUS
+rm -f conf$$subs.awk
+cat >>$CONFIG_STATUS <<_ACEOF
+  FS = "[|]#_!!_#[|]"
+}
+/@[a-zA-Z_][a-zA-Z_0-9]*@/ {
+  nfields = split($ 0, field, "@")
+  for (i = 1; i <= nfields; i++) {
+    key = field[i]
+    if (key in S)
+      sub("@" key "@", S[key])
+    else if (key in F) {
+      while ((getline aline < F[key]) > 0)
+         print(aline)
+      close(F[key])
+      next
+    }
+  }
+}
+{
+  gsub("[|]#_!!_#[|]", "")
+  print
+}
+CEOF$ac_eof
+_ACEOF
+]dnl end of double-quoted part
 
 # VPATH may cause trouble with some makes, so we remove $(srcdir),
 # ${srcdir} and @srcdir@ from VPATH if srcdir is ".", strip leading and
@@ -554,7 +563,7 @@
 ])dnl
 m4_ifndef([AC_DATAROOTDIR_CHECKED], [$ac_datarootdir_hack
 ])dnl
-" $ac_file_inputs m4_defn([_AC_SED_CMDS])>$tmp/out
+" $ac_file_inputs m4_defn([_AC_SUBST_CMDS])>$tmp/out
 
 m4_ifndef([AC_DATAROOTDIR_CHECKED],
 [test -z "$ac_datarootdir_hack$ac_datarootdir_seen" &&
--- tests/torture.at    28 Oct 2006 09:41:07 -0000      1.72
+++ tests/torture.at    20 Nov 2006 20:09:28 -0000
@@ -539,18 +539,26 @@
 # Solaris 9 /usr/ucb/sed that rejects commands longer than 4000 bytes.  HP/UX
 # sed dumps core around 8 KiB.  However, POSIX says that sed need not
 # handle lines longer than 2048 bytes (including the trailing newline).
-# So we'll just test a 2000-byte value.
+# So we'll just test a 2000-byte value, and for awk, we test a line with
+# almost 1000 words, and one variable with 4 lines of 500 bytes each.
 
 AT_SETUP([Substitute a 2000-byte string])
 
 AT_DATA([Foo.in], address@hidden@
 ])
+AT_DATA([Bar.in], address@hidden@
+])
+AT_DATA([Baz.in], address@hidden@
+])
 
 AT_DATA([configure.ac],
 [[AC_INIT
 AC_CONFIG_AUX_DIR($top_srcdir/build-aux)
 AC_SUBST([foo], ]m4_for([n], 1, 100,, ....................)[)
-AC_CONFIG_FILES([Foo])
+AC_SUBST([bar], "]m4_for([n], 1, 100,, . . . . . . . . . ..)[")
+AC_SUBST([baz], "]m4_for([n], 1, 4,, m4_for([m], 1, 25,, ... ... ... ... ....)
+)[")
+AC_CONFIG_FILES([Foo Bar Baz])
 AC_OUTPUT
 ]])
 
@@ -558,6 +566,11 @@
 AT_CHECK_CONFIGURE
 AT_CHECK([cat Foo], 0, m4_for([n], 1, 100,, ....................)
 )
+AT_CHECK([cat Bar], 0, m4_for([n], 1, 100,, . . . . . . . . . ..)
+)
+AT_CHECK([cat Baz], 0, m4_for([n], 1, 4,, m4_for([m], 1, 25,, ... ... ... ... ....)
+)
+)
 AT_CLEANUP
 
 
@@ -589,20 +602,26 @@
 AT_SETUP([Substitute and define special characters])
 
 AT_DATA([Foo.in], address@hidden@
address@hidden@@notsubsted@@baz@ stray @ and more@@@baz@
 ])
 
 AT_CONFIGURE_AC(
-[[foo="AS@&address@hidden([[X*'[]+ ", `\($foo]])"
+[[foo="AS@&address@hidden([[X*'[]+ ",& &`\($foo \& \\& \\\& \\\\&]])"
+bar="@foo@ @baz@"
+baz=bla
 AC_SUBST([foo])
-AC_DEFINE([foo], [[X*'[]+ ", `\($foo]], [Awful value.])
+AC_SUBST([bar])
+AC_SUBST([baz])
+AC_DEFINE([foo], [[X*'[]+ ",& &`\($foo]], [Awful value.])
 AC_CONFIG_FILES([Foo])]])
 
 AT_CHECK_AUTOCONF
 AT_CHECK_AUTOHEADER
 AT_CHECK_CONFIGURE
-AT_CHECK([cat Foo], 0, [[X*'[]+ ", `\($foo
+AT_CHECK([cat Foo], 0, [[X*'[]+ ",& &`\($foo \& \\& \\\& \\\\&
address@hidden@ @baz@@address@hidden stray @ and more@@bla
 ]])
-AT_CHECK_DEFINES([[#define foo X*'[]+ ", `\($foo
+AT_CHECK_DEFINES([[#define foo X*'[]+ ",& &`\($foo
 ]])
 AT_CLEANUP
 



