[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: config files substitution with awk

From: Paul Eggert
Subject: Re: config files substitution with awk
Date: Mon, 04 Dec 2006 22:09:37 -0800
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

address@hidden (Karl Berry) writes:

> But for purposes of explaining to rms, can someone tell me (as briefly
> as possible :) why it is desirable to switch to awk instead of sticking
> with sed?


For example, Ralf tested the change with OpenMPI, and switching from
'sed' to 'awk' sped up 'configure' by over a factor of 4.  With 'sed',
'configure' took over a minute (user+system CPU time).  With 'awk', it
took less than 15 seconds.

The speed comes from awk's hash tables.  Here's how to substitute
values for V variables using 'sed':


This is O(N**2).  With awk, you can do something roughly like this:

   nfields = split(line, field, "@")
   for (i = 2; i < nfields; i++) {
     key = field[i]
     if (key in S)
       field[i] = S[key]

where S[k] gives you the value for key k.  This is O(N log N).

The actual code and analysis are more complicated -- among other
things, we're assuming input lines have bounded length, which is the
common practical case -- but here's Ralf Wildenhues's precis:

   If you have F config files, each with L lines in which substitutions
   apply, and S substituted variables, then the overall work for creating
   all config files currently scales roughly as

      F * (c1 * (L + S) + c2 * (L * S)) + c3 * S         (assuming 'sed')

   c1 is larger than c2, but the c2 term causes the most work for large
   packages.  If you switch from 'sed' to 'awk', this changes to:

      F * (c1 * (L + S) + c2 * (L * log (S))) + c3 * S   (assuming 'awk')

A longer version of this analysis can be found in

reply via email to

[Prev in Thread] Current Thread [Next in Thread]