Re: [PATCH 0/4] faster gnulib-tool
From: Paolo Bonzini
Date: Fri, 02 Jan 2009 10:17:00 +0100
User-agent: Thunderbird 2.0.0.18 (Macintosh/20081105)
Bruno Haible wrote:
> Hello Ralf,
>
> Thank you for your speedups to gnulib-tool. At first I was, of course,
> excited about the 2x speedup. But when looking at the maintainability
> of the code that you propose, I'm not fine with all of it any more.
>
> My four objections are:
>
> 1) You observe that forking programs in a shell script is slow, and
> therefore propose to use more shell built-ins. The problem with it
> is that I chose to implement gnulib-tool in shell (for the control
> structure) and sed (for the text processing).
>
> If you want to achieve good speedups for scripts that use 'sed':
> can you work towards making 'sed' a bash built-in? This is challenging,
> but if you are after performance, that would be promising.
The only micro-optimization worth doing to speed up shell scripts is to
avoid forking. This is *no exaggeration*. (I am not talking about
algorithmic improvements, though in some cases, as Ralf showed, forking
can hide the benefits of algorithmic improvements.)
We (Ralf, Eric and I) saved over 30% of the execution time of Autoconf
scripts on Cygwin, and a few percent on Linux too, just by removing one
or two forks here and there. The point is not to make sed a bash builtin;
it is to stop forking for things such as
  (...)
  echo abc | ...
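As a minimal sketch of the kind of rewrite meant here: the file name and
variable names below are made up for illustration and are not taken from
gnulib-tool or Autoconf, and ${var##pattern} assumes a POSIX shell rather
than an old Bourne shell.

```sh
# Hypothetical input, chosen only for illustration.
file=lib/foo.c

# Forking version: a command substitution plus a pipeline into sed.
base_forking=`echo "$file" | sed 's|.*/||'`

# Fork-free version: parameter expansion strips the directory in-process.
base_builtin=${file##*/}

# Forking version: a pipeline into grep just to test a pattern.
if echo "$file" | grep '\.c$' >/dev/null; then
  kind_forking=c
else
  kind_forking=other
fi

# Fork-free version: case matches the pattern without any child process.
case $file in
  *.c) kind_builtin=c ;;
  *)   kind_builtin=other ;;
esac

echo "$base_forking $base_builtin $kind_forking $kind_builtin"
```

Both variants compute the same values; only the number of child
processes differs.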
A while ago I took a lot of timings of various shell constructs; you can
find them on the Autoconf list. Here are the relevant ones:
  $ time sh -c 'for i in `seq 1 1000`; do :; done'
  user 0m0.034s
  sys 0m0.024s
  $ time sh -c 'for i in `seq 1 1000`; do (:); done'
  user 0m0.486s
  sys 0m2.377s
  $ time sh -c 'for i in `seq 1 1000`; do echo abc | :; done'
  user 0m0.958s
  sys 0m4.657s
echo and : are shell builtins, but the subshell and the pipeline still
make the shell fork, so they're slow. s/:/sed/ and you see my point.
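One way to see that these constructs run in a child even though only
builtins are involved is to check whether a variable assignment survives
them. This is only an illustrative sketch: the variable names are made
up, and it relies on the usual behaviour that a subshell and every
pipeline stage except possibly the last get their own environment, which
most shells implement with a fork.

```sh
x=parent

# The parenthesized body runs in a subshell, so the assignment is lost.
( x=subshell )
after_subshell=$x

# The first stage of a pipeline also runs in a child, even though it
# consists of nothing but an assignment.
x=pipeline | :
after_pipeline=$x

echo "after subshell: $after_subshell"
echo "after pipeline: $after_pipeline"
```

In both cases x still holds its original value afterwards, because the
assignment happened in a child environment.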
If this 10x-30x improvement affected 20% of the shell execution time,
one could expect a decent speedup.
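A rough back-of-the-envelope check of that estimate, taking the 20% and
10x-30x figures above at face value. The function and variable names are
hypothetical; the result is stored in a variable rather than echoed so
that using it does not itself need a command substitution.

```sh
# overall_speedup PERCENT FACTOR
# Sets speedup_milli to the whole-script speedup in thousandths
# (1219 means roughly 1.219x), given that PERCENT of the run time
# becomes FACTOR times faster. Integer arithmetic only, so no fork.
overall_speedup() {
  percent=$1
  factor=$2
  total=100000
  affected=$((total * percent / 100))
  new_time=$((total - affected + affected / factor))
  speedup_milli=$((total * 1000 / new_time))
}

overall_speedup 20 10
low=$speedup_milli
overall_speedup 20 30
high=$speedup_milli
echo "speedup in thousandths: $low to $high"
```

With these inputs the whole script ends up a little over 1.2x faster,
that is, somewhat less than a fifth of the run time is saved.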
As for running sed (or any other builtin) inside a pipeline without
forking: that would probably amount to a rewrite of bash, dash, or
whatever else. You would have to center the main shell loop on
event-driven processing of file descriptors, so that a single process
could serve all the pipes.
A fun project, but probably not one that I or anyone else will attempt
without funding. :-)
Paolo