
Re: Parallelization of shell scripts for 'configure' etc.


From: Tim Rühsen
Subject: Re: Parallelization of shell scripts for 'configure' etc.
Date: Sat, 18 Jun 2022 21:05:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.10.0

Hi all,

On 14.06.22 00:39, Paul Eggert wrote:
> In many Gnu projects, the 'configure' script is the biggest barrier to building because it takes soooo long to run. Is there some way that we could improve its performance without completely reengineering it, by improving Bash so that it can parallelize 'configure' scripts?

Faster configure script execution is indeed something I'd love to see.
The title of this thread implies that we *only* want to discuss parallelization - maybe we can generalize this to "Making configure scripts run faster"?

[A little side note: the invocation of gnulib-tool is *far* slower than running the configure scripts, for the projects that I work on.
But surely this is a problem of its own.]

I see two main setups in which configure scripts are run. Each setup has several possible ways to speed up the execution (with overlaps, of course).

a) The maintainer/contributor/hacker setup
This is when you re-run configure relatively often for the same project(s).
I do this regularly and came up with https://gitlab.com/gnuwget/wget2/-/wikis/Developer-hints:-Increasing-speed-of-GNU-toolchain. It may be a bit outdated, but it may still help some of you here; see the sketch below.
Btw, I am down to 2.5s for a ./configure run, from 25s originally.
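
A minimal sketch of the caching part of that setup (the cache file path is just an illustration, not taken from the wiki page):

# Reuse a result cache across repeated ./configure runs of the same tree:
./configure -C                  # writes and reuses ./config.cache
# or share one cache file across several build trees of the same project:
./configure --cache-file="$HOME/.cache/wget2-configure.cache"

On a re-run, the cached ac_cv_* results let configure skip the corresponding compile/link probes entirely.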

b) The one-time build setup
This covers people building and installing from a tarball, as well as automated build systems (e.g. CI) with regular OS updates. I also think of systems like Gentoo where you build everything from source. As Alex Ameen pointed out, using a global configure cache across different projects may be insecure.
People also often want to build with optimization in this case.
Installing ccache is also unlikely when people just want to build and install a single project.

I personally see a) as solved, at least for me.

b) is a problem because
1. People start to complain about the slow GNU build system (autotools), which drives new projects away from using autotools and possibly drives people away from GNU in general. In other words: let's not eat up people's precious time unnecessarily when building our software.

2. Building software at large scale eats tons of energy. If we could reduce the energy consumption, that would at least give us a better feeling.


What can we do to solve b)?
I guess we first need to analyze/profile the configure execution.
For this I wrote a little tool some years ago: https://gitlab.com/rockdaboot/librusage. It is simple to build and use and gives numbers on which (external) commands are executed and how often - fork+exec is pretty heavy. [Configure for wget2 runs 'rm' and 'cat' roughly 2000x each - so I came up with enabling plugins for those two commands (I had to write a plugin for 'rm'; not sure if it was ever accepted by bash upstream); see the sketch below.] Maybe we can create plugins for other heavily used commands as well, e.g. for sed!?
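
For illustration, this is roughly how such plugins (bash loadable builtins) get activated in a script - the install path is distribution-specific and just an assumption, and the 'rm' loadable may not be shipped everywhere:

# Replace external 'cat' and 'rm' processes with in-process builtins,
# assuming the loadables are installed under /usr/lib/bash:
enable -f /usr/lib/bash/cat cat
enable -f /usr/lib/bash/rm  rm
# every later 'cat'/'rm' in the script now avoids a fork+exec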

The output of the tool also roughly shows where the time goes - going deeper into this is beyond my spare time right now.
Please test it yourself and share some numbers.

Another option is to group tests: e.g. if the result of test 1 is X, we also know the results of tests 2, 3, 4, ... Or we could group several tests into a single C file, where possible; see the sketch below. Just an idea (it sounds a bit tedious, though).
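
A rough sketch of the "single C file" idea (plain shell, purely illustrative): the trade-off is that a grouped probe only yields a combined yes/no, so it would need a fallback to individual checks on failure.

# One compile instead of three separate header probes:
cat > conftest.c <<'EOF'
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
int main (void) { return 0; }
EOF
if ${CC-cc} -c conftest.c -o conftest.o 2>/dev/null; then
  echo "stdint.h stdlib.h string.h: all usable"
else
  echo "at least one header missing - fall back to individual checks"
fi
rm -f conftest.c conftest.o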

Parallelism... can't we do that with &, at least for well-known / often-used tests? A sketch follows below.
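
A minimal sketch of the idea, assuming the checks are truly independent (the probe bodies here are made up):

# Run independent probes as background jobs and collect results from
# per-check files, so the jobs don't race on shared shell variables.
run_check () {  # $1 = tag, $2 = header to probe (illustrative)
  printf '#include <%s>\nint main (void) { return 0; }\n' "$2" > "conftest_$1.c"
  if ${CC-cc} -c "conftest_$1.c" -o "conftest_$1.o" 2>/dev/null; then
    echo yes > "result_$1"
  else
    echo no > "result_$1"
  fi
  rm -f "conftest_$1.c" "conftest_$1.o"
}

run_check zlib    zlib.h &
run_check pthread pthread.h &
run_check ssl     openssl/ssl.h &
wait    # all three probes ran concurrently

for t in zlib pthread ssl; do
  echo "$t: $(cat "result_$t")"
done

The hard part in a real configure script is that many tests depend on the results of earlier ones, so only groups of truly independent checks could be fanned out like this.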

Family calls...

Regards, Tim


