bug-gnulib
Re: Parallelization of shell scripts for 'configure' etc.


From: L A Walsh
Subject: Re: Parallelization of shell scripts for 'configure' etc.
Date: Sat, 18 Jun 2022 06:55:27 -0700
User-agent: Thunderbird

On 2022/06/13 15:39, Paul Eggert wrote:
> In many GNU projects, the 'configure' script is the biggest barrier to
> building because it takes soooo long to run. Is there some way that we
> could improve its performance without completely reengineering it, by
> improving Bash so that it can parallelize 'configure' scripts?
----
I don't know what kind of instrumentation you've done on configure, but
before investing much time in optimization, it would be worth finding out
where most of the time is being spent.

I.e., CPU vs. I/O -- and which kind of I/O: the actual test I/O, or
executable load time.
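One quick way to get that split (a generic sketch; nothing here is
specific to any one project):

    # wall/user/sys breakdown: large sys+wall relative to user time
    # suggests I/O or process-spawn overhead rather than the tests
    time ./configure

    # per-syscall counts across the whole process tree; a dominant
    # execve/clone count points at exec-load cost
    strace -f -c -o configure.trace ./configure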

The reason I say that is that, having run configure for the same projects
on Linux and on Cygwin, I've noticed the Cygwin run is MUCH slower doing
the same work on the same machine.  A big slowdown on Cygwin is loading
and starting a new executable: loading 100 programs 10x each takes a
disproportionately long time there because of its exec-load penalty
(since Windows has no fork, all of the memory-space duplication, and the
later copy-on-write handling, has to be done manually in Cygwin -- very
painful).  But note that one of the big boosts in shell scripts can come
from using a utility's batched mode rather than feeding it file/pathnames
one at a time, as with 'find -exec rm {} \;' -- see the sketch below.
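For instance, the classic one-at-a-time vs. batched forms (a generic
illustration, not taken from any configure script):

    # one exec of rm per file -- pays process-startup cost N times
    find . -name '*.o' -exec rm {} \;

    # batched -- rm is invoked once per argument-list-full of names
    find . -name '*.o' -exec rm {} +

    # or equivalently via xargs
    find . -name '*.o' -print0 | xargs -0 rm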

Similarly, a big speedup in configure might come from using the bundled
build of coreutils (all of the utilities in one image, dispatched on the
command name they're invoked under), and putting that into the same binary
as bash, perhaps via a loadable builtin, with any subsequent coreutils
calls routed "in-binary" to the already-loaded code.  Of course, it would
not be trivial to assure that every command can be re-invoked -- each
would need its initializations redone on every "in-image" launch -- but
keeping all of the coreutils code "in-memory" would, I think, be a big
win even if it weren't multi-threaded.

It might be a further benefit if the various utilities were all
thread-safe, so that a more powerful dispatcher could use multi-threading
without worrying about safety; but just eliminating most of the redundant
utility loads might be a huge win by itself.  That's sort of why I was
wondering how much perf-profiling had been done on configure...
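(As a sketch of the loadable-builtin route: bash can already replace an
external command with an in-process builtin via 'enable -f', and the bash
sources ship sample loadables under examples/loadables/.  The install
path below is hypothetical -- it varies by distribution:

    # load the sample 'head' loadable; 'head' then runs in-process,
    # with no fork/exec per invocation
    enable -f /usr/lib/bash/head head
    type head    # reports: head is a shell builtin

Coreutils can likewise be built as one multi-call image -- its configure
offers an --enable-single-binary option, if I recall correctly -- which
is the sort of bundled build described above.)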

Anyway -- just some random thoughts...

For ideas about this, please see PaSh-JIT:

Kallas K, Mustafa T, Bielak J, Karnikis D, Dang THY, Greenberg M,
Vasilakis N.  Practically correct, just-in-time shell script
parallelization.  Proc. OSDI '22, July 2022.
https://nikos.vasilak.is/p/pash:osdi:2022.pdf

I've wanted something like this for *years* (I assigned a simpler version
to my undergraduates, though of course it was too much to expect them to
implement it), and I hope some sort of parallelization like this can make
it into production with Bash at some point (or into some other shell, if
Bash can't use this idea).



