
WISHLIST: work queues for SMP machines and background jobs

From: Austin Donnelly
Subject: WISHLIST: work queues for SMP machines and background jobs
Date: Tue, 10 Apr 2001 16:25:58 +0100

Configuration Information [Automatically generated, do not change]:
Machine: i386
OS: Linux
Compiler: gcc
-O2 -m486 -fno-strength-reduce
uname output: Linux hornet.cl.cam.ac.uk 2.2.16-4.cl.ext3 #1 Fri Aug 18 14:18:55 
BST 2000 i686 unknown

Bash Version: 1.14
Patch Level: 7


This describes a WISHLIST/FEATURE REQUEST I'd like considered for
future versions of bash.

Nowadays, multiprocessor machines are increasingly common.  However,
it is often hard to use the processors effectively.

In particular, some tasks are inherently parallel, such as the
encoding of digital audio files.  Given a number of .wav files, it is
trivial to write:

  for i in *.wav; do
    encode $i &
  done

However, on most machines this will perform poorly, as there will
typically be fewer CPUs than wav files.  While this loop exposes the
maximum amount of parallelism available, this is too much for most
machines to exploit effectively.

My proposal is for bash to find out how many CPUs are present on the
system it is running on, and keep a "work queue" of pending jobs.
Bash can then limit the number of running jobs to at most the number
of CPUs.  To make this facility useful, a "waitq" command is needed,
which waits for the work queue to be empty and for all commands
issued from it to have terminated.

So the example given earlier could be written:

  for i in *.wav; do
    encode $i <&>
  done

where "<&>" is new syntax meaning "enqueue the preceding command on
the work queue".  The difference between this version and the
previous one is that at most NUM_CPU copies of the encode process
will be running at any one time.
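
Pending such a feature, the effect can be roughly approximated with
existing job control.  The following is a sketch only: NCPU,
run_limited and drain are names invented for this example, and the
CPU-count detection via getconf is a common but non-standard
extension.

```shell
#!/usr/bin/env bash
# Sketch: limit concurrent background jobs to the number of CPUs.
# run_limited and drain are this example's inventions; the proposal
# would build equivalent behaviour into the shell itself.

NCPU=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

run_limited() {
  # Block until fewer than NCPU background jobs are running, then launch.
  while [ "$(jobs -rp | wc -l)" -ge "$NCPU" ]; do
    sleep 0.1            # GNU sleep accepts fractional seconds
  done
  "$@" &
}

drain() {                # rough analogue of the proposed "waitq"
  wait
}

out=$(mktemp)
for i in 1 2 3 4; do
  run_limited sh -c "echo done-$i >> '$out'"   # stand-in for: encode $i
done
drain
count=$(wc -l < "$out" | tr -d '[:space:]')
rm -f "$out"
echo "$count"            # all four jobs have completed
```

The busy-wait loop is crude; bash has no way to block until "any one
job exits", which is part of why native support would be preferable.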

I haven't thought much about how pipelines should be handled, but I
suspect that each process in the pipeline should count as an
outstanding process.  So:
  grep foo file1 | wc >results1 <&>
  grep foo file2 | wc >results2 <&>
on a 2-CPU machine would run the pipelines one after the other, but on
a 4-CPU box it would run them in parallel.

On a uniprocessor, the <&> syntax and waitq command have no effect,
since there is never an opportunity to issue multiple commands in
parallel.  The <&> notation merely annotates the script to let bash
know where the potential for parallelism exists.

It is possible that more than one work queue might be useful.  This
could be easily supported by allowing <&NAME> to refer to the NAMEd work
queue etc, and "waitq NAME" to wait for that queue to drain.  The
empty string would represent the "default" work queue.
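
The per-queue bookkeeping could be emulated today by grouping job
PIDs by queue name.  A sketch, under the assumption of bash 4+
associative arrays; QUEUE_PIDS, enq and waitq are names invented here,
and this emulates only the per-queue draining, not the CPU throttling:

```shell
#!/usr/bin/env bash
# Sketch: named queues of background jobs, each drainable separately.

declare -A QUEUE_PIDS        # maps queue name -> space-separated PIDs

enq() {                      # enq NAME command args...
  local name=$1; shift
  "$@" &
  QUEUE_PIDS[$name]+=" $!"
}

waitq() {                    # waitq NAME: wait for that queue to drain
  local pid
  for pid in ${QUEUE_PIDS[$1]}; do
    wait "$pid"
  done
  QUEUE_PIDS[$1]=""
}

enq fast sleep 0.1
enq slow sleep 0.3
waitq fast      # returns once the "fast" queue's jobs are finished
waitq slow
```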

While the motivating example given is fairly simple, the ability to
have such minimal SMP-awareness in bash is probably a useful general
facility.

At the very least, please consider adding an environment variable
(_NUM_CPUS or somesuch) giving portable access to the number of CPUs
available on the current system.
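
Absent such a variable, a script today must probe the system itself.
A sketch of the usual approaches: getconf's _NPROCESSORS_ONLN is a
widespread but non-POSIX extension, and the /proc fallback is
Linux-specific.

```shell
#!/bin/sh
# Sketch: discover the number of online CPUs without shell support.

if ncpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null) && [ -n "$ncpus" ]; then
  :                                   # getconf extension worked
elif [ -r /proc/cpuinfo ]; then
  ncpus=$(grep -c '^processor' /proc/cpuinfo)   # Linux only
else
  ncpus=1                             # conservative fallback
fi
echo "$ncpus"
```

The variety of fallbacks needed is itself an argument for a single
portable variable provided by the shell.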

