bug-make


Re: Add support for limiting CPU pressure


From: Randy MacLeod
Subject: Re: Add support for limiting CPU pressure
Date: Tue, 20 Dec 2022 16:59:13 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Thanks for the comment, Howard.

On 2022-12-20 16:02, Howard Chu wrote:
contrib@zhengqiu.net wrote:
Hello,

While doing development with the Yocto Project (1), a source-based
Linux distro builder, we found that many instances of make running
in parallel with other work can overload even a large many-core
build machine. Existing mechanisms are either not system-wide (-j)
or too slow (-l relies on the 1-minute load average), so in order
to make optimal use of a multi-core system for some larger tasks
we need a new mechanism.

All the tests were done on an Ubuntu system, as we have not been able to
find similar features on macOS/Windows, so if you know of any, please
help us out! We also want to gather more data, so if you know of any
other large packages that make is often tested with, please let us know
as well!

Relying on a new non-portable kernel feature sounds like a pretty bad idea.
Especially when you can easily solve the "not system-wide" aspect of "make -j"
portably: by using a named pipe. It should be a simple job to patch make to
create or reference a named pipe in an initial invocation, and the rest of
the existing jobserver machinery will Just Work after that, and all of
the relevant OSs have support for named pipes.
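Howard's named-pipe idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GNU make's jobserver code: the function names and the FIFO path are hypothetical, and opening a FIFO with O_RDWR is Linux behaviour that POSIX leaves undefined. The first invocation creates the pipe and preloads it with one-byte tokens; any job, from any build that opens the same path, reads a token before starting and writes it back when it finishes, which makes the limit system-wide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the named pipe at `path` if it does not
 * exist and, only in that case, preload it with `tokens` job tokens.
 * Later invocations just open the existing pipe. Returns a descriptor
 * or -1. O_RDWR on a FIFO does not block on Linux (undefined in POSIX).
 * The tokens must fit in the pipe buffer (64 KiB by default on Linux). */
int fifo_jobserver_open(const char *path, int tokens)
{
    assert(tokens >= 0);
    int created = (mkfifo(path, 0600) == 0);
    if (!created && errno != EEXIST)
        return -1;
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    for (int i = 0; created && i < tokens; i++) {
        if (write(fd, "+", 1) != 1) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Take one token, blocking until another job returns one.
 * Returns the token byte, or -1 on error. */
int fifo_job_acquire(int fd)
{
    char token;
    ssize_t n;
    do {
        n = read(fd, &token, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? (unsigned char)token : -1;
}

/* Give the token back once the job has finished. */
int fifo_job_release(int fd, char token)
{
    return write(fd, &token, 1) == 1 ? 0 : -1;
}
```

Note that this only limits processes that cooperate through the same pipe; a qemu test or a build tool that never takes a token is invisible to it, which is the limitation raised in the reply below.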



We did consider this approach. It can work in some use cases, but
certainly not all, and not the ones where the PSI approach has the
most impact.

It could work where we "only" have one bitbake build running
(bitbake is the build tool used in the Yocto world) and we want to
share jobs across that single build. We'd still have a problem with
recipes (packages) that do not use make to build and that don't have
a jobserver design. While we could add that feature to other build
tools such as ninja, PSI seems to be a more generic solution.

This approach doesn't handle the case where several bitbake builds
are running independently, or where, say, a couple of bitbake builds,
a CPU-intensive runtime test using qemu and a clean-up job are all
running at the same time. That may seem like an unusual use case, but
the generic version is that you want to do a build, yet back off if
other processes, from any account, are also loading the machine you
are using. If several users or builds all use the same PSI back-off
mechanism, the system is less likely to get into an overload or even
a process-thrashing situation.
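A minimal sketch of such a back-off loop, assuming the `avg10` field of the `some` line in /proc/pressure/cpu is used as the signal. The function names, the limit and the poll interval are hypothetical; they are not taken from the make patch or from bitbake.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Read avg10 from the "some" line (the first line) of a PSI file such
 * as /proc/pressure/cpu. Returns 0 on success, -1 on error. */
static int read_some_avg10(const char *path, double *avg10)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "some avg10=%lf", avg10);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Hypothetical back-off: wait until CPU pressure is at or below `limit`
 * percent, re-reading the file every `poll_ms` ms, at most `max_polls`
 * times. Returns 1 if a job may start, 0 if we gave up, -1 on error. */
int wait_for_low_pressure(const char *path, double limit,
                          long poll_ms, int max_polls)
{
    struct timespec delay = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };
    assert(poll_ms >= 0 && max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        double avg10;
        if (read_some_avg10(path, &avg10) != 0)
            return -1;
        if (avg10 <= limit)
            return 1;
        nanosleep(&delay, NULL);
    }
    return 0;
}
```

Because every cooperating build reads the same system-wide file, builds started from different accounts and by different tools back off together without sharing any pipe or token.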

The shared job server approach is a good idea but has more limited
applicability, right?


Also, I'm re-posting the original thread below for the archives since
Zheng's email wasn't plain text and didn't get archived properly.
Oh and I have one comment below for Zheng.

I'll trim the thread on any follow-up.

Thanks,

../Randy



Hello,

While doing development with the Yocto Project (1), a source-based
Linux distro builder, we found that many instances of make running
in parallel with other work can overload even a large many-core
build machine. Existing mechanisms are either not system-wide (-j)
or too slow (-l relies on the 1-minute load average), so in order
to make optimal use of a multi-core system for some larger tasks
we need a new mechanism. We found that on Linux 4.20 and later
kernels, a feature called Pressure Stall Information (PSI (2))
provides system-wide metrics indicating when, and by how much, a
system is experiencing CPU, memory or I/O pressure.
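For reference, /proc/pressure/cpu contains a `some` line with 10-, 60- and 300-second averages (percent of wall time) plus a cumulative `total` stall time in microseconds; per the kernel PSI documentation (2), newer kernels also print a `full` line for CPU. The parser below is a hypothetical helper, not code from the patch, that extracts one of those lines from the file's text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One PSI line, e.g.
 *   some avg10=1.23 avg60=0.85 avg300=0.40 total=123456789
 * avg* are percentages over 10 s, 60 s and 300 s windows;
 * total is the cumulative stall time in microseconds. */
struct psi_line {
    double avg10, avg60, avg300;
    unsigned long long total;
};

/* Parse the line starting with `kind` ("some" or "full") out of the
 * text of a PSI file. Returns 0 on success, -1 if absent or malformed. */
int psi_parse(const char *text, const char *kind, struct psi_line *out)
{
    assert(text != NULL && kind != NULL && out != NULL);
    size_t klen = strlen(kind);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, kind, klen) == 0 && line[klen] == ' ') {
            int n = sscanf(line + klen,
                           " avg10=%lf avg60=%lf avg300=%lf total=%llu",
                           &out->avg10, &out->avg60, &out->avg300,
                           &out->total);
            return n == 4 ? 0 : -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```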

So we implemented a new feature that uses the information in
/proc/pressure/cpu to limit new task creation. We previously
implemented it for bitbake: , and found that limiting tasks based on
/proc/pressure/cpu can significantly reduce system latency and CPU
contention. Since the Yocto Project started using this feature in
bitbake, its CPU-contention-related errors have dropped to about once
every two months, compared to several times every week.
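The patch itself is not reproduced in this thread, so the following is only one plausible throttle rule, written as a hypothetical sketch: sample the `total` field twice, convert the difference into the percentage of the interval during which tasks were stalled waiting for CPU, and allow a new job only while that percentage is below the `-z` limit. Working from `total` rather than `avg10` lets the sampling interval be much shorter than 10 seconds.

```c
#include <assert.h>

/* Hypothetical rule, not necessarily what the patch does: share of the
 * last interval (in percent) during which some task was stalled on CPU,
 * computed from two readings of the cumulative `total` field
 * (microseconds) of /proc/pressure/cpu. */
double psi_stall_percent(unsigned long long prev_total,
                         unsigned long long cur_total,
                         double elapsed_us)
{
    assert(elapsed_us > 0.0 && cur_total >= prev_total);
    return (double)(cur_total - prev_total) / elapsed_us * 100.0;
}

/* Allow a new job only while the measured stall share is below the
 * limit (0-100, like the proposed -z value). */
int psi_may_start_job(double stall_percent, int limit)
{
    assert(limit >= 0 && limit <= 100);
    return stall_percent < (double)limit;
}
```

A real implementation would presumably always let at least one job run; otherwise a limit of 0, or a machine kept busy by other users, could stall the build indefinitely.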

Here is the commit we have. Although it needs to be cleaned up before
it is committed, it works fine: when we tested it with OpenSSL on a
4-core system, we found the CPU pressure can be reduced by 20% while
keeping the run time about the same. This is not ideal, though, so we
would also like to hear any suggested improvements to this algorithm.

$ hyperfine --runs 5 'make clean && /usr/bin/time -o build-time-0.log ./../mymake -j'
Benchmark 1: make clean && /usr/bin/time -o build-time-0.log ./../mymake -j
  Time (mean ± σ):     179.994 s ±  0.418 s    [User: 576.071 s, System: 56.747 s]
  Range (min … max):   179.383 s … 180.441 s    5 runs

$ hyperfine --runs 5 'make clean && /usr/bin/time -o build-time-0.log ./../mymake -j -z 10'
Benchmark 1: make clean && /usr/bin/time -o build-time-0.log ./../mymake -j -z 10
  Time (mean ± σ):     166.372 s ±  4.976 s    [User: 538.634 s, System: 59.617 s]
  Range (min … max):   159.443 s … 171.906 s    5 runs

$ hyperfine --runs 5 'make clean && /usr/bin/time -o build-time-0.log ./../mymake -j -z 50'
Benchmark 1: make clean && /usr/bin/time -o build-time-0.log ./../mymake -j -z 50
  Time (mean ± σ):     159.213 s ±  1.916 s    [User: 563.077 s, System: 60.442 s]
  Range (min … max):   157.653 s … 162.464 s    5 runs

$ hyperfine --runs 5 'make clean && /usr/bin/time -o build-time-0.log ./../mymake -j -z 90'
Benchmark 1: make clean && /usr/bin/time -o build-time-0.log ./../mymake -j -z 90
  Time (mean ± σ):     159.546 s ±  0.499 s    [User: 568.825 s, System: 57.950 s]
  Range (min … max):   158.947 s … 160.302 s    5 runs

$ hyperfine --runs 5 'make clean && /usr/bin/time -o build-time-0.log ./../mymake -j 4'
Benchmark 1: make clean && /usr/bin/time -o build-time-0.log ./../mymake -j 4
  Time (mean ± σ):     156.164 s ±  0.324 s    [User: 545.251 s, System: 62.674 s]
  Range (min … max):   155.776 s … 156.596 s    5 runs
* -z is the new option we added; it takes a value in the range 0-100 (the option name is not final).
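As a worked check, the relative change in mean wall-clock time follows directly from the hyperfine means above, taking unbounded -j as the baseline. This is only arithmetic on the reported means; hyperfine does not measure CPU pressure, and the -z 10 runs have a much larger spread (σ ≈ 5 s) than the others.

```latex
\begin{aligned}
\text{-z 10 vs. -j:}\quad & \frac{179.994 - 166.372}{179.994} = \frac{13.622}{179.994} \approx 7.6\,\% \text{ faster} \\
\text{-z 50 vs. -j:}\quad & \frac{179.994 - 159.213}{179.994} = \frac{20.781}{179.994} \approx 11.5\,\% \text{ faster} \\
\text{-z 90 vs. -j:}\quad & \frac{179.994 - 159.546}{179.994} = \frac{20.448}{179.994} \approx 11.4\,\% \text{ faster} \\
\text{-j 4 vs. -z 50:}\quad & \frac{159.213 - 156.164}{156.164} = \frac{3.049}{156.164} \approx 2.0\,\% \text{ faster}
\end{aligned}
```

So in this single-build test, -z mainly recovers the slowdown of unbounded -j, while -j 4 remains slightly faster.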

Zheng,

You need to test two or three OpenSSL builds running at the same
time to emulate what often happens in a bitbake build. My expectation
is that three "-j 4" builds will generate lots of pressure, and that
with "-j -z 10" the three builds will share the system, avoid CPU
contention and may actually finish a bit more quickly than with the
"-j 4" approach.
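A minimal C harness for that experiment might look like the following. It is a hypothetical sketch: the helper name and the source directories are made up, and `-z` is the option proposed in this patch, not part of released GNU make. It starts all builds at once, waits for each, and reports the total wall-clock time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_BUILDS 16

/* Launch all n commands concurrently, wait for every one, and store the
 * wall-clock time in *elapsed_s. Returns how many exited with status 0. */
int run_concurrently(char **cmds[], int n, double *elapsed_s)
{
    pid_t pids[MAX_BUILDS];
    struct timespec start, end;
    int ok = 0;

    assert(n > 0 && n <= MAX_BUILDS);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            execvp(cmds[i][0], cmds[i]);
            _exit(127);                 /* only reached if exec failed */
        }
    }
    for (int i = 0; i < n; i++) {
        int status;
        pid_t r;
        if (pids[i] <= 0)
            continue;                   /* fork failed: counts as a failure */
        do {
            r = waitpid(pids[i], &status, 0);
        } while (r == -1 && errno == EINTR);
        if (r == pids[i] && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    *elapsed_s = (double)(end.tv_sec - start.tv_sec)
               + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return ok;
}
```

For example, cmds could hold three argument vectors such as {"make", "-C", "openssl-1", "-j", "-z", "10", NULL}, one per (hypothetical) clean source tree, and the run could be repeated with "-j", "4" in place of "-j", "-z", "10" for comparison.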


All the tests were done on an Ubuntu system, as we have not been able to
find similar features on macOS/Windows, so if you know of any, please
help us out! We also want to gather more data, so if you know of any
other large packages that make is often tested with, please let us know
as well!

1) https://www.yoctoproject.org/
2) https://docs.kernel.org/accounting/psi.html

Thanks!

Randy and Zheng





--
# Randy MacLeod
# Wind River Linux




