bug#34033: Offloading sometimes hangs
From: Ludovic Courtès
Subject: bug#34033: Offloading sometimes hangs
Date: Fri, 03 Jul 2020 15:58:55 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Hi!
Mathieu Othacehe <othacehe@gnu.org> writes:
>> Something is going wrong here! I'll keep investigating.
>
> To help us investigate those issues I added a "/status" page, which is
> also accessible from a new drop-down menu in the Cuirass navigation bar.
>
> See https://ci.guix.gnu.org/status.
Nice! So it’s roughly like the info at /api/queue, but filtered to
running builds, right?
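To make the comparison concrete, here is a minimal sketch of what "the queue, filtered to running builds" could look like on the client side. It is only an illustration: the "buildstatus" field name and the -1 code for a started build are assumptions about the JSON returned by /api/queue, the sample data is made up, and the helper name running_builds is hypothetical.

```python
# Illustrative sketch only: the field name and status code below are
# assumptions about the /api/queue JSON, not confirmed by this thread.
STARTED = -1  # assumed status code for a build that is currently running


def running_builds(queue):
    """Return the entries of a decoded queue listing that are running."""
    return [build for build in queue if build.get("buildstatus") == STARTED]


# Made-up sample standing in for a decoded /api/queue response.
sample = [
    {"id": 1, "buildstatus": -2},  # assumed: scheduled, not yet started
    {"id": 2, "buildstatus": -1},
    {"id": 3, "buildstatus": -1},
]

print([build["id"] for build in running_builds(sample)])
```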
> Hydra has the same interface, but also a "Machine status" page, that
> breaks down the running builds machine per machine. I plan to implement
> that one next. Reading Hydra code, I also discovered that some part of
> the offloading is directly done from Hydra, which talks with the
> nix-daemon of the connected build machines, interesting!
Yes, Hydra does most of the scheduling by itself. Since this is
redundant with what the daemon + offload do, I thought Cuirass shouldn’t
do any scheduling at all and instead let the daemon take care of it
all.
This has advantages (the daemon has a global view and can achieve better
scheduling), and drawbacks (the protocol requires us to wait for
‘build-things’ completion before we can queue more builds, and
scheduling decisions are almost invisible to Cuirass).
> While I'm writing, we have 5 running builds for ~1 hour, and 76040 queued
> builds. Given the computing power of Berlin, there must be a bottleneck
> somewhere.
Yes! I’ve often run “guix processes” on berlin and then straced the
‘SessionPID’ process. It’s insightful: sometimes you see that the
daemon is stuck waiting for a machine to offload to, and sometimes it’s
stuck waiting for a build that will perhaps just eventually time out…
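For anyone who wants to repeat this, a small sketch of the "list sessions, then attach strace" step follows. It assumes the record-style "Key: value" output that "guix processes" prints, with one SessionPID field per session; the sample text and the helper name session_pids are made up for illustration, and the script only prints the strace commands rather than running them.

```python
def session_pids(text):
    """Extract the SessionPID values from 'guix processes'-style output."""
    pids = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "SessionPID":
            pids.append(int(value.strip()))
    return pids


# Made-up sample; real output would come from running "guix processes".
sample = """\
SessionPID: 4278
ClientPID: 4220
ClientCommand: guix build hello

SessionPID: 5100
ClientPID: 5002
ClientCommand: cuirass process
"""

# Print one strace invocation per daemon session (to be run as root).
for pid in session_pids(sample):
    print(f"strace -p {pid}")
```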
Ludo’.