bug-guix

bug#34033: Offloading sometimes hangs


From: Ludovic Courtès
Subject: bug#34033: Offloading sometimes hangs
Date: Fri, 03 Jul 2020 15:58:55 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> Something is going wrong here! I'll keep investigating.
>
> To help us investigate those issues I added a "/status" page, which is
> also accessible from a new drop-down menu in the Cuirass navigation bar.
>
> See https://ci.guix.gnu.org/status.

Nice!  So it’s roughly like the info at /api/queue, but filtered to
running builds, right?

> Hydra has the same interface, but also a "Machine status" page that
> breaks down the running builds machine by machine. I plan to implement
> that one next. Reading Hydra code, I also discovered that part of the
> offloading is done directly by Hydra, which talks to the nix-daemon of
> the connected build machines. Interesting!

Yes, Hydra does most of the scheduling by itself.  Since this is
redundant with what the daemon + offload do, I thought Cuirass shouldn’t
do any scheduling at all and should instead let the daemon take care of
it.

This has advantages (the daemon has a global view and can achieve better
scheduling), and drawbacks (the protocol requires us to wait for
‘build-things’ completion before we can queue more builds, and
scheduling decisions are almost invisible to Cuirass).

> As I write this, there have been only 5 running builds for about an
> hour, and 76040 builds are queued. Given the computing power of Berlin,
> there must be a bottleneck somewhere.

Yes!  I’ve often run “guix processes” on berlin, then straced the
‘SessionPID’ process.  It’s insightful because you sometimes see the
daemon is stuck waiting for a machine to offload to, and sometimes it’s
stuck waiting for a build that will perhaps just eventually time out…
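
The first half of that workflow (finding the session PIDs to strace) can
be scripted.  Below is a minimal sketch in Python.  It assumes that
‘guix processes’ prints recutils-style records: blank-line-separated
groups of "Field: value" lines, one group per daemon session, each with a
‘SessionPID’ field.  The helper names (parse_records, session_pids,
guix_session_pids) are hypothetical, and recutils continuation lines are
not handled.

```python
"""Hypothetical helpers to list daemon session PIDs from `guix processes`.

Assumption: `guix processes` prints recutils-style records, i.e.
blank-line-separated groups of "Field: value" lines, each group
describing one daemon session and carrying a "SessionPID" field.
Continuation lines ("+ ...") are NOT handled by this sketch.
"""
import subprocess


def parse_records(text: str) -> list[dict[str, list[str]]]:
    """Split recutils-style text into records.

    Values are kept in lists because a field may repeat within a record.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            current.setdefault(name.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


def session_pids(text: str) -> list[int]:
    """Return the first SessionPID of every record that has one."""
    return [int(r["SessionPID"][0]) for r in parse_records(text)
            if "SessionPID" in r]


def guix_session_pids() -> list[int]:
    """Run `guix processes` (assumed to be on PATH) and list session PIDs."""
    out = subprocess.run(["guix", "processes"], capture_output=True,
                         text=True, check=True).stdout
    return session_pids(out)
```

Each PID returned can then be attached to with ‘strace -p PID’, which
usually requires root privileges on the build host.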

Ludo’.
