bug-guix


From: Maxim Cournoyer
Subject: bug#41625: [PATCH v2] offload: Handle a possible EOF response from read-repl-response.
Date: Thu, 27 May 2021 07:49:22 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> I see.  So I’d say it’s a prerequisite (a patch that must come before)
> but not entirely the same thing.  I’m nitpicking!

Eh, it's okay :-).  Splitting changes into the right unit is a problem
that is akin to naming things; it's hard!  I welcome your suggestion.

> We should make sure it doesn’t trigger thread-safety issues in libssh or
> anything like that (running it repeatedly on a large machines.scm should
> give us some confidence).

It seems fine so far, but I've only tested in a loop with 4 build
machines.  When it nears completion I'll give it a shot on berlin.

[...]

> Yes, but note that this is just for ‘guix offload test’.  The actual
> code run while offloading will still fail badly.

Ah, thanks for pointing that out; I somehow thought that this machine
status checking code was a prelude to every offloaded build.

[...]

>> I don't have a password set for my user on overdrive1, so can't attach
>> strace to sshd, but yeah, we could try to capture it and see if we can
>> understand what's going on.
>
> OK.

I'd be happy to try strace when you are available.  You can ping me on
the chat.  It's been more than 8 hours since I tried, so I should be
able to trigger the problem :-).

[...]

> Perhaps worth adding an ‘inferior’ and/or ‘port’ field.  That would
> allow the handler to present more information as to which inferior is
> failing.
>
> Maybe ‘premature-eof’ would be more accurate than ‘connection-lost’.

Good suggestions.  I'll implement them.

>> +                       (format (current-error-port)
>> +                               (G_ "connection to machine '~a' lost; retrying~%")
>> +                               (build-machine-name machine))
>
> You can use ‘info’ instead of ‘format’.

That also.  Thanks!
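
To make the idea discussed above concrete, here is a minimal Guile sketch
of a condition type carrying a 'port' field, plus a retry loop around it.
It is only an illustration under stated assumptions: the names
&premature-eof, read-response and call-with-retries are hypothetical and
are not taken from the patch or from (guix inferior).

```scheme
(use-modules (srfi srfi-34)   ; 'guard' and 'raise'
             (srfi srfi-35))  ; condition types

;; Hypothetical condition type; the name and its 'port' field are
;; illustrative only, not the ones used in the actual patch.
(define-condition-type &premature-eof &error
  premature-eof?
  (port premature-eof-port))

(define (read-response port)
  ;; Read one datum from PORT; raise &premature-eof rather than
  ;; returning the EOF object to the caller.
  (let ((datum (read port)))
    (if (eof-object? datum)
        (raise (condition (&premature-eof (port port))))
        datum)))

(define (call-with-retries thunk max-attempts)
  ;; Call THUNK, retrying on &premature-eof until MAX-ATTEMPTS calls
  ;; have been made; the last failure is re-raised to the caller.
  (let loop ((attempt 1))
    (guard (c ((and (premature-eof? c) (< attempt max-attempts))
               (loop (+ attempt 1))))
      (thunk))))
```

With the port stored in the condition, a handler can call
premature-eof-port to report which connection failed before retrying.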

On another note, I was able to 'exercise' the fix.  The exception is
raised, but instead of the connection being retried, something fails
with the following backtrace:

--8<---------------cut here---------------start------------->8---
guix offload: Testing 1 build machines defined in '/etc/guix/machines.scm'...
connection to machine 'overdrive1.guix.gnu.org' lost; retrying
Backtrace:
In ice-9/boot-9.scm:
  1752:10 10 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
           9 (apply-smob/0 #<thunk 7f915c028f60>)
In ice-9/boot-9.scm:
    724:2  8 (call-with-prompt _ _ #<procedure default-prompt-handler (k proc)>)
In ice-9/eval.scm:
    619:8  7 (_ #(#(#<directory (guile-user) 7f915c022c80>)))
In guix/ui.scm:
  2161:12  6 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10  5 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
  1747:15  4 (with-exception-handler #<procedure 7f91576bf0c0 at ice-9/boot-9.scm:1831:7 (exn)> _ # _ # …)
In srfi/srfi-1.scm:
    634:9  3 (for-each #<procedure check-machine-availability (a)> (#<<build-machine> name: "overdriv…>))
In ice-9/eval.scm:
   191:35  2 (_ #(#(#(#<directory (guix scripts offload) 7f9159852780> 3 #<<build-machine> na…> …) …) …))
Exception thrown while printing backtrace:
In procedure frame-local-ref: Argument 2 out of range: 1

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type to apply: 2
--8<---------------cut here---------------end--------------->8---

I haven't been able to pinpoint what fails yet.  Notice that in the above
code I've replaced par-for-each with plain for-each, suspecting it might
have something to do with it, but it appears unrelated.

Thanks,

Maxim
