From: Maxim Cournoyer
Subject: bug#41625: [PATCH v2] offload: Handle a possible EOF response from read-repl-response.
Date: Thu, 27 May 2021 07:49:22 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
[...]
> I see. So I’d say it’s a prerequisite (a patch that must come before)
> but not entirely the same thing. I’m nitpicking!
Eh, it's okay :-). Splitting changes into the right unit is a problem
that is akin to naming things; it's hard! I welcome your suggestion.
> We should make sure it doesn’t trigger thread-safety issues in libssh or
> anything like that (running it repeatedly on a large machines.scm should
> give us some confidence).
It seems fine so far, but I've only tested in a loop with 4 build
machines. When it nears completion I'll give it a shot on berlin.
[...]
> Yes, but note that this is just for ‘guix offload test’. The actual
> code run while offloading will still fail badly.
Ah, thanks for pointing that out; I somehow thought that this machine
status checking code was a prelude to every offloaded build.
[...]
>> I don't have a password set for my user on overdrive1, so can't attach
>> strace to sshd, but yeah, we could try to capture it and see if we can
>> understand what's going on.
>
> OK.
I'd be happy to try strace when you are available. You can ping me on
the chat. It's been more than 8 hours since I tried, so I should be
able to trigger the problem :-).
[...]
> Perhaps worth adding an ‘inferior’ and/or ‘port’ field. That would
> allow the handler to present more information as to which inferior is
> failing.
>
> Maybe ‘premature-eof’ would be more accurate than ‘connection-lost’.
Good suggestions. I'll implement them.
>> + (format (current-error-port)
>> +         (G_ "connection to machine '~a' lost; retrying~%")
>> +         (build-machine-name machine))
>
> You can use ‘info’ instead of ‘format’.
That also. Thanks!
On another note, I was able to 'exercise' the fix: the exception is
raised, but instead of the connection being retried, something fails
with the following backtrace:
--8<---------------cut here---------------start------------->8---
guix offload: Testing 1 build machines defined in '/etc/guix/machines.scm'...
connection to machine 'overdrive1.guix.gnu.org' lost; retrying
Backtrace:
In ice-9/boot-9.scm:
1752:10 10 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
9 (apply-smob/0 #<thunk 7f915c028f60>)
In ice-9/boot-9.scm:
724:2 8 (call-with-prompt _ _ #<procedure default-prompt-handler (k proc)>)
In ice-9/eval.scm:
619:8 7 (_ #(#(#<directory (guile-user) 7f915c022c80>)))
In guix/ui.scm:
2161:12 6 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 5 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
1747:15 4 (with-exception-handler #<procedure 7f91576bf0c0 at ice-9/boot-9.scm:1831:7 (exn)> _ # _ # …)
In srfi/srfi-1.scm:
634:9 3 (for-each #<procedure check-machine-availability (a)> (#<<build-machine> name: "overdriv…>))
In ice-9/eval.scm:
191:35 2 (_ #(#(#(#<directory (guix scripts offload) 7f9159852780> 3 #<<build-machine> na…> …) …) …))
Exception thrown while printing backtrace:
In procedure frame-local-ref: Argument 2 out of range: 1
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type to apply: 2
--8<---------------cut here---------------end--------------->8---
I haven't been able to pinpoint what fails yet. Note that for the run
above I replaced par-for-each with plain for-each, suspecting it might
have something to do with it, but it appears unrelated.
Thanks,
Maxim