bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41625: Sporadic guix-offload crashes due to EOF errors


From: Maxim Cournoyer
Subject: bug#41625: Sporadic guix-offload crashes due to EOF errors
Date: Mon, 24 May 2021 01:33:21 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Marius Bakke <marius@gnu.org> skribis:
>
>> Marius Bakke <marius@gnu.org> writes:
>>
>>> 'guix offload test' passes without problems.
>>
>> Not so fast, running it in a loop reveals the crash.
>>
>> There is a trace file in /root/offloadtest.trace on Berlin with such an
>> occurence.  It looks like a timeout is reached shortly before the EOF
>> error:
>>
>> 10139 poll([{fd=14, events=POLLIN|POLLOUT}], 1, 0) = 1 ([{fd=14, 
>> revents=POLLOUT}])
>> 10139 poll([{fd=14, events=POLLIN}], 1, 15000) = 0 (Timeout)
>> 10139 write(2, "Backtrace:\n", 11)      = 11
>>
>> This seems to be from a different node than the one reported previously,
>> as the preceding connect() was to this machine:
>>
>> 10139 connect(44, {sa_family=AF_INET, sin_port=htons(22),
>> sin_addr=inet_addr("141.80.167.186")}, 16) = -1 EINPROGRESS
>> (Operation now in progress)
>
> So it looks like ‘connect’ fails and eventually we get an EOF object.
> However, I don’t see where that EOF comes from because the return value
> of ‘connect!’ (the Guile-SSH procedure) is properly checked.
>
> Ludo’.

I got a slightly different backtrace that suggests making the connection
is not at fault, rather it occurs during the read-repl-response call:

--8<---------------cut here---------------start------------->8---
guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
Backtrace:
           8 (primitive-load "/home/maxim/.config/guix/current/bin/guix")
In guix/ui.scm:
  2165:12  7 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10  6 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
  1747:15  5 (with-exception-handler #<procedure 7f2caf885780 at 
ice-9/boot-9.scm:1831:7 (exn)> _ # _ # …)
In guix/scripts/offload.scm:
   704:21  4 (check-machine-availability _ _)
In srfi/srfi-1.scm:
   586:17  3 (map1 (#<session maxim@overdrive1.guix.gnu.org:52522 (connected) 
7f2cae396fc0>))
In guix/inferior.scm:
    258:2  2 (port->inferior _ _)
    240:2  1 (read-repl-response _ _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" #<eof>)'.
--8<---------------cut here---------------end--------------->8---

I seem to get this more often than not with the overdrive1 offload
machine.

Maxim





reply via email to

[Prev in Thread] Current Thread [Next in Thread]