bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58290: guile ssh error on guix deploy


From: Marius Bakke
Subject: bug#58290: guile ssh error on guix deploy
Date: Fri, 04 Nov 2022 00:47:02 +0100

Ludovic Courtès <ludo@gnu.org> skriver:

> Hi,
>
> Marius Bakke <marius@gnu.org> skribis:
>
>> I have the same problem.  Log messages around failure look something
>> like this (extracted from above message):
>>
>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 
>> 185.70.53.164 port 1915 ssh2: RSA 
>> SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after 
>> sshd-2 termination. 
>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>
>> Then deploy crashes with 'Channel opening failure'.
>
> “no more sessions” comes from this:
>
> --8<---------------cut here---------------start------------->8---
> int
> session_open(Authctxt *authctxt, int chanid)
> {
>       Session *s = session_new();
>       debug("session_open: channel %d", chanid);
>       if (s == NULL) {
>               error("no more sessions");
>               return 0;
>       }
>
> [...]
>
> }
> --8<---------------cut here---------------end--------------->8---
>
> Would be interesting to run sshd in verbose/debug mode and see why we
> hit that; it could be because the maximum number of sessions has been
> reached or something.

I enabled (log-level 'debug) and this is what happens server side:

Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 
channel 8 pid 26345
Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release 
channel 8
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
session rchan 65 win 64000 max 32768
Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with 
channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm 
session
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 
0 request exec reply 1
Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 
channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 
0 req exec
Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
session rchan 66 win 64000 max 32768
Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
["no more sessions" error occurs here, in a different log file]
Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 
10
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, 
nchannels 11
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure 
session
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, 
nchannels 10
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, 
nchannels 9
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, 
nchannels 8
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, 
nchannels 7
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, 
nchannels 6
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, 
nchannels 5
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, 
nchannels 4
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, 
nchannels 3
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, 
nchannels 2
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, 
nchannels 1
Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup

I compared this with a successful deploy:

Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 
channel 6 pid 25842
Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release 
channel 6
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype 
session rchan 65 win 64000 max 32768
Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
Nov  3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with 
channel 0
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm 
session
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 
0 request exec reply 1
Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 
channel 0
Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 
0 req exec
Nov  3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
Nov  3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype 
session rchan 66 win 64000 max 32768
Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
Nov  3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with 
channel 8
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm 
session
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 
8 request exec reply 1
Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 
channel 8
Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 
8 req exec
Nov  3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
Nov  3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: 
channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
Nov  3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype 
session rchan 67 win 64000 max 32768
Nov  3 21:46:28 capella sshd[25612]: debug1: input_session_request
Nov  3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
Nov  3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
[... more channels and stuffs before graceful shutdown ...]

I believe the two-minute window before SIGCHLD is waiting for some
Shepherd services to time out (long story..!).

In my current tests the failure always occur at the
very end of deploy:

building 
/gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: 
channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>

What's interesting is that all of the failed deploys fail to open
"channel 10", whereas successful deploys never reach it.  All of these
are deploying the exact same configuration and commit, so the SSH
traffic should nearly identical.

I have attached the transcripts of each session (by grepping the PIDs
from /var/log/debug).  Grepping for session_open yields some
inconstencies, but haven't studied them in detail yet.

>> Due to the number of SSH connections made by deploy and frequent
>> occurence of this problem, I was not able to successfully deploy through
>> many attempts.
>
> Ouch.  Normally, thanks to memoization, ‘guix deploy’ opens only one
> session per machine.  (I think it works well with the 25 local build
> machines behind berlin; it’s also fine for a small set of machines I
> take care of at work.)

Have they been rebooted since the switch to inetd-style sshd?  It fails
more often than not for me.

>> I found that removing the memoizing open-machine-ssh-session* in
>> (gnu machine ssh) works around it and can happily deploy again.
>>
>> (that is, just use 'open-machine-ssh-session' instead)
>
> Interesting.  That gives us a hint.  Could you add a ‘pk’ in
> ‘open-machine-ssh-session*’ to see how many connections it opens?

Indeed it only opens a single connection.  The problem seems to be
with multiple "channels" and "sessions" over a single connection.

Attachment: deploy-fail.log.gz
Description: application/gzip

Attachment: deploy-success.log.gz
Description: application/gzip

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]