bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58290: guile ssh error on guix deploy


From: Andrew Tropin
Subject: bug#58290: guile ssh error on guix deploy
Date: Thu, 10 Nov 2022 14:06:33 +0400

On 2022-11-04 00:47, Marius Bakke wrote:

> Ludovic Courtès <ludo@gnu.org> skriver:
>
>> Hi,
>>
>> Marius Bakke <marius@gnu.org> skribis:
>>
>>> I have the same problem.  Log messages around failure look something
>>> like this (extracted from above message):
>>>
>>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 
>>> 185.70.53.164 port 1915 ssh2: RSA 
>>> SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after 
>>> sshd-2 termination. 
>>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>>
>>> Then deploy crashes with 'Channel opening failure'.
>>
>> “no more sessions” comes from this:
>>
>> --8<---------------cut here---------------start------------->8---
>> int
>> session_open(Authctxt *authctxt, int chanid)
>> {
>>      Session *s = session_new();
>>      debug("session_open: channel %d", chanid);
>>      if (s == NULL) {
>>              error("no more sessions");
>>              return 0;
>>      }
>>
>> [...]
>>
>> }
>> --8<---------------cut here---------------end--------------->8---
>>
>> Would be interesting to run sshd in verbose/debug mode and see why we
>> hit that; it could be because the maximum number of sessions has been
>> reached or something.
>
> I enabled (log-level 'debug) and this is what happens server side:
>
> Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 
> channel 8 pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release 
> channel 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
> session rchan 65 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link 
> with channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: 
> confirm session
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: 
> channel 0 request exec reply 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 
> channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: 
> session 0 req exec
> Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
> Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
> session rchan 66 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free 
> channel 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: 
> server-session, nchannels 11
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: 
> failure session
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, 
> nchannels 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, 
> nchannels 9
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, 
> nchannels 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, 
> nchannels 7
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, 
> nchannels 6
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, 
> nchannels 5
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, 
> nchannels 4
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, 
> nchannels 3
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, 
> nchannels 2
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, 
> nchannels 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup
>
> I compared this with a successful deploy:
>
> Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 
> channel 6 pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release 
> channel 6
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype 
> session rchan 65 win 64000 max 32768
> Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link 
> with channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: 
> confirm session
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: 
> channel 0 request exec reply 1
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 
> channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: 
> session 0 req exec
> Nov  3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
> Nov  3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype 
> session rchan 66 win 64000 max 32768
> Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link 
> with channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: 
> confirm session
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: 
> channel 8 request exec reply 1
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 
> channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: 
> session 8 req exec
> Nov  3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
> Nov  3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: 
> channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
> Nov  3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype 
> session rchan 67 win 64000 max 32768
> Nov  3 21:46:28 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
> Nov  3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
> [... more channels and stuffs before graceful shutdown ...]
>
> I believe the two-minute window before SIGCHLD is waiting for some
> Shepherd services to time out (long story..!).
>
> In my current tests the failure always occur at the
> very end of deploy:
>
> building 
> /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: 
> channel 66 error (2) open failed: #<input-output: channel (closed) 
> 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it.  All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.
>
> I have attached the transcripts of each session (by grepping the PIDs
> from /var/log/debug).  Grepping for session_open yields some
> inconstencies, but haven't studied them in detail yet.
>
>>> Due to the number of SSH connections made by deploy and frequent
>>> occurence of this problem, I was not able to successfully deploy through
>>> many attempts.
>>
>> Ouch.  Normally, thanks to memoization, ‘guix deploy’ opens only one
>> session per machine.  (I think it works well with the 25 local build
>> machines behind berlin; it’s also fine for a small set of machines I
>> take care of at work.)
>
> Have they been rebooted since the switch to inetd-style sshd?  It fails
> more often than not for me.
>
>>> I found that removing the memoizing open-machine-ssh-session* in
>>> (gnu machine ssh) works around it and can happily deploy again.
>>>
>>> (that is, just use 'open-machine-ssh-session' instead)
>>
>> Interesting.  That gives us a hint.  Could you add a ‘pk’ in
>> ‘open-machine-ssh-session*’ to see how many connections it opens?
>
> Indeed it only opens a single connection.  The problem seems to be
> with multiple "channels" and "sessions" over a single connection.
>

Very detailed report, thank you for digging in!

CCed Artyom Poptsov, author of Guile-SSH.

-- 
Best regards,
Andrew Tropin

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]