[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#58290: guile ssh error on guix deploy
From: |
Andrew Tropin |
Subject: |
bug#58290: guile ssh error on guix deploy |
Date: |
Thu, 10 Nov 2022 14:06:33 +0400 |
On 2022-11-04 00:47, Marius Bakke wrote:
> Ludovic Courtès <ludo@gnu.org> skriver:
>
>> Hi,
>>
>> Marius Bakke <marius@gnu.org> skribis:
>>
>>> I have the same problem. Log messages around failure look something
>>> like this (extracted from above message):
>>>
>>> Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started.
>>> Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from
>>> 185.70.53.164 port 1915 ssh2: RSA
>>> SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>> Oct 4 11:52:28 localhost sshd[252]: error: no more sessions
>>> Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after
>>> sshd-2 termination.
>>> Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled.
>>>
>>> Then deploy crashes with 'Channel opening failure'.
>>
>> “no more sessions” comes from this:
>>
>> --8<---------------cut here---------------start------------->8---
>> int
>> session_open(Authctxt *authctxt, int chanid)
>> {
>> Session *s = session_new();
>> debug("session_open: channel %d", chanid);
>> if (s == NULL) {
>> error("no more sessions");
>> return 0;
>> }
>>
>> [...]
>>
>> }
>> --8<---------------cut here---------------end--------------->8---
>>
>> Would be interesting to run sshd in verbose/debug mode and see why we
>> hit that; it could be because the maximum number of sessions has been
>> reached or something.
>
> I enabled (log-level 'debug) and this is what happens server side:
>
> Nov 3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
> Nov 3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
> Nov 3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
> Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8
> channel 8 pid 26345
> Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release
> channel 8
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype
> session rchan 65 win 64000 max 32768
> Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link
> with channel 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open:
> confirm session
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req:
> channel 0 request exec reply 1
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0
> channel 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req:
> session 0 req exec
> Nov 3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
> Nov 3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype
> session rchan 66 win 64000 max 32768
> Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session open failed, free
> channel 10
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: free:
> server-session, nchannels 11
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open:
> failure session
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session,
> nchannels 10
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session,
> nchannels 9
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session,
> nchannels 8
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session,
> nchannels 7
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session,
> nchannels 6
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session,
> nchannels 5
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session,
> nchannels 4
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session,
> nchannels 3
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session,
> nchannels 2
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session,
> nchannels 1
> Nov 3 21:50:28 capella sshd[26115]: debug1: do_cleanup
>
> I compared this with a successful deploy:
>
> Nov 3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
> Nov 3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
> Nov 3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
> Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6
> channel 6 pid 25842
> Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release
> channel 6
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype
> session rchan 65 win 64000 max 32768
> Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov 3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link
> with channel 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open:
> confirm session
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req:
> channel 0 request exec reply 1
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0
> channel 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req:
> session 0 req exec
> Nov 3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
> Nov 3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype
> session rchan 66 win 64000 max 32768
> Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov 3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link
> with channel 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open:
> confirm session
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req:
> channel 8 request exec reply 1
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8
> channel 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req:
> session 8 req exec
> Nov 3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
> Nov 3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
> Nov 3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read:
> channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
> Nov 3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype
> session rchan 67 win 64000 max 32768
> Nov 3 21:46:28 capella sshd[25612]: debug1: input_session_request
> Nov 3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
> Nov 3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
> [... more channels and stuffs before graceful shutdown ...]
>
> I believe the two-minute window before SIGCHLD is waiting for some
> Shepherd services to time out (long story..!).
>
> In my current tests the failure always occur at the
> very end of deploy:
>
> building
> /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure:
> channel 66 error (2) open failed: #<input-output: channel (closed)
> 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it. All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.
>
> I have attached the transcripts of each session (by grepping the PIDs
> from /var/log/debug). Grepping for session_open yields some
> inconstencies, but haven't studied them in detail yet.
>
>>> Due to the number of SSH connections made by deploy and frequent
>>> occurence of this problem, I was not able to successfully deploy through
>>> many attempts.
>>
>> Ouch. Normally, thanks to memoization, ‘guix deploy’ opens only one
>> session per machine. (I think it works well with the 25 local build
>> machines behind berlin; it’s also fine for a small set of machines I
>> take care of at work.)
>
> Have they been rebooted since the switch to inetd-style sshd? It fails
> more often than not for me.
>
>>> I found that removing the memoizing open-machine-ssh-session* in
>>> (gnu machine ssh) works around it and can happily deploy again.
>>>
>>> (that is, just use 'open-machine-ssh-session' instead)
>>
>> Interesting. That gives us a hint. Could you add a ‘pk’ in
>> ‘open-machine-ssh-session*’ to see how many connections it opens?
>
> Indeed it only opens a single connection. The problem seems to be
> with multiple "channels" and "sessions" over a single connection.
>
Very detailed report, thank you for digging in!
CCed Artyom Poptsov, author of Guile-SSH.
--
Best regards,
Andrew Tropin
signature.asc
Description: PGP signature