bug-gnu-emacs

bug#40665: 28.0.50; tls hang on local ssl


From: Robert Pluim
Subject: bug#40665: 28.0.50; tls hang on local ssl
Date: Sun, 19 Apr 2020 16:34:38 +0200

>>>>> On Sat, 18 Apr 2020 02:44:05 +0000 (UTC), Derek Zhou <derek@3qin.us> said:

    Derek> Derek Zhou writes:

    >> When this thing happens, the tls handshakes are done properly. However,
    >> emacs did not write anything into gnutls before starting to read and
    >> obviously cannot get anything out at all. It is not really a hang, but
    >> the write never happens and the display buffer stays empty.
    >> 
    >> Derek

    Derek> Took me nearly the whole day to debug, but this one-line patch
    Derek> fixed my problem.
    Derek> My server finishes the tls handshake within gnutls_boot itself,
    Derek> and if the sentinel is not called right after, it will never be
    Derek> called, so the write will not happen. Someone should review this
    Derek> carefully.

    Derek> diff --git a/src/process.c b/src/process.c
    Derek> index 91d426103d..6d497ef854 100644
    Derek> --- a/src/process.c
    Derek> +++ b/src/process.c
    Derek> @@ -5937,8 +5937,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
    Derek>                /* If we have an incompletely set up TLS connection,
    Derek>                   then defer the sentinel signaling until
    Derek>                   later. */
    Derek> -              if (NILP (p->gnutls_boot_parameters)
    Derek> -                  && !p->gnutls_p)
    Derek> +              if (NILP (p->gnutls_boot_parameters))
    Derek>  #endif
    Derek>                  {
    Derek>                    pset_status (p, Qrun);

Hereʼs what I think is happening:

The only way for p->gnutls_boot_parameters to become nil is here in
connect_network_socket:

      if (p->gnutls_initstage == GNUTLS_STAGE_READY)
        {
          p->gnutls_boot_parameters = Qnil;
          /* Run sentinels, etc. */
          finish_after_tls_connection (proc);
        }

and finish_after_tls_connection should call the sentinel, but
NON_BLOCKING_CONNECT_FD is still set, so it doesnʼt.

The next chance to call the sentinel would be from
wait_reading_process_output, but that path is only taken if handshaking
has been tried and not completed, and here it is complete already.

wait_reading_process_output then calls delete_write_fd, which clears
NON_BLOCKING_CONNECT_FD, and doesnʼt run the sentinel because
p->gnutls_boot_parameters is nil and p->gnutls_p is true.

finish_after_tls_connection never gets another chance to run, since
the socket is connected and handshaking is complete.

After your change, you've fixed this case:

    if p->gnutls_boot_parameters is nil, that means the handshake
    completed already and the TLS connection is up, so
    calling the sentinel is ok.

In other cases, where the handshake does not complete straight away in
Fgnutls_boot, it will complete here in wait_reading_process_output:

                /* Continue TLS negotiation. */
                if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
                    && p->is_non_blocking_client)
                  {
                    gnutls_try_handshake (p);
                    p->gnutls_handshakes_tried++;

                    if (p->gnutls_initstage == GNUTLS_STAGE_READY)
                      {
                        gnutls_verify_boot (aproc, Qnil);
                        finish_after_tls_connection (aproc);
                      }

which always happens after delete_write_fd has been called, which
clears NON_BLOCKING_CONNECT_FD, so finish_after_tls_connection calls
the sentinel.

One change we could make is to set p->gnutls_boot_parameters to nil
here, so that in the sequence

    Fgnutls_boot, handshake does not complete
    handshake succeeds first time in wait_reading_process_output
    delete_write_fd then checks p->gnutls_boot_parameters

the sentinel ends up getting run, but Iʼve not seen the handshake ever
succeed straight away before the delete_write_fd, and if it ever has
in the wild we would have seen bug reports (and this is dragon-filled
code, so I donʼt want to make changes to it if I can help it :-))

In short: I think the change is ok. It passes the network-stream
tests, so Iʼll run with it for a while, and push it in a week or so.

Robert




