guix-commits

[no subject]


From: Mathieu Othacehe
Date: Sun, 20 Nov 2022 12:32:59 -0500 (EST)

branch: master
commit fc1641381d2a8a0472a71ef5ad2b64361faaaab4
Author: Mathieu Othacehe <othacehe@gnu.org>
AuthorDate: Sun Nov 20 18:21:42 2022 +0100

    remote-worker: Prevent a dead-hang on server disconnection.
    
    This is a follow-up to 1fb4b0ac1297e9bd680d0f4a356ce3050b27f913, which tried
    to work around the remote-worker hangs by introducing a non-blocking read.
    
    This solution was problematic because, while the server is unresponsive,
    request-work requests are queued on the worker. When the server comes back
    online, all the queued requests are sent to the server.
    
    Instead, use the ZMQ_PROBE_ROUTER option, which causes the server to send an
    empty bootstrap message to the worker when a connection is established. This
    empty message unblocks the workers that were hanging while waiting for a
    request-work response.
    
    * src/cuirass/scripts/remote-server.scm (zmq-start-proxy): Set the
    ZMQ_PROBE_ROUTER option on the build socket.
    * src/cuirass/scripts/remote-worker.scm (start-worker): Ignore the bootstrap
    message when reading server info. When receiving a bootstrap message while
    waiting for a request-work response, log it and keep going.
---
 src/cuirass/scripts/remote-server.scm | 4 ++++
 src/cuirass/scripts/remote-worker.scm | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/src/cuirass/scripts/remote-server.scm b/src/cuirass/scripts/remote-server.scm
index 8843a95..c168318 100644
--- a/src/cuirass/scripts/remote-server.scm
+++ b/src/cuirass/scripts/remote-server.scm
@@ -469,6 +469,10 @@ frontend to the workers connected through the TCP backend."
          (poll-items (list
                       (poll-item build-socket ZMQ_POLLIN))))
 
+    ;; Send bootstrap messages on worker connection to wake up the workers
+    ;; that were hanging waiting for request-work responses.
+    (zmq-set-socket-option build-socket ZMQ_PROBE_ROUTER 1)
+
     (zmq-bind-socket build-socket (zmq-backend-endpoint backend-port))
     (zmq-bind-socket fetch-socket (zmq-fetch-workers-endpoint))
 
diff --git a/src/cuirass/scripts/remote-worker.scm b/src/cuirass/scripts/remote-worker.scm
index af1eb2d..37c8afe 100644
--- a/src/cuirass/scripts/remote-worker.scm
+++ b/src/cuirass/scripts/remote-worker.scm
@@ -329,6 +329,10 @@ and executing them.  The worker can reply on the same socket."
            (string->bv (zmq-worker-request-info-message)))))
 
   (define (read-server-info socket)
+    ;; Ignore the bootstrap message sent due to the ZMQ_PROBE_ROUTER option.
+    (match (zmq-get-msg-parts-bytevector socket '())
+      ((empty) #f))
+
     (request-info socket)
     (match (zmq-get-msg-parts-bytevector socket '())
       ((empty info)
@@ -379,6 +383,9 @@ and executing them.  The worker can reply on the same socket."
                  (log-info (G_ "~a: request work.") (worker-name wrk))
                  (request-work socket worker)
                  (match (zmq-get-msg-parts-bytevector socket '())
+                   ((empty)
+                    (log-info (G_ "~a: received a bootstrap message.")
+                              (worker-name wrk)))
                    ((empty command)
                     (run-command (bv->string command) server
                                  #:reply (reply socket)

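The worker-side change hinges on telling a one-part bootstrap message apart from a two-part reply (an empty delimiter frame followed by a payload). Below is a minimal, self-contained Guile sketch of that dispatch. It assumes each message is a list of bytevector parts, as returned by zmq-get-msg-parts-bytevector in the diff. The process-messages procedure and the sample messages are hypothetical and not part of Cuirass. A plain list stands in for the ZMQ socket so the sketch runs without guile-simple-zmq.

```scheme
(use-modules (ice-9 match)
             (rnrs bytevectors))

;; Hypothetical helper, not a Cuirass procedure: walk a list of incoming
;; messages (each one a list of bytevector parts) the way the worker's
;; request-work loop consumes replies, and collect the commands.
(define (process-messages messages)
  (let loop ((messages messages)
             (commands '()))
    (match messages
      (()
       (reverse commands))
      ((message rest ...)
       (match message
         ;; Single empty frame: the bootstrap message triggered by
         ;; ZMQ_PROBE_ROUTER.  Skip it and keep going.
         ((empty)
          (loop rest commands))
         ;; Empty delimiter frame followed by the command payload.
         ((empty command)
          (loop rest (cons (utf8->string command) commands)))
         ;; Anything else is unexpected in this sketch.
         (_
          (error "unexpected message" message)))))))

;; A bootstrap message followed by a regular reply.
(define empty-frame (make-bytevector 0))

(display (process-messages
          (list (list empty-frame)
                (list empty-frame (string->utf8 "build")))))
(newline)
;; prints: (build)
```

In this sketch, removing the ((empty) ...) clause sends the bootstrap message to the fallback clause, which signals an error. A dedicated single-part pattern lets the loop skip the bootstrap message and carry on.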

