bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#48468: substitute server connection timeout


From: Mathieu Othacehe
Subject: bug#48468: substitute server connection timeout
Date: Wed, 28 Dec 2022 15:23:21 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Hello,

So I had an other look to that one,

> (From <https://ci.guix.gnu.org/build/224849/log/raw>.)
>
> Plausible explanations that come to mind:
>
>   1. ‘guix publish’ returning 404, but not due to baking.  Instead the
>      .drv is simply not in store, hence 404.
>
>   2. Client timeout (‘guix publish’ fails to reply on time).
>
>   3. Cached 404 in nginx (though we’re not supposed to cache those I
>      think?), or timeout in nginx (again due to ‘guix publish’ being too
>      slow and ‘proxy_read_timeout’ is reached, currently 10s).

I focused on https://ci.guix.gnu.org/build/308493/details which is
similar to the one linked above. Some derivations are successfully
substituted then, one is not and it aborts.

This build stopped at 2022/12/24 00:25:38 CET. At that time the nginx error
log on Berlin looks like:

--8<---------------cut here---------------start------------->8---
2022/12/24 00:25:23 [info] 128755#0: *178076209 recv() failed (104: Connection 
reset by peer) while sending to client, client: 78.47.68.4, server: 
ci.guix.gnu.org, request: "GET /x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.narinfo 
HTTP/1.1", upstream: 
"http://127.0.0.1:3000/x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.narinfo";, host: 
"ci.guix.gnu.org"
2022/12/24 00:25:24 [info] 128742#0: *178076816 client 174.59.221.51 closed 
keepalive connection
2022/12/24 00:25:25 [info] 128755#0: *178076830 recv() failed (104: Connection 
reset by peer) while sending to client, client: 78.47.68.4, server: 
ci.guix.gnu.org, request: "GET /vx7lgvrcw6i6r0inw513qn6fj3cb27mm.narinfo 
HTTP/1.1", upstream: 
"http://127.0.0.1:3000/vx7lgvrcw6i6r0inw513qn6fj3cb27mm.narinfo";, host: 
"ci.guix.gnu.org"
2022/12/24 00:25:26 [info] 128742#0: *178077436 recv() failed (104: Connection 
reset by peer) while sending to client, client: 78.47.68.4, server: 
ci.guix.gnu.org, request: "GET /b1g8vyqp8c52yk6vscdvjflci5mnladq.narinfo 
HTTP/1.1", upstream: 
"http://127.0.0.1:3000/b1g8vyqp8c52yk6vscdvjflci5mnladq.narinfo";, host: 
"ci.guix.gnu.org"
2022/12/24 00:25:28 [info] 128742#0: *178078045 recv() failed (104: Connection 
reset by peer) while sending to client, client: 78.47.68.4, server: 
ci.guix.gnu.org, request: "GET /007zgflsl5xkr377wpakbsis5c2yqh1q.narinfo 
HTTP/1.1", upstream: 
"http://127.0.0.1:3000/007zgflsl5xkr377wpakbsis5c2yqh1q.narinfo";, host: 
"ci.guix.gnu.org"
2022/12/24 00:25:32 [info] 128742#0: *178078659 recv() failed (104: Connection 
reset by peer) while sending to client, client: 78.47.68.4, server: 
ci.guix.gnu.org, request: "GET /xdfyi8jki350i8njk1glrzg9p0va18vy.narinfo 
HTTP/1.1", upstream: 
"http://127.0.0.1:3000/xdfyi8jki350i8njk1glrzg9p0va18vy.narinfo";, host: 
"ci.guix.gnu.org"
2022/12/24 00:25:33 [info] 128741#0: *178079271 recv() failed (104: Connection 
reset by peer) while sending to client, client: 78.47.68.4, server: 
ci.guix.gnu.org, request: "GET /rzs7jiwhyls2bg8zv2g1vmmgcfwkvaax.narinfo 
HTTP/1.1", upstream: 
"http://127.0.0.1:3000/rzs7jiwhyls2bg8zv2g1vmmgcfwkvaax.narinfo";, host: 
"ci.guix.gnu.org"
2022/12/24 00:25:36 [info] 128741#0: *178080049 client 157.55.39.212 closed 
keepalive connection
2022/12/24 00:25:38 [info] 128741#0: *178080047 peer closed connection in SSL 
handshake while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:443
--8<---------------cut here---------------end--------------->8---

That would, in that specific case, invalidate your hypothesis number
2. When 'guix publish' timeouts, we have messages like:

--8<---------------cut here---------------start------------->8---
2022/12/24 00:45:36 [error] 128749#0: *178225211 upstream timed out (110: 
Connection timed out) while reading response header from upstream, client: 
86.87.253.49, server: ci.guix.gnu.org, request: "GET 
/m0hqyvj8di41xlraz087kmpr38zaw90r.narinfo HTTP/1.1", upstream: 
"http://127.0.0.1:3000/m0hqyvj8di41xlraz087kmpr38zaw90r.narinfo";, host: 
"ci.guix.gnu.org"
--8<---------------cut here---------------end--------------->8---

However, like suggested in your hypothesis number 1, it seems instead
that we are replying 404 to the worker which resets the connection. As
we have put aside the baking thing, the question is now why are those
derivations not available?

Are they not part of Berlin's store at that time? Or is the publish
server erroneously returns 404?

It looks like the requested derivations are still absent, as of right
now:

--8<---------------cut here---------------start------------->8---
mathieu@berlin /var/log/nginx$ ls /gnu/store/vx7lgvrcw6i6r0inw513qn6fj3cb27mm*
ls: cannot access '/gnu/store/vx7lgvrcw6i6r0inw513qn6fj3cb27mm*': No such file 
or directory
mathieu@berlin /var/log/nginx$ ls /gnu/store/b1g8vyqp8c52yk6vscdvjflci5mnladq*
ls: cannot access '/gnu/store/b1g8vyqp8c52yk6vscdvjflci5mnladq*': No such file 
or directory
mathieu@berlin /var/log/nginx$ ls /gnu/store/007zgflsl5xkr377wpakbsis5c2yqh1q*
ls: cannot access '/gnu/store/007zgflsl5xkr377wpakbsis5c2yqh1q*': No such file 
or directory
--8<---------------cut here---------------end--------------->8---

As I don't have much clue about what are those derivations, I think we
should instrument a bit the publish server and maybe the substitute
script like proposed in the attachments.

WDYT?

Thanks,

Mathieu
>From 9f9c839937ac2edd1b5901b2262c4be0954fa20c Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <othacehe@gnu.org>
Date: Wed, 28 Dec 2022 15:12:46 +0100
Subject: [PATCH 1/2] scripts: publish: Add a custom baking header.

Log the not-found responses and their reason (baking or not) to stdout. Also
send the X-Baking custom header so that the client can be informed of the
cause of the failure.

* guix/scripts/publish.scm (not-found): Add a baking? argument to add the
X-Baking HTTP header to the response if baking is in progress.  Also, log the
404 responses to stdout, indicating if it is due to baking or not.
(render-narinfo/cached): Pass the baking? argument.
---
 guix/scripts/publish.scm | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm
index 3bf3bd9c7c..11fedf092e 100644
--- a/guix/scripts/publish.scm
+++ b/guix/scripts/publish.scm
@@ -4,7 +4,7 @@
 ;;; Copyright © 2015-2022 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2020 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 ;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
-;;; Copyright © 2021 Mathieu Othacehe <othacehe@gnu.org>
+;;; Copyright © 2021, 2022 Mathieu Othacehe <othacehe@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -375,14 +375,28 @@ (define* (narinfo-string store store-path
                                            compression)))
                  compressions))))
 
+;; Custom header to indicate that baking is in progress.
+(declare-opaque-header! "X-Baking")
+
 (define* (not-found request
-                    #:key (phrase "Resource not found")
+                    #:key
+                    baking?
+                    (phrase "Resource not found")
                     ttl)
   "Render 404 response for REQUEST."
+  (format #t (G_ "↳ ~a ~a: 404~a~%")
+          (request-method request)
+          (uri-path (request-uri request))
+          (if baking? " (baking)" ""))
   (values (build-response #:code 404
-                          #:headers (if ttl
-                                        `((cache-control (max-age . ,ttl)))
-                                        '()))
+                          #:headers
+                          (append
+                           (if ttl
+                               `((cache-control (max-age . ,ttl)))
+                               '())
+                           (if baking?
+                               '((x-baking . "1"))
+                               '())))
           (string-append phrase ": "
                          (uri-path (request-uri request)))))
 
@@ -587,6 +601,7 @@ (define (delete-entry narinfo)
                                #:nar-path nar-path
                                #:compressions compressions)
                (not-found request
+                          #:baking? #t
                           #:phrase "We're baking it"
                           #:ttl 300)))          ;should be available within 5m
           (else
-- 
2.38.1

>From 25ffc57864dbf34ca58741f89c1f790dbde6702f Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <othacehe@gnu.org>
Date: Wed, 28 Dec 2022 15:19:29 +0100
Subject: [PATCH 2/2] substitutes: Log the failing queries.

* guix/substitutes.scm (%debug?): New variable.
(handle-narinfo-response): Log the failing queries if the %debug? parameter is
set.
---
 guix/substitutes.scm | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/guix/substitutes.scm b/guix/substitutes.scm
index 9014cf61ec..819eb2c73e 100644
--- a/guix/substitutes.scm
+++ b/guix/substitutes.scm
@@ -90,6 +90,12 @@ (define %narinfo-cache-directory
           (string-append %state-directory "/substitute/cache"))
       (string-append (cache-directory #:ensure? #f) "/substitute")))
 
+(define %debug?
+  ;; Enable debug mode by setting the GUIX_SUBSTITUTE_DEBUG environmnent
+  ;; variable.
+  (make-parameter
+   (getenv "GUIX_SUBSTITUTE_DEBUG")))
+
 (define (narinfo-cache-file cache-url path)
   "Return the name of the local file that contains an entry for PATH.  The
 entry is stored in a sub-directory specific to CACHE-URL."
@@ -224,6 +230,15 @@ (define (handle-narinfo-response request response port 
result)
           (let* ((path      (uri-path (request-uri request)))
                  (hash-part (basename
                              (string-drop-right path 8)))) ;drop ".narinfo"
+            ;; Log the failing queries and indicate if it failed because the
+            ;; narinfo is being baked.
+            (when (%debug?)
+              (let ((baking?
+                     (assoc-ref (response-headers response) 'x-baking)))
+                (display
+                 (format #f "could not fetch ~a~a ~a~a~%"
+                         url path code
+                         (if baking? " (baking)" "")))))
             (if len
                 (get-bytevector-n port len)
                 (read-to-eof port))
-- 
2.38.1


reply via email to

[Prev in Thread] Current Thread [Next in Thread]