From: Mathieu Othacehe
Subject: [bug#50040] [PATCH 0/2] publish: Always render nar/narinfo during baking.
Date: Fri, 17 Sep 2021 17:27:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hey Ludo,

> The ‘mumi worker’ process also seems to be doing a lot of I/O and using
> quite a bit of CPU (presumably allocating a lot).  Its ‘perf’ profile is
> like this:
>
>    2.62%  .mumi-real  libguile-3.0.so.1.1.1  [.] scm_ilength
>    1.20%  .mumi-real  libguile-3.0.so.1.1.1  [.] scm_getc
>    1.12%  .mumi-real  libgc.so.1.4.3         [.] GC_add_to_black_list_normal.part.4
>    1.06%  .mumi-real  libgc.so.1.4.3         [.] GC_build_fl
>    0.98%  .mumi-real  libgc.so.1.4.3         [.] GC_reclaim_clear
>    0.97%  .mumi-real  libguile-3.0.so.1.1.1  [.] get_callee_vcode
>    0.91%  .mumi-real  libgc.so.1.4.3         [.] GC_generic_malloc_many
>    0.90%  .mumi-real  libguile-3.0.so.1.1.1  [.] peek_byte_or_eof
>    0.78%  .mumi-real  [JIT] tid 62601        [.] 0x00007f886964804d
>
> Food for thought…

Yep, interesting findings, thanks for sharing. I just discovered that
the Nginx server logs the errors we are interested in to
/var/log/nginx/error.log:

--8<---------------cut here---------------start------------->8---
2021/09/17 14:13:37 [error] 129925#0: *3727055 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX, server: ci.guix.gnu.org, request: "GET /3m1j4ddx11prp6azw3rjdhljg5vchf1s.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/3m1j4ddx11prp6azw3rjdhljg5vchf1s.narinfo", host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3727020 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX, server: ci.guix.gnu.org, request: "GET /pdhr7fgql3qm1x24yxkj4c2f6s7vffi5.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/pdhr7fgql3qm1x24yxkj4c2f6s7vffi5.narinfo", host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3726717 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX, server: ci.guix.gnu.org, request: "GET /19gv6sqqcmn8q020bpm9jyl6n9bjpg05.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/19gv6sqqcmn8q020bpm9jyl6n9bjpg05.narinfo", host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3725361 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /y8dp69gky4dn5vnn7wnx04j20q0yrdja.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/y8dp69gky4dn5vnn7wnx04j20q0yrdja.narinfo", host: "141.80.167.131"
--8<---------------cut here---------------end--------------->8---
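
In case that helps narrow things down, here is a quick Guile sketch (not
part of the patches) to tally those "upstream timed out" entries per
client, to see how many come from the build farm versus user machines.
The log file name and the regexp are assumptions based on the excerpt
above, and anonymized "XXX" clients won't match the IP pattern.

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 rdelim)
             (ice-9 regex))

(define (timeout-clients file)
  "Return an alist mapping client addresses to the number of
\"upstream timed out\" errors they triggered in FILE."
  (call-with-input-file file
    (lambda (port)
      (let loop ((line (read-line port))
                 (table (make-hash-table)))
        (if (eof-object? line)
            (hash-map->list cons table)
            (begin
              ;; Only look at timeout errors, and extract the client IP.
              (when (string-contains line "upstream timed out")
                (let ((m (string-match "client: ([0-9.]+)" line)))
                  (when m
                    (let ((client (match:substring m 1)))
                      (hash-set! table client
                                 (1+ (or (hash-ref table client) 0)))))))
              (loop (read-line port) table)))))))

;; Example: (timeout-clients "/var/log/nginx/error.log")
;; => (("141.80.167.169" . <count>) ...)
--8<---------------cut here---------------end--------------->8---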

Despite the following diff, which extends the timeouts to 10 seconds,
we still see a lot of those errors. Some happen on the build farm
(141.80.167.X), some directly on user machines.

--8<---------------cut here---------------start------------->8---
diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 44ff28e..7a085e5 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -97,9 +97,9 @@
 
            ;; Do not tolerate slowness of hydra.gnu.org when fetching
            ;; narinfos: better return 504 quickly than wait forever.
-           "proxy_connect_timeout 2s;"
-           "proxy_read_timeout 2s;"
-           "proxy_send_timeout 2s;"
+           "proxy_connect_timeout 10s;"
+           "proxy_read_timeout 10s;"
+           "proxy_send_timeout 10s;"
--8<---------------cut here---------------end--------------->8---
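
For context, I presume those proxy_* strings end up in the body of the
Nginx location that forwards narinfo/nar requests to the publish
backend, something along these lines with the (gnu services web)
records. The URI and upstream name below are assumptions for
illustration, not the actual berlin.scm contents:

--8<---------------cut here---------------start------------->8---
(use-modules (gnu services web))

;; Simplified sketch of the relevant location block.
(nginx-location-configuration
 (uri "/")                                     ;assumed URI
 (body (list "proxy_pass http://guix-publish;" ;assumed upstream name
             ;; Better return 504 quickly than wait forever, but give
             ;; the backend more room than the previous 2 seconds.
             "proxy_connect_timeout 10s;"
             "proxy_read_timeout 10s;"
             "proxy_send_timeout 10s;")))
--8<---------------cut here---------------end--------------->8---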

I'll see if I can reproduce the I/O pressure on demand, so that I can
run more accurate strace/perf investigations.
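
As a starting point, something like the sketch below could time narinfo
requests directly against the publish backend (bypassing Nginx) while
the pressure is applied, to see when responses approach the timeout.
The port comes from the log above; the hash is just one of the examples
there:

--8<---------------cut here---------------start------------->8---
(use-modules (web client)
             (web uri))

(define (narinfo-fetch-time base-url hash)
  "Fetch HASH.narinfo from BASE-URL and return the elapsed time in seconds."
  (let ((start (get-internal-real-time)))
    (call-with-values
        (lambda ()
          ;; Plain GET against the publish server, e.g. 127.0.0.1:3000.
          (http-get (string->uri
                     (string-append base-url "/" hash ".narinfo"))))
      (lambda (response body)
        (exact->inexact
         (/ (- (get-internal-real-time) start)
            internal-time-units-per-second))))))

;; Example, reusing a hash from the error log above:
;; (narinfo-fetch-time "http://127.0.0.1:3000"
;;                     "3m1j4ddx11prp6azw3rjdhljg5vchf1s")
--8<---------------cut here---------------end--------------->8---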

Thanks,

Mathieu
