bug-guix
bug#37757: Kernel panic upon shutdown


From: Ludovic Courtès
Subject: bug#37757: Kernel panic upon shutdown
Date: Mon, 09 Dec 2019 14:47:59 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hello,

[+Cc: Andy for a heads-up on the fix below.]

Ludovic Courtès <address@hidden> skribis:

> It turns out the previous patch didn’t work; in short, we really have to
> use async-signal-safe functions only from the signal handler, so this
> has to be done in C.
>
> The attached patch does that.  I’ve tried it with ‘guix system
> container’ and it seems to dump core as expected, from what I can see.
>
> Let me know if you manage to reproduce the bug and to get a core dumped
> with this patch.

Good news!  The patch does indeed allow shepherd to dump core, and I
managed to grab the backtrace below on an x86_64 machine running Guix
System (from yesterday) with GNOME:

--8<---------------cut here---------------start------------->8---
Using host libthread_db library "/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libthread_db.so.1".
Core was generated by `/gnu/store/1mkkv2caiqbdbbd256c4dirfi4kwsacv-guile-2.2.6/bin/guile --no-auto-com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  handle_crash (sig=11)
    at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
43            * (int *) 0 = 42;
[Current thread is 1 (LWP 4635)]

[…]

Thread 1 (LWP 4635):
#0  handle_crash (sig=11) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
        infinity = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pid = <optimized out>
        msg = "Shepherd crashed!\n"
        pid = <optimized out>
#1  <signal handler called>
No locals.
#2  handle_crash (sig=6) at /gnu/store/dayk54wxskp14w53813384azhxmd5awz-shepherd-crash-handler.c:43
        infinity = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pid = <optimized out>
        msg = "Shepherd crashed!\n"
        pid = <optimized out>
#3  <signal handler called>
No locals.
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {0, 2314885530818445312, 0 <repeats 14 times>}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#5  0x00007f03eef40891 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 13 times>, 139654877144192, 0, 139654877624544}}, sa_flags = -279049286, sa_restorer = 0x7f03ef57e480 <read_finalization_pipe_data>}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#6  0x00007f03ef57e89a in finalization_thread_proc (unused=<optimized out>) at finalizers.c:228
        data = {byte = -24 '\350', n = -1, err = 4}
#7  0x00007f03ef56f35a in c_body (d=0x7f03ed152e50) at continuations.c:422
        data = 0x7f03ed152e50
#8  0x00007f03ef5f079f in vm_regular_engine (thread=0x2, vp=0x7f03eb1caea0, registers=0x0, resume=-286001158) at vm-engine.c:786
        ret = 2
        ip = <optimized out>
        sp = <optimized out>
        op = 10
        jump_table_ = {…}
        jump_table = 0x7f03ef64d8e0 <jump_table_>

[…]

#19 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:710
No locals.
#20 0x00007f03ef497015 in start_thread (arg=0x7f03ed153700) at pthread_create.c:486
        ret = <optimized out>
        pd = 0x7f03ed153700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139654839219968, -749312912628550421, 140727702524830, 140727702524831, 140727702524832, 139654839219968, 837174519050892523, 837169745183601899}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#21 0x00007f03eeffd91f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--8<---------------cut here---------------end--------------->8---

So what happens is that ‘finalization_thread_proc’ in Guile receives
EINTR (data.n == -1, data.err == 4) but then, despite EINTR, it goes on
to check the value of ‘data.byte’, which is meaningless since nothing
was read, and aborts because it’s neither 0 nor 1 (it is -24 in
frame #6).
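
The intended control flow can be sketched in isolation as a pure
function.  This is only an illustration under stated assumptions:
‘struct pipe_read_result’ mirrors the three fields GDB prints for
‘data’ in frame #6, while ‘enum next_step’ and ‘decide’ are
hypothetical names that do not exist in Guile.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical mirror of the 'data' fields shown in frame #6.  */
struct pipe_read_result
{
  char byte;    /* command byte; only meaningful when n > 0 */
  ssize_t n;    /* return value of read(2) */
  int err;      /* errno as saved right after read(2) */
};

/* Hypothetical: what the finalization loop should do next.  */
enum next_step
{
  STEP_RETRY,            /* interrupted: loop and read again */
  STEP_FAIL,             /* real error: report it and leave the thread */
  STEP_RUN_FINALIZERS,   /* byte == 0 */
  STEP_STOP,             /* byte == 1 */
  STEP_ABORT             /* unexpected byte */
};

enum next_step
decide (const struct pipe_read_result *data)
{
  if (data->n <= 0)
    /* Nothing was read, so 'byte' must not be examined at all.  */
    return data->err == EINTR ? STEP_RETRY : STEP_FAIL;

  switch (data->byte)
    {
    case 0:
      return STEP_RUN_FINALIZERS;
    case 1:
      return STEP_STOP;
    default:
      return STEP_ABORT;
    }
}
```

With the values from frame #6 (byte = -24, n = -1, err = 4, which is
EINTR on GNU/Linux), ‘decide’ returns STEP_RETRY instead of reaching the
abort case.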

My plan is to:

  1. push the patch below to the ‘stable-2.2’ branch of Guile;
     done:
     <https://git.savannah.gnu.org/cgit/guile.git/commit/?h=stable-2.2&id=edf5aea7ac852db2356ef36cba4a119eb0c81ea9>;

  2. use a patched Guile for the ‘shepherd’ package;

  3. include the crash handler in the Shepherd.

Thoughts?

Thanks,
Ludo’.

diff --git a/libguile/finalizers.c b/libguile/finalizers.c
index c5d69e8e3..94a6e6b0a 100644
--- a/libguile/finalizers.c
+++ b/libguile/finalizers.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2012, 2013, 2014 Free Software Foundation, Inc.
+/* Copyright (C) 2012, 2013, 2014, 2019 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -211,21 +211,26 @@ finalization_thread_proc (void *unused)
 
       scm_without_guile (read_finalization_pipe_data, &data);
       
-      if (data.n <= 0 && data.err != EINTR) 
+      if (data.n <= 0)
         {
-          perror ("error in finalization thread");
-          return NULL;
+          if (data.err != EINTR)
+            {
+              perror ("error in finalization thread");
+              return NULL;
+            }
         }
-
-      switch (data.byte)
+      else
         {
-        case 0:
-          scm_run_finalizers ();
-          break;
-        case 1:
-          return NULL;
-        default:
-          abort ();
+          switch (data.byte)
+            {
+            case 0:
+              scm_run_finalizers ();
+              break;
+            case 1:
+              return NULL;
+            default:
+              abort ();
+            }
         }
     }
 }
