bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#34655: 26.1.92; Segfault in module with --module-assertions


From: Eli Zaretskii
Subject: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Mon, 18 Mar 2019 18:21:25 +0200

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: <34655@debbugs.gnu.org>,  <p.stephani2@gmail.com>
> Date: Sun, 17 Mar 2019 23:52:55 +0000
> 
> > I tried at the time to reproduce your problem, and failed.  But I did
> > that on Windows, where I needed to replace the non-existent realpath
> > by an equivalent function, so it's not a faithful reproduction.  I
> > will see if I can find time to look at this on a GNU machine, unless
> > someone beats me to it.
> 
> Replacing 'canonicalize_file_name' with 'strdup' still reproduces the
> issue for me.  Perhaps increasing the number of calls to
> realpath-truename from 1000 to 5000 will also help.

Right, the strdup part did that for me.  (My previous attempt also
used strdup as part of the replacement, but still failed to reproduce.
Memory corruption bugs are frequently weird that way.)

So I modified your recipe slightly, like this:

  (progn (module-load "/path/to/realpath.so")
         (setq garbage-collection-messages t)
         (dotimes (i 5000)
           (message "%d" i)
           (realpath-truename user-emacs-directory)))

put it on a file named rp.el, and then ran it under GDB:

  (gdb) r -batch --module-assertions -l ./rp.el

Here's what I see:

  [...]
  3077
  3078
  3079
  3080
  Garbage collecting...
  Garbage collecting...done

  Thread 1 received signal SIGSEGV, Segmentation fault.
  0x011e9918 in rpl_re_search_2 (bufp=0x17c0260 <searchbufs+4736>, str1=0x0,
      size1=0, str2=0x0, size2=20, startpos=0, range=20, regs=0x0, stop=20)
      at regex-emacs.c:3322
  3322                            buf_ch = STRING_CHAR_AND_LENGTH (d, 
buf_charlen)

Looks like it indeed crashes after a GC, and on my system needs more
than 3000 iterations.  So let's run it with a breakpoint at the
beginning of GC:

  (gdb) break alloc.c:6044
  Breakpoint 3 at 0x11fa1bb: file alloc.c, line 6044.
  (gdb) r -batch --module-assertions -l ./rp.el
  [...]
  3080

  Thread 1 hit Breakpoint 3, garbage_collect_1 (gcst=0x82bb84) at alloc.c:6044
  6044      record_in_backtrace (QAutomatic_GC, 0, 0);

The backtrace at this point:

  (gdb) bt
  #0  garbage_collect_1 (gcst=0x82bb84) at alloc.c:6044
  #1  0x011fa88e in garbage_collect () at alloc.c:6241
  #2  0x01149adc in maybe_gc () at lisp.h:5028
  #3  0x0123b012 in Ffuncall (nargs=2, args=0x82bcb0) at eval.c:2829
  #4  0x0128c3cf in module_funcall (env=0x6122730, fun=0x6122868, nargs=1,
      args=0x82bd50) at emacs-module.c:483
  #5  0x62d8136f in rp_funcall (env=0x6122730, value=0x82bd50,
      name=0x62d83060 "directory-name-p", nargs=1, args=0x82bd50)
      at realpath.c:62
  #6  0x62d815cc in Frealpath_truename (env=0x6122730, nargs=1, args=0x82bd90,
      data=0x0) at realpath.c:124
  [...]

  Lisp Backtrace:
  "directory-name-p" (0x82bcb8)
  "realpath-truename" (0x82bf80)
  "while" (0x82c2c8)
  "let" (0x82c538)
  "eval-buffer" (0x82cab0)
  "load-with-code-conversion" (0x82d0f0)
  "load" (0x82d9b8)
  "command-line-1" (0x82e3d0)
  "command-line" (0x82efe8)
  "normal-top-level" (0x82f690)

As you see, the call to Ffuncall is the one that triggers GC from time
to time.

What happens with our 'file' at this point?

  (gdb) fr 6
  (gdb) p file
  $1 = (emacs_value) 0x6122848
  (gdb) p *file
  $2 = <incomplete type>
  (gdb) p *(Lisp_Object *)file
  $3 = XIL(0x8000000006121ed0)
  (gdb) xtype
  Lisp_String
  (gdb) xstring
  $4 = (struct Lisp_String *) 0x6121ed0
  "d:/usr/eli/.emacs.d/"

Still valid.  Now let's see who thrashes it:

  (gdb) p *$4
  $5 = {
    u = {
      s = {
        size = 20,
        size_byte = 20,
        intervals = 0x0,
        data = 0x611e9fc "d:/usr/eli/.emacs.d/"
      },
      next = 0x14,
      gcaligned = 20 '\024'
    }
  }
  (gdb) watch -l $4->u.s.data
  Hardware watchpoint 4: -location $4->u.s.data
  (gdb) c
  Continuing.
  Garbage collecting...

  Thread 1 hit Hardware watchpoint 4: -location $4->u.s.data

  Old value = (unsigned char *) 0x611e9fc "\024"
  New value = (unsigned char *) 0x0
  sweep_strings () at alloc.c:2163
  2163                      NEXT_FREE_LISP_STRING (s) = string_free_list;
  (gdb) list
  2158                      /* Reset the strings's `data' member so that we
  2159                         know it's free.  */
  2160                      s->u.s.data = NULL;
  2161
  2162                      /* Put the string on the free-list.  */
  2163                      NEXT_FREE_LISP_STRING (s) = string_free_list;
  2164                      string_free_list = ptr_bounds_clip (s, sizeof *s);
  2165                      ++nfree;
  2166                    }
  2167                }

Bingo!  This is sweep_strings freeing our string, because it evidently
doesn't think it's a Lisp object that is being still referenced.

The culprit is this fragment from emacs-module.c, which is called each
time you receive a Lisp object from Emacs which you want to use in
your module:

  /* Convert O to an emacs_value.  Allocate storage if needed; this can
     signal if memory is exhausted.  Must be an injective function.  */
  static emacs_value
  lisp_to_value (emacs_env *env, Lisp_Object o)
  {
    if (module_assertions)
      {
        /* Add the new value to the list of values allocated from this
           environment.  The value is actually a pointer to the
           Lisp_Object cast to emacs_value.  We make a copy of the
           object on the free store to guarantee unique addresses.  */
        ATTRIBUTE_MAY_ALIAS Lisp_Object *optr = xmalloc (sizeof o);
        *optr = o;
        void *vptr = optr;
        ATTRIBUTE_MAY_ALIAS emacs_value ret = vptr;
        struct emacs_env_private *priv = env->private_members;
        priv->values = Fcons (make_mint_ptr (ret), priv->values);
        return ret;
      }

What this does is make a copy of each Lisp object you get from Emacs,
store that copy in memory allocated off the heap, and hand your module
a pointer to the copy instead of the original object.  So when you
call, e.g., directory-name-p, an Emacs function, it gets that copy of
the object.  But memory allocation by xmalloc doesn't record the
allocated memory in the red-black tree we maintain for the purposes of
detecting Lisp objects referenced by C stack-based variables.  So when
GC comes to examine the C stack, it doesn't consider the variable
'file' in your module as being a pointer to a live Lisp object, and so
it doesn't mark it.  Then the sweep phase of GC recycles your Lisp
object, which in this case involves setting the string's data to a
NULL pointer.

The patch to fix this is below; it simply marks these copied values by
hand, thus preventing them from being GCed.  It ran successfully with
even 50,000 iterations.

Philipp, any comments/objections?

--- src/emacs-module.c~0        2019-02-25 10:18:35.000000000 +0200
+++ src/emacs-module.c  2019-03-18 08:33:00.564476000 +0200
@@ -1133,6 +1133,15 @@ mark_modules (void)
       mark_object (priv->non_local_exit_symbol);
       mark_object (priv->non_local_exit_data);
       mark_object (priv->values);
+      if (module_assertions)
+       {
+         for (Lisp_Object values = priv->values;
+              CONSP (values); values = XCDR (values))
+           {
+             Lisp_Object *p = xmint_pointer (XCAR (values));
+             mark_object (*p);
+           }
+       }
     }
 }
 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]