bug-guile
bug#19883: Correction for backtrace


From: David Kastrup
Subject: bug#19883: Correction for backtrace
Date: Thu, 26 Feb 2015 13:32:00 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> David Kastrup <address@hidden> skribis:
>
>> address@hidden (Ludovic Courtès) writes:
>>
>>> David Kastrup <address@hidden> skribis:
>>>
>>>> This is embarrassing: I used the wrong executable in connection with the
>>>> core dump.  With the matching executable, the coredump makes a lot more
>>>> sense:
>>>>
>>>> #0  0x00000000 in ?? ()
>>>> #1  0x0804aee0 in Smob_base<Family>::mark_trampoline (arg=0x9fbb000)
>>>>     at smobs.tcc:34
>>>> #2  0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22
>>>> #3  0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1
>>>
>>> Could you try commenting out all the SMOB mark functions in LilyPond?
>>>
>>> This doesn’t fix the bug, of course, but it’s probably a good
>>> workaround: user-provided mark functions are not needed in Guile 2.0
>>> since libgc scans the whole heap for live pointers.
>>
>> Even the test program crashes at the end (when `count' is called in
>> order to traverse the created hierarchy) when you disable the setting of
>> the mark function in the init method in smobs.tcc.
>
> Could you add debugging symbols for libguile?  I don’t understand how
> ‘count’ gets called.

Figure me surprised.  Here is the recursive walk:

int
Family::count ()
{
  int sum = 1;
  for (int i = 0; i < kids.size (); i++)
    sum += kids[i]->count ();
  return sum;
}

and here is the starting call in workload():

  cout << "last has " << Family::unsmob (k)->count () << endl;
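For readers without the test program at hand, here is a stand-alone sketch of such a hierarchy. It is an illustration only: the struct below is a hypothetical stand-in with no SMOB machinery, and storing the children as a `std::vector` of raw pointers is an assumption, not something taken from the real sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Family class of the test program:
// plain C++ only, no Guile involved.
struct Family
{
  std::vector<Family *> kids;

  // Number of nodes in the subtree rooted here, this node included.
  int count () const
  {
    int sum = 1;
    for (std::size_t i = 0; i < kids.size (); i++)
      sum += kids[i]->count ();   // dereferences every stored pointer
    return sum;
  }
};
```

Every `kids[i]` is dereferenced unconditionally, so a single dangling pointer in the array is enough to make the walk crash.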

> Do you know if this is a use-after-free error?

Sure.  Nothing else would clobber the kids[] array to contain bad
pointers.

> If this is the case, Andy had the idea of turning on topological
> finalization in the GC.  This may help for this particular case, but I
> vaguely recall that this breaks other finalizer-related things.

I don't see why.  Topological finalization might help with
mark-after-free.  But why would it help if there is not even any mark
call involved?  This is clearly use-after-free.

> (I would check by myself, but ISTR that building LilyPond “on one’s
> own” is not recommended.  What would you suggest?  A Guix recipe would
> be sweet.)

Is there a reason you are not using the test program provided with this
bug report?  There is no real point in experimenting with LilyPond's
complexity when a simple test program using its memory management
classes already crashes.

LilyPond's GUILEv2 branch is currently out of order again since 2.0.11
changed encoding mechanisms _again_ in an incompatible manner (what
GUILE calls "stable" is anything but).  It is becoming harder and harder
to work around GUILE's attempts to wrest encoding control from the
application, while GUILE has no byte-transparent decoding of UTF-8, does
not support strings encoded in UTF-8, and (as of 2.0.11 or 2.0.10)
supports _only_ string ports redecoded to UTF-8.

So dealing with memory-mapped UTF-8 encoded files which are multiplexed
between reading by GUILE and reading by a UTF-8 decoding parser has
again been thwarted.  While I try figuring out how to repair the damage
this time, testing with LilyPond itself is hard to interpret since a
number of problems are not related to the memory management.

As long as this simple test program can show the memory management
related crashes, I don't see the point in throwing people at LilyPond:
that has not delivered any results the last several times I tried it.

>> A pointer to a C++ structure does not appear to protect the
>> corresponding SMOB data and free_smob calls the delete operator which
>> calls destructors and clobbers the memory area.
>
> Oh, I was mistaken in my previous message.  GC scans the stack and the
> GC-managed heap (stuff allocated with GC_MALLOC/scm_gc_malloc et al.),
> but it does *not* scan the malloc/new heap.
>
> So indeed, C++ objects that hold references to ‘SCM’ objects, such as
> instances of ‘Smob<X>’, must either have a mark function, or they must
> be allocated with scm_gc_malloc.
>
> Would it be possible to add a ‘new’ operator to ‘Smob’ that uses
> ‘scm_gc_malloc’, and a ‘delete’ operator that uses ‘scm_gc_free’?

It would not help since many of the references are stored in STL
containers (like std::vector <Grob *>) which have their data
allocated/deallocated separately from the memory area of the structure
itself.
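A small sketch of that point, in plain C++ with no Guile involved. The class `Node`, the counter `tracked_allocations` and the helper `inside_object` are hypothetical names made up for this illustration; the `malloc`-based `operator new` merely stands in for the role `scm_gc_malloc` would play in the proposal quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Counts allocations that go through the class-specific operator new.
// Hypothetical stand-in for "memory the collector would know about".
static std::size_t tracked_allocations = 0;

struct Node
{
  std::vector<Node *> kids;

  // Class-specific operator new: covers only the sizeof (Node) bytes of
  // the object itself, not anything its members allocate later.
  static void *operator new (std::size_t size)
  {
    ++tracked_allocations;
    if (void *p = std::malloc (size))
      return p;
    throw std::bad_alloc ();
  }

  static void operator delete (void *p) noexcept
  {
    std::free (p);
  }
};

// True if ADDR points into the storage occupied by *OBJ itself.
bool
inside_object (const Node *obj, const void *addr)
{
  std::uintptr_t lo = reinterpret_cast<std::uintptr_t> (obj);
  std::uintptr_t a = reinterpret_cast<std::uintptr_t> (addr);
  return a >= lo && a < lo + sizeof (Node);
}
```

After `parent->kids.push_back (child)`, the vector's element buffer (`parent->kids.data ()`) lies outside the `Node` object and was obtained through the vector's own allocator, so the class-level `operator new` never sees it. Covering that buffer as well would require a custom allocator on every container involved.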

Frankly, I don't get the current strategy of GUILE: basically any use of
scm_set_smob_mark will result in a function that can be called with
garbage from a smob that has already been deallocated via the function
registered with scm_set_smob_free.

GUILEv2 developers have resisted fixing this bug for years by trying to
stop people from using scm_set_smob_mark and instead telling people to
have their entire heap scanned by a conservative garbage collector.

For an application like LilyPond which can easily have the heap cover
more than half of the available address space and run for half an hour
(when generating docs) processing independent files with large
individual memory requirements, this strategy will both have a
considerable performance impact and bleed enough randomly
retained memory to run the application into the ground eventually.

In my current work on fixing the encoding stuff again I have patched my
code to deal with the mark-after-free errors in the free and mark
trampolines myself.  I need to find a solution for the encoding mess
before I can actually indulge in more testing of this workaround.

However, due to the opacity of GUILE's implementation and the
multithreaded collector, I have no guarantees that my work on the
respective trampolines will reliably prevent all mark-after-free errors.
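To make the idea concrete, here is a simplified single-threaded model of such a guard. It is neither GUILE's API nor the actual LilyPond patch: `Cell`, `free_trampoline` and `mark_trampoline` are hypothetical names, and the one-word cell is an assumption made purely for illustration.

```cpp
#include <vector>

// Hypothetical model of a SMOB cell: just one data word pointing at the
// C++ object.  This is NOT Guile's real cell layout.
struct Cell
{
  void *data;
};

struct Family
{
  std::vector<Cell *> kids;
  int marks_seen = 0;
};

// Free trampoline: destroy the C++ object, then clear the data word so
// that a stale cell can be told apart from a live one.
void
free_trampoline (Cell *cell)
{
  delete static_cast<Family *> (cell->data);
  cell->data = nullptr;
}

// Mark trampoline: refuse to follow a cell whose object is already gone.
// Returns true if the object was actually visited.
bool
mark_trampoline (Cell *cell)
{
  Family *f = static_cast<Family *> (cell->data);
  if (!f)
    return false;               // already freed: nothing to mark
  ++f->marks_seen;
  for (Cell *kid : f->kids)
    mark_trampoline (kid);      // a real collector tracks mark bits itself
  return true;
}
```

In this model a mark call that arrives after the free call sees a null data word and returns without touching the destroyed object. The caveat above still applies: nothing here is synchronized, so the sketch says nothing about what a concurrently running collector may observe.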

This is something that needs to get fixed in GUILE.  It does not make
sense to provide a mark callback mechanism that can be called with
garbage in GUILE's free store.  When GUILE releases/collects memory, it
does not make sense to leave the SMOB cells in a state indistinguishable
from valid data.  Apart from causing crashes in mark functions,
this makes work much harder for the conservative garbage collector.

-- 
David Kastrup




