guile-user

Re: Debugging hints wanted


From: Roland Orre
Subject: Re: Debugging hints wanted
Date: Tue, 01 Jul 2008 12:14:11 +0200

Hi Ludovic, thanks for your reply.
On Mon, 2008-06-30 at 21:42 +0200, Ludovic Courtès wrote:
> Hi,
> 
> Roland Orre <address@hidden> writes:
> 
> > I need hints on how to find occasional segmentation faults
> > and missed GC references. This relates to 64 bit machines.
> 
> Is it x86-64, IA64, or something else?
What I'm trying to get working now is on x86-64 (Opteron), so that
I can then run it on a large-memory IA64 (Itanium 2) machine.

> The Git repository (the future 1.8.6) contains an important bug fix for
> IA64.  I think there were x86-64-related during the 1.8.x series, too.
> Thus, I'd suggest using the latest Guile on these platforms.

That's a good hint. I'll check out the code and see if I can locate
the changes. The problem is that I've considered switching for a few
years, but since the array API changed as of 1.8, it would imply a
major rework, possibly causing other issues, as the old array API is
used in hundreds of places in my code, and there may be other API
changes as well.

> > My modules have worked perfectly fine on 32 bit machines but
> > on 64 bits I occasionally get something like
> > #<freed cell 0x2...; GC missed a reference> if I run that
> > code fast, which indicates a threading problem (I do not use
> > threads in this case, but seems like guile does). This does
> > not occur if I run guile through gdb. This happens not too often
> > but it seems to be related to string->symbol symbol->string.
> 
> Is it reproducible?

This is not really reproducible. If I execute the lines quickly by
loading them as a file, it occurs with about 60% probability.
If I execute the lines in that file one by one, it does not
occur. As a workaround: I can see that it may be complaining
at, e.g., a string->symbol conversion. If I then simply replace
that with the identity, i.e. (lambda (x) x), it doesn't happen,
but this probably relates to the bigger issue below.
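
The substitution described above amounts to something like the
following sketch. The name `name->key` is hypothetical and only
stands in for a call site; it is not taken from the actual code.

```scheme
;; Sketch only: `name->key` is a hypothetical wrapper, not from the real code.

;; Original form: the conversion at which the freed-cell message shows up.
(define (name->key s) (string->symbol s))

;; Diagnostic stand-in: the identity procedure, which skips the conversion.
;; Uncomment to replace the definition above.
;; (define name->key (lambda (x) x))

(display (name->key "foo"))
(newline)
```

Swapping in the identity only hides the symptom at that call site; it
says nothing about where the reference was actually lost.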

> > My bigger problem though is frequently occurring 
> > segmentation faults or otherwise corrupt pointers.
> >
> >  If I then run the code in gdb I can get
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x2ae316e4f070 (LWP 6699)]
> > 0x00002ae314b9d091 in scm_gc_mark_dependencies (p=0x97c) at
> > gc-mark.c:441
> > 441      if (SCM_GC_MARK_P (ptr))
> > Current language:  auto; currently c
> 
> Likewise, is it reproducible?  Can you show the full backtrace (it
> should show where 0x97c comes from)?

This is fully reproducible when it happens as shown. Most often
I get a segmentation fault like this. I have attached a full
gdb backtrace from it. It can be reproduced over and over,
with only the base addresses differing.

Sometimes I get a pointer to some internal structure, for instance
a pointer to the procedure of a loop in the middle of a list of
numbers, which is kind of illogical, as that internal structure
should not be freed.

> Hope this helps,
> Ludovic.
Best regards
Roland

Attachment: backtrace-full.txt
Description: Text document

