Re: Reboots?

From: Marcus Brinkmann
Subject: Re: Reboots?
Date: Sun, 1 Apr 2001 15:42:28 +0200
User-agent: Mutt/1.3.15i

On Sun, Apr 01, 2001 at 03:46:03AM -0400, Roland McGrath wrote:
> > I have reproduced exactly the crash Jeff reported. I have collected the 
> > data.
> > I used a ring buffer of 16 entries (can increase if needed), and the full
> I gather from your data that what you mean is a buffer of the last 16
> messages handled by proc's demuxer?


> In the general case one would
> want to track the reply messages too, and see if there is interleaving of
> the RPCs, i.e. a second RPC beginning processing before another has
> finished.  But in this case we know that this is the sequence of calls done
> by fork, which does them all serially.

I will keep this in mind for later.

> Where did you put your code to write into your buffer?  You want it to be
> the very first thing in libports's internal_demuxer.  If you put in proc's
> demuxer, then ports_lookup_port and so forth happen before you make the
> record--so we would miss the final message if it's in the libports code
> where it crashes.

Ok. I put it in proc's demuxer, so we miss anything that is processed
earlier. I will move the code into libports before I do another test (and
increase the buffer a bit).

> > gdb log is attached. Here are the three ports on which RPCs were logged
> > immediately before the crash (in interleaved order, see left column). 
> This sequence of calls is clearly fork.  Are you sure you have the global
> ordering of those messages right?

Yes, because the code is within the global lock. Thanks for the fork RPC
sequence; it is useful.

> > Of course, one data point is not very much. I can run this a few more times,
> > and we can see if a pattern emerges. We can insert assertions etc.
> > We can probably log whole messages.
> Just the headers ought to be enough to understand what's going on.
> I don't know what assertions to suggest inserting.  

I thought of something to verify that the stack is okay. I don't have any
concrete ideas, though.

> > Can we run proc single threaded, so that we know where exactly it crashed?
> We can't make proc single-threaded, because wait works using condition
> variables.  However, multithreadedness is not what is preventing us from
> seeing the crash location.  It is because it jumps off into nowhere, and/or
> clobbers its stack, that we have trouble figuring out where it went bad.
> proc is essentially totally serialized by its global_lock.  I think the
> problem is probably a stack clobberation.  Just the right kind of
> corruption of the stack during an RPC server function could cause that RPC
> to complete fine and send its reply, but leave a little time bomb that will
> cause this thread to crash whenever it happens to be the one to dequeue a
> message from the portset.  In such a scenario, it could have been a totally
> unrelated RPC much earlier that left a thread waiting to crash, and just
> this flurry of RPCs happened to run through other request threads so that
> the one with the corrupted stack came up as ready for the next portset msg.

This sounds horrible.  One consequence would be to increase the log buffer,
so that we can see what the thread handling the next message did last time.
Is the thread that would handle the next request predictable, or at least
log-able?  Can we assign a thread to watch out for stack clobberations?  I
really have no idea how to attack such a nasty problem.  How can stack
clobberation occur?


`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann              GNU    http://www.gnu.org    marcus@gnu.org
