Re: Reboots?

From: Marcus Brinkmann
Subject: Re: Reboots?
Date: Mon, 2 Apr 2001 05:43:13 +0200
User-agent: Mutt/1.3.15i

On Sun, Apr 01, 2001 at 06:14:27PM -0400, Roland McGrath wrote:
> > Ok. I put it in proc's demuxer, so we miss out anything that is processed
> > earlier. I will move the code into libproc before I do another test (and
> > increase the buffer a bit).
> That will have a better chance of catching the crash location, but 
> it still could easily be obscured if the corruption is sufficient.

Yes. I have mixed feelings about my latest results. Obviously, it is not
telling us much more than we already know, but it is one more data point as
a comparison, so here it comes. There is a similarity. The previous crash
was in/after set_arg_locations which followed a setmsgport on one local
port, and get_arg_locations on another local port.

This crash is in/after a setmsgport on one local port, followed by a
get_arg_locations on another local port.

I am not proposing that this vague similarity is pointing at the bug. But
the _arg_locations functions are really very simple, and setmsgport is the
very last function in the bunch that has the potential to do a complex job
(if checkmsghangs is true).
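
For readers following along, the kind of trace buffer discussed in this
thread could look roughly like the sketch below. It is only an illustration,
not the code actually used in proc: the names (struct trace, trace_record,
trace_ago) and the slot count are made up here, and the message ids are
plain integers rather than real proc RPC ids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size; the thread only says the buffer gets increased.  */
#define TRACE_SLOTS 8

struct trace_entry
{
  int msg_id;           /* RPC message id as seen by the demuxer */
  unsigned long port;   /* local port the message arrived on */
};

struct trace
{
  struct trace_entry slot[TRACE_SLOTS];
  size_t next;          /* slot that the next record overwrites */
  size_t count;         /* valid entries, at most TRACE_SLOTS */
};

static void
trace_init (struct trace *t)
{
  assert (t != NULL);
  memset (t, 0, sizeof *t);
}

/* Record one incoming message, overwriting the oldest entry when full.  */
static void
trace_record (struct trace *t, int msg_id, unsigned long port)
{
  assert (t != NULL);
  t->slot[t->next].msg_id = msg_id;
  t->slot[t->next].port = port;
  t->next = (t->next + 1) % TRACE_SLOTS;
  if (t->count < TRACE_SLOTS)
    t->count++;
}

/* Return the entry recorded AGE calls ago (0 is the most recent),
   or NULL if the buffer does not reach back that far.  */
static const struct trace_entry *
trace_ago (const struct trace *t, size_t age)
{
  assert (t != NULL);
  if (age >= t->count)
    return NULL;
  return &t->slot[(t->next + TRACE_SLOTS - 1 - age) % TRACE_SLOTS];
}
```

After a crash, walking trace_ago (t, 0), trace_ago (t, 1), ... in a debugger
gives the last messages in reverse order. As noted above, anything handled
before the recording point never shows up, and the buffer itself can be
corrupted along with everything else.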

> If you made your buffer really big, it might be that the trace of messages
> would show us something particularly unusual that we could suspect.  But I
> would not hold out a lot of hope that the problem will suddenly become
> apparent just because we can see more of the past RPCs.

> There isn't any reliable way to predict which thread will wake up to take
> the next message from the portset.  The best thing I can think of is to
> arrange that there only be one thread waiting in the portset at a time, and
> that each thread completely unwind its stack and die after handling one
> request.  Then it should crash immediately during that unwind if the
> clobberation is of the sort I have been describing.  This will slow the
> server down a lot.

I think this might be worth a try. We will see if the bug is reproducible
with such a change or not. I hope I am able to hack it up following your

If this working assumption is correct, it doesn't make sense to log reply
messages, right? The proc global lock doesn't prevent a new thread from
coming up, so not seeing a reply logged immediately before the crash does
not mean it is this function that crashed when returning and unwinding a
part of the stack. On the other hand, always seeing a reply message logged
means that the function returned correctly, which means that this bit of the
stack is okay. Mmmh. I will just add reply msg logging to be sure. We'll see
what happens.
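
A rough sketch of what the reply logging argument amounts to, under the
assumption that both requests and replies are logged with the id of the
handling thread. All names here (struct rpc_log, log_add, log_pending) are
invented for illustration, and the log is a simple bounded array rather than
whatever proc would really use.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_SLOTS 16    /* hypothetical capacity */

enum log_kind { LOG_REQUEST, LOG_REPLY };

struct log_entry
{
  enum log_kind kind;
  int thread;           /* which server thread handled the message */
  int msg_id;           /* plain integer stand-in for an RPC id */
};

struct rpc_log
{
  struct log_entry e[LOG_SLOTS];
  size_t n;             /* number of entries in use */
};

/* Append one entry; entries beyond LOG_SLOTS are silently dropped.  */
static void
log_add (struct rpc_log *l, enum log_kind kind, int thread, int msg_id)
{
  assert (l != NULL);
  if (l->n < LOG_SLOTS)
    {
      l->e[l->n].kind = kind;
      l->e[l->n].thread = thread;
      l->e[l->n].msg_id = msg_id;
      l->n++;
    }
}

/* Store in OUT the msg_ids of requests that have no later reply from
   the same thread, up to MAX of them, and return how many were stored.  */
static size_t
log_pending (const struct rpc_log *l, int *out, size_t max)
{
  size_t found = 0;
  assert (l != NULL && out != NULL);
  for (size_t i = 0; i < l->n; i++)
    {
      int replied = 0;
      if (l->e[i].kind != LOG_REQUEST)
        continue;
      for (size_t j = i + 1; j < l->n; j++)
        if (l->e[j].kind == LOG_REPLY && l->e[j].thread == l->e[i].thread)
          {
            replied = 1;
            break;
          }
      if (!replied && found < max)
        out[found++] = l->e[i].msg_id;
    }
  return found;
}
```

If log_pending reports more than one request at crash time, the log alone
cannot say which handler crashed, which is the first half of the argument
above. A request that does have a logged reply at least tells us its handler
returned, which is the second half.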


`Rhubarb is no Egyptian god.' Debian http://www.debian.org brinkmd@debian.org
Marcus Brinkmann              GNU    http://www.gnu.org    marcus@gnu.org
