Re: malloc() patches round 3

From: Igor Khavkine
Subject: Re: malloc() patches round 3
Date: Wed, 22 Aug 2001 20:50:46 -0400
User-agent: Mutt/1.3.20i

On Wed, Aug 22, 2001 at 05:06:19PM -0700, Thomas Bushnell, BSG wrote:
> Igor Khavkine <i_khavki@alcor.concordia.ca> writes:
> > In my perfect world, OS's don't crash unless there is a hardware failure
> > or an internal inconsistency, and resource exhaustion is neither.  If
> > you want the system to reboot in that situation, all that is needed is
> > some sort of daemon that uses a fixed amount of resources and reboots
> > the system if the error is propagated to it.
> > 
> > We have an opportunity to create something like this. And just because
> > Mach crashes when it's out of memory, doesn't mean it's the right thing
> > to do. We can change that as well.
> Unfortunately, this is a much more global problem than just passing
> error codes around.  Among other things, you need to edit carefully
> the behavior of all those other Debian packages.  If any of a jillion
> of them misbehaves under resource failures, then chaos ensues.

If third-party programs misbehave when faced with resource shortages,
that's their problem. The important thing is to have some sort
of fixed-resource "way out" implemented by the kernel/servers/native
utils that can get you out of a sticky situation when things go wrong.
That's very similar to keeping a statically linked shell for root in case
the dynamic linker stops working, or reserving 5% of any partition
for the superuser.
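
To make the "fixed-resource" idea concrete, here is a rough sketch of a
reserve that is set aside at link time, so using it can never fail for
lack of memory elsewhere. The names (reserve_alloc, RESERVE_SIZE) and the
4096-byte size are made up for illustration; nothing like this exists in
the current patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emergency reserve: a fixed pool that exists from program
 * start, so handing out memory from it never depends on malloc(). */
#define RESERVE_SIZE 4096

static_assert(RESERVE_SIZE % _Alignof(max_align_t) == 0,
              "reserve size must be a multiple of the maximum alignment");

static _Alignas(max_align_t) unsigned char reserve[RESERVE_SIZE];
static size_t reserve_used;

/* Bump allocator over the reserve. Returns NULL once the pool is
 * exhausted; memory is never returned to the pool in this sketch. */
void *reserve_alloc(size_t n)
{
    const size_t align = _Alignof(max_align_t);

    if (n > RESERVE_SIZE)            /* also keeps the rounding below from overflowing */
        return NULL;

    size_t rounded = (n + align - 1) / align * align;
    if (rounded > RESERVE_SIZE - reserve_used)
        return NULL;

    void *p = reserve + reserve_used;
    reserve_used += rounded;
    return p;
}
```

An emergency utility would draw only on this pool, much as root's
statically linked shell does not depend on the dynamic linker.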

> The idea of just having things sit and wait for resources to become
> available is also not adequate.  Such things are invitations to
> deadlock.  Instead, there needs to be a way to ask programs to release
> resources.  And that requires even more significant pervasive changes.

That's not at all what I'm proposing. My idea is that anything that
acts in a supporting role (libraries, system calls, servers, etc.) should
not fail of its own volition unless it knows that ABSOLUTELY NOTHING
else can be done. All other errors should be propagated to the program
or client that made the library/system/RPC call. This sort of "support
code" should not take it upon itself to handle error conditions in a way
that denies its users the freedom to handle them the way they want.
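
As a small illustration of the difference, here is a sketch of a support
routine that reports an allocation failure to its caller rather than
aborting. The function dup_string is hypothetical and is not taken from
the patches under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical support routine: copies src into a freshly allocated
 * buffer. Returns 0 on success, or ENOMEM if the allocation fails.
 * It never calls abort() or exit() on the caller's behalf. */
int dup_string(const char *src, char **out)
{
    /* A NULL argument is a programming error (an internal
     * inconsistency), not resource exhaustion. */
    assert(src != NULL && out != NULL);

    size_t len = strlen(src) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return ENOMEM;          /* propagate; the caller decides what to do */

    memcpy(copy, src, len);
    *out = copy;
    return 0;
}
```

The caller can then retry, degrade gracefully, or exit as it sees fit;
an xmalloc()-style wrapper that aborts takes that choice away.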

As to how to behave when there is a shortage of resources, there are
many things you could do: just fail, return an error code, wait (although,
as you said, this is often deadlock-prone), or ask for unused resources to
be released. We just have to make sure that there is a small core of
code (OS/servers/utils) that provides some minimal functionality in
case of emergency; the rest should attend to itself.
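
One way the "ask for unused resources to be released" option might look
in code is sketched below. The hook type, alloc_with_release, and the
single-block cache are all invented for the example; a real system would
need an agreed protocol between clients and servers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical release hook: frees something it can spare and returns
 * nonzero, or returns 0 when it has nothing left to give up. */
typedef int (*release_fn)(void);

/* Try to allocate; on failure, ask the hook to release resources and
 * retry. Returns NULL (instead of waiting forever) once the hook
 * reports that nothing more can be released. */
void *alloc_with_release(size_t n, release_fn release)
{
    assert(release != NULL);

    for (;;) {
        void *p = malloc(n);
        if (p != NULL)
            return p;
        if (!release())
            return NULL;        /* nothing left to free: report failure */
    }
}

/* Example hook: a droppable cache holding at most one block. */
static void *cache_block;

static int drop_cache(void)
{
    if (cache_block == NULL)
        return 0;
    free(cache_block);
    cache_block = NULL;
    return 1;
}
```

Because the loop ends as soon the hook has nothing to release, the
caller gets an error code back rather than blocking, which avoids the
deadlock risk of simply waiting.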

> And those changes are not just kernel, or hurd, changes.  They have to
> occur in every program in every package.  
> The cost of getting it wrong, however, is that the system misbehaves
> or deadlocks: and in those cases, it would be better if the system had
> rebooted.

This behavior could always be implemented as a special case of what
I'm proposing.

