l4-hurd
Re: Part 2: System Structure


From: Bas Wijnen
Subject: Re: Part 2: System Structure
Date: Fri, 19 May 2006 00:29:24 +0200
User-agent: Mutt/1.5.11+cvs20060403

On Thu, May 18, 2006 at 02:09:28PM -0400, Jonathan S. Shapiro wrote:
> > But does this mean every piece of critical code should be in its own
> > address space?
> 
> Yes. That is *exactly* what it means. More generally, it means that
> every piece of code whose robustness is pragmatically important -- even
> if it is not critical -- should be in a separate address space.

I expect this costs performance (for setting up address spaces all the time).  I
can see it is useful when recovery is actually possible, but in many cases I
don't see the use of it.

> > What I'm saying is that the parent and the child fail as a unit (this is
> > unidirectional, the child and parent don't fail as a unit).
> 
> This is not necessarily true. Even when it is true in production, it is
> definitely not true for purposes of fault analysis. Also, in many cases
> it is possible for programs to recover sensibly from failed
> sub-computations, but only if the effects of those failed computations
> are known to have been properly contained.

Yes, that's what helpers are for.  If a program wants to do a computation but
isn't sure about that part of the code (for example because it was plugged in
from an untrusted source), and especially if it can recover when the
computation fails, the computation should be started as a new process.
There's no reason to forbid the program access to the computation's memory,
though.  (There's also no reason to allow it, so you say the default should be
not to allow it.  I don't mind; I don't think it does any harm.  But I think
it's very important that the _user_ can always read and change that memory.
He owns it, and should be able to do whatever he wants with it.)

> > > A direct consequence is that no isolation of faults exists in principle.
> > > If exec() screws up, you have no idea whether the problem lies in exec()
> > > or in some weird behavior of the program.
> > 
> > When programming with untrusted helpers, the idea is of course that the
> > parent is perfect.
> 
> This seems like a really bad idea. Any time you start an argument with
> "assume false is true", absolutely anything looks like a good design
> decision.

But I can prove that if false is true, then I am the pope! :-)

Of course the parent isn't really perfect.  But the situation is this:
- Bug in helper -> program recovers
- Bug in program -> program doesn't recover
Now there is one very special class of bugs: stray pointers into exec which
manifest in the helper.  These look like bugs in the helper, and the program
can recover from them.  I'm saying that I think this class is so rare that the
extra trouble of an address space just for the system call isn't worth it.
And even if you do use a constructor, it will need to be created.  The
constructor creation can have all the same bugs that the exec call can have.
So I don't think you really win a lot either way.

> > I agree that this class of bugs may be hard to find.  I disagree that we
> > should "protect" ourselves from it.  This class of bugs are still possible
> > with a shielded exec(), and they will still be hard to find.
> 
> Actually, they are many orders of magnitude less frequent when critical
> code is shielded, and they are several orders of magnitude easier to
> find.
> 
> You are arguing contrary to empirically established fact. Isolation
> boundaries have been consistently observed to make systems several
> decimal orders of magnitude more robust when used appropriately.

I shall believe you and agree that we do want constructors, but I don't agree
that the memory they accept should by default be opaque to the user who owns
it.

> You are, in essence, proposing to throw away the only fundamental advantage
> that a microkernel offers: the ability to isolate and contain faults.

This ability is only useful if you can actually recover, I would think.

> We are operating from very different philosophies. My philosophy is:
> wherever you *can* design a system in a way that makes faults better
> isolated and easier to analyze, you *should*. The issue goes far beyond
> what "fails as a unit" in the field. It also applies to debugging and
> post-failure analysis. Code that is not isolated in the field cannot be
> instrumented in the field -- which is important when you are trying to
> figure out what went wrong.

I'm sorry, I do not understand what you are saying here.  Can you rephrase it,
please?

> > There are lots of things that are in the same address space even if things
> > would be more robust (in a non-practical way) when they weren't.  For
> > things which fail as a unit, this is not a problem.  If xmms crashes,
> > there's no point at all in protecting its plugin.  However, if the plugin
> > crashes, it is useful to protect xmms.  This is exactly how things work in
> > my proposal.
> 
> Yes, this describes your proposal, but your reasoning is flawed. It is
> very desirable to know that XMMS failed because it attempted an invalid
> access into its plugin (invalid in the sense "contrary to interface
> specification", not in the sense "disallowed for security reasons").

It isn't important to me that xmms has access to the plugin's memory.  I agree
with you that it is a good idea not to map it into xmms's address space by
default (that is, it should be unmapped when it calls exec()).  But I don't
see a problem in leaving the program with the capability to map it in.  I
think it is very important that the user from whose space bank this is
happening can map the memory into his address space.  And I think the easiest
way to allow this is by recursively allowing it for sub-space banks, because
this means the session space bank is not special.  But as I said before, this
is an implementation detail.

> > > Of course it is failing. That isn't the question. The question is:
> > > 
> > >   You have a million line application with a stray pointer error
> > >   somewhere that manifests only when the moon is full, it is raining
> > >   in shanghai, and you tap the power cord with your left foot at just
> > >   the right frequency.
> > > 
> > >   How the @)_&% do you find it?
> > 
> > By debugging it when you see that it seems to happen.  Which means the
> > memory must have been transparent (for reading *and* writing) for you to
> > begin with. ;-)
> 
> This is exactly the wrong answer.

No it isn't, it's just not what you wanted to hear. ;-)

> Transparency is needed at debugging time, but the customer isn't doing the
> debugging.

Maybe not now, but if his application hangs and I can't reproduce it, then
it's very useful indeed if I can debug it.

> Opacity is needed at execution time so that pointer errors will be diagnosed
> as quickly as possible.

For that the parent must not have the child's memory mapped in.  As I said
above, that is a very reasonable default.  But that's no argument against
leaving open the ability to map it in, and definitely not against leaving
this ability open for the user (as opposed to the parent process).

> It is plausible that the program will not intentionally refer into
> random places in the library, but we are not talking here about
> intentional references. It is VERY easy for the program to make an
> unintentional reference into the library as follows:
> 
>   1. Program calls something in library.
>   2. Library routine places pointer on stack, later returns.
>   3. Program makes uninitialized dereference through stack,
>      ends up using pointer left there by library code.

That makes sense, I hadn't thought of that.

> > You are correct, of course.  But there's really no reason that stray
> > pointers would be pointing at library code/data.  You are in fact solving
> > a problem that is too rare, IMO.
> 
> I have experience that says this isn't true. In the absence of hard
> data, you are speculating.

Indeed.  I should have been more clear about that, sorry.

> > > > If program A wants to spawn an instance of program B, it invokes a
> > > > capability to the B-constructor.  The B-constructor then builds B.
> > > > When all this has happened, A and B are both children of their own
> > > > constructor, which are children of the meta-constructor.  In
> > > > particular, B is not a child of A.  So B has successfully guarded its
> > > > runtime memory image from A (which with these definitions isn't the
> > > > parent, but I think it would be in the situation you were
> > > > envisioning).
> > > 
> > > No, because B was built using storage provided by A, so B is transparent
> > > to A, so B cannot guard its content against A.
> > 
> > No, it wasn't.  And so it can...
> 
> Wait. You said that B was spawned by A. If A did not provide the
> storage, where did the storage come from?

Whoever wanted to keep it hidden has supplied it.  In this case, probably the
creator of the constructor.  As I wrote below, this does indeed prevent
certain code patterns.  This may or may not be a problem.

What I said was that B can indeed guard its storage from A.  I didn't say it
could do so while letting A pay the cost for it.  That is indeed something
which isn't possible without opaque storage.  But that, too, is possible with
a transparent space bank, if the user's session gives it out.

> > > The problem here isn't the destruction of storage per se. It is the fact
> > > that the destruction of storage used by the child wasn't "all or
> > > nothing".
> 
> File servers are a rare case: programs that must manage storage on
> behalf of multiple clients. This is well known to be exceptionally hard
> to deal with, and file systems must be written with great care.
> Realistically, they should not use client-revocable storage at all.
> 
> But the overwhelming majority of objects are single-client, or if
> multi-client, all clients are in the same storage allocation domain. For
> these, the right thing to do is have the subsystem fail as a complete
> unit instead of having its storage be partially violated.
> 
> Note that "serves one client" does not mean "trusts that client".

So what would be an example of a single-client server, which does not run on
the space bank of the same user as its client?

Thanks,
Bas

-- 
I encourage people to send encrypted e-mail (see http://www.gnupg.org).
If you have problems reading my e-mail, use a better reader.
Please send the central message of e-mails as plain text
   in the message body, not as HTML and definitely not as MS Word.
Please do not use the MS Word format for attachments either.
For more information, see http://129.125.47.90/e-mail.html


