task death notifications


From: Marcus Brinkmann
Subject: task death notifications
Date: Sun, 4 May 2003 17:56:03 +0200
User-agent: Mutt/1.5.3i

Hi,

I was thinking about task death notifications, and noticed how hard it is to
get them right on L4.  I don't remember a thorough discussion of them; I
think Neal and I just took it for granted that it was straightforward.

But it isn't.

First, why are they needed?  If a client sends an RPC to a server, there are
two possible cases.  The RPC can be processed quickly; then the server just
does so and tries to send the reply.  If that succeeds, fine; otherwise it
will just drop the request.  The other case is that the RPC blocks for a long
time.  This is the case for io_select, for example.  Now, an io_select call
can last forever, for example when selecting on a pipe that is never written
to.  And the pipe can last forever, too, for example if it is a named pipe.

Now, what happens if the client dies before the io_select call completes?
Then the server needs to notice, and free the resources associated with the
select call.  Associating the resources with the pipe is not appropriate,
because the owner of the pipe might be different from the owner of the
client.

So, the server needs to request notifications from the task server for the
client's thread id.  And if that thread is destroyed, the server can receive
the notification and cancel the pending request.  This only needs to happen
for blocking RPCs.  Other RPCs are by definition too short to worry about
this.
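
To make this concrete, here is a rough sketch in C of the pattern I have in
mind.  All of the interfaces (task_request_death_notification and so on) are
made up for illustration; they are not actual task server RPCs:

    #include <stdint.h>

    typedef uint64_t thread_id_t;  /* global L4 thread id (hypothetical) */

    /* Hypothetical task server RPC: ask to be notified when TID dies.
       Returns 0 on success, nonzero if TID does not (or no longer)
       exist.  */
    extern int task_request_death_notification (thread_id_t tid);

    /* Hypothetical bookkeeping in the server.  */
    extern void record_pending_select (thread_id_t client);
    extern void cancel_pending_select (thread_id_t client);

    /* Handler for a blocking RPC such as io_select.  Short RPCs skip
       the notification request; only blocking ones pay this cost.  */
    void
    handle_io_select (thread_id_t client)
    {
      if (task_request_death_notification (client) != 0)
        /* The client is already gone; don't block on its behalf.  */
        return;

      record_pending_select (client);
      /* ... block until the pipe becomes readable, or until the death
         notification arrives, whichever happens first ...  */
    }

    /* Called when the task server delivers a death notification.  */
    void
    on_death_notification (thread_id_t dead)
    {
      cancel_pending_select (dead);
    }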

Now, the straightforward implementation (receive the RPC, read the sender's
thread id, send a death notification request to the task server for this
thread id, process the RPC) has a race condition: after receiving the RPC,
and before requesting the notification, the thread may die.  Now, two things
can happen:

1. The task server will report that the thread does not exist.  In that
case, the server can guess what happened and terminate the RPC.

2. If the race window is large enough, the thread id might be reused before
the server gets to ask about it.  This requires two things: first, the
global part of the thread id needs to be reused (this is likely if threads
are cached in a pool, which they probably should be), and second, the
version part needs to be reused.  I proposed the version part to be the PID.
This can happen, for example, if 30,000 tasks are quickly created and die.
An attacker could help make this happen, under very unlikely conditions.

Of course, scenario 2 is extremely unlikely.  Still, it is something to
worry about if we worry about robustness and subtle race conditions at all.
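
To spell out where the window is, here is the racy sequence annotated as
code, reusing the invented interfaces from the sketch above:

    #include <stdint.h>

    typedef uint64_t thread_id_t;                         /* as above */
    extern int task_request_death_notification (thread_id_t tid);

    void
    handle_blocking_rpc (thread_id_t client)
    {
      /* (a) The RPC has just been received.  Because L4 IPC is
         synchronous, CLIENT was alive at the instant of transfer.  */

      /* (b) <-- race window: CLIENT may die here; with enough task
         churn, its thread id (global part and version part) may even
         be handed out again to an unrelated task.  */

      /* (c) Only now do we ask the task server about CLIENT.  */
      if (task_request_death_notification (client) != 0)
        return;     /* scenario 1: the thread is gone, abort the RPC */

      /* Scenario 2 slips through undetected: the notification was
         registered for a reused id, i.e. for the wrong task, and we
         block forever on behalf of a client that no longer exists.  */
    }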

In Mach this problem does not occur, because ports are managed globally by
the kernel, which does not allow such an erroneous reuse of global ids.

I guess that there are two ways to fix this:

1. One way would be an interaction with the task server that requires the
client to copy a send right to a task-identifying port to the server.
Because the server receives a (non-privileged) handle for the task, the
reliance on global ids for identification is reduced, and no misjudgement
can happen.  The server can then use this "task id port" handle to request
death notifications.  I have not really thought this scheme through, but it
should work out; however, it is quite a complex setup, and it imposes some
overhead on the communication between client and server.

2. The other way is to make global thread ids more robust.  This means the
task server has to make some sort of guarantee about not reusing global
thread ids (with their version part taken into account, at least) for some
time.  The server then has to ensure that between receiving the message and
requesting the death notification, this time has not elapsed.  Or, the other
way round, the server tells the task server the minimum lifetime the thread
must have for which it wants death notifications.  If the thread is younger
(which means its id was reused), the task server declines and lets the
server know.  This is easy to implement, because the task server just
records the task creation time, and compares that with the server-requested
maximum creation time.  Because message reception is synchronous in L4, the
time we receive a message is a time at which the client thread still lives.

However, there is a catch: there must not be a race between receiving the
message and determining the time the message was received - the reception
time is not recorded in the message we receive (L4 doesn't timestamp
messages).  I am unsure about any scheduling guarantees we can exploit, and
other things.  Either we would need an L4 extension that adds timestamping
to (some) messages, or we could use another, earlier time for comparison:
the time we start to receive messages.  That time can be much, much earlier
than the actual reception, and the client might have been started after the
server starts to listen.  In that case, the client would have to retry
contacting the server once.
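
Here is a minimal sketch of the task server's side of this check, again
with invented names: an extended variant of the earlier invented RPC, now
taking the maximum acceptable creation time.  I assume an abstract system
clock; whether a clock with the right guarantees is available is exactly
the open question above:

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint64_t thread_id_t;
    typedef uint64_t sys_time_t;    /* e.g. microseconds since boot */

    /* Hypothetical lookup: the creation time recorded by the task
       server for TID, or failure if TID is not currently allocated.  */
    extern bool lookup_creation_time (thread_id_t tid,
                                      sys_time_t *created);

    /* The server passes the latest creation time it will accept: the
       time it (started to) receive the client's message.  A thread
       created after that time cannot be the sender of the message, so
       its id must have been reused, and the request is declined.  */
    int
    task_request_death_notification (thread_id_t tid,
                                     sys_time_t max_creation)
    {
      sys_time_t created;

      if (! lookup_creation_time (tid, &created))
        return -1;    /* the thread is already dead: scenario 1 */

      if (created > max_creation)
        return -1;    /* the id was reused: scenario 2, also caught */

      /* ... register the notification for TID ...  */
      return 0;
    }

The calling server would pass, as max_creation, its best safe lower bound
on the reception time - ideally a timestamp from the message itself, or
failing that, the time it started to listen.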

If this last way is the way we need to do it, there is a potential retry
cost of one message per server the client uses - not much, and it is a
worst case that is hardly ever reached (only very few RPCs are blocking and
require this amount of paranoia).

I am happy to lay out, in more detail, an algorithm that I believe works,
but the above should be enough to make the problem and the potential
solutions clear.  I am also happy to hear about other ideas, or about
problems with the above.

Thanks,
Marcus


-- 
`Rhubarb is no Egyptian god.' GNU      http://www.gnu.org    address@hidden
Marcus Brinkmann              The Hurd http://www.gnu.org/software/hurd/
address@hidden
http://www.marcus-brinkmann.de/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]