l4-hurd

deferred cancellation of ipc


From: Marcus Brinkmann
Subject: deferred cancellation of ipc
Date: Tue, 14 Oct 2003 01:59:05 +0200
User-agent: Mutt/1.5.4i

Hi,

I have banged my head against the following problem, to which I cannot find
a satisfying solution in L4.

The context is, very broadly, cancellation of threads, but unlike
pthread_cancel, it should be done in a controlled manner.  In other words,
after a thread has been cancelled, it should be able to work out what state
it is in.

On the surface, L4 provides the right primitives.  Ipc() is the only system
call that can really block for a long time, and another thread can cancel it
with ExchangeRegisters().  The thread which was in Ipc() can determine from
the ErrorCode whether it was cancelled or even aborted.

However, this requires that the cancelled thread is already in an Ipc().
What if it isn't yet in an Ipc(), but we want to make sure that the thread
will be cancelled the next time it does call Ipc()?  This is what pthreads
calls deferred cancellation, and it is actually very, very useful.
In high-level applications, cancelling a thread asynchronously is barely
useful (well, except for signal handlers), while deferred cancellation is
essential to prevent unwanted blocking.

So, second attempt.  The cancelling thread stops the cancelled thread and
cancels/aborts pending Ipc operations.  If the thread was not in an Ipc(),
it sets a cancel flag and resumes the thread.  Then the cancelled thread
checks the cancel flag the next time it might end up blocking:

our_ipc (...)
{
  if (ipc_cancellable && testcancel ())
    {
      /* Cancel was requested: do something instead of blocking.  */
    }
  else
    return L4_Ipc (...);
}

What can the thread do if cancel is requested?  The most useful behaviour
would be to fake a cancelled IPC!  So, it could set the error flag in the msg
tag and set ErrorCode to indicate a cancelled ipc.  The idea is that there is
virtually no difference between a just started Ipc() that was cancelled and
being cancelled long before the Ipc().  The ipc_cancellable flag is used to
determine if the user wants Ipc() to be cancellable at all (you might know
for sure that the ipc partner is ready and thus ignore the cancel flag this
time).
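
A minimal sketch of the "fake a cancelled IPC" idea, written against a small
simulation rather than the real L4 interface.  Everything here is an
assumption made for illustration: struct fake_thread, fake_msg_tag_t,
FAKE_ERR_CANCELLED, fake_real_ipc() and fake_testcancel() are hypothetical
stand-ins, the error value is not taken from the L4 specification, and
whether testcancel() should consume the flag is a design choice the text
above leaves open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the error bit of the L4 message tag.  */
typedef struct { bool error; } fake_msg_tag_t;

/* Hypothetical error value meaning "IPC was cancelled"; NOT the real
   ErrorCode encoding.  */
#define FAKE_ERR_CANCELLED 1u

/* Simulated per-thread state (in L4 this would live in the UTCB).  */
struct fake_thread {
    bool     cancel_requested;  /* set by the cancelling thread */
    bool     ipc_cancellable;   /* does the caller want cancellation? */
    uint32_t error_code;        /* stand-in for ErrorCode */
    unsigned real_ipc_calls;    /* counts calls that reached the "kernel" */
};

/* Stand-in for L4_Ipc(): always succeeds immediately.  */
static fake_msg_tag_t fake_real_ipc(struct fake_thread *t)
{
    t->real_ipc_calls++;
    t->error_code = 0;
    return (fake_msg_tag_t){ .error = false };
}

/* Assumption: testing the flag also consumes it, so that one cancel
   request cancels exactly one IPC.  */
static bool fake_testcancel(struct fake_thread *t)
{
    bool was_set = t->cancel_requested;
    t->cancel_requested = false;
    return was_set;
}

/* The wrapper: pretend the IPC was cancelled if a request is pending.  */
fake_msg_tag_t our_ipc(struct fake_thread *t)
{
    assert(t != NULL);
    if (t->ipc_cancellable && fake_testcancel(t)) {
        t->error_code = FAKE_ERR_CANCELLED;        /* fake the ErrorCode */
        return (fake_msg_tag_t){ .error = true };  /* fake the error bit */
    }
    return fake_real_ipc(t);
}
```

With this shape the caller handles a cancel request that arrived long before
the call exactly like one that arrived just after the IPC started: it sees
the error bit and inspects the error code.  The sketch does nothing about the
window between the flag test and the real IPC.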

Here is my first problem: ErrorCode is specified in the specs to be
read-only, so technically I cannot fake an error code to indicate a
cancelled Ipc().  But in reality, the implementation on all specified arches
would happily allow it.  I see the reason behind the specification, but do
you expect that any arch will ever enforce ErrorCode to be read-only?
Anyway, this is only a side issue.

The real problem is that the above of course doesn't work.  There is a race
condition.  If the thread is "cancelled" right after testcancel() and before
the L4_Ipc(), the cancel flag is ignored for this Ipc, which might be a long
blocking one.  This race window can possibly be quite small, but it is
definitely there, and sticks out like a sore thumb.
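
The window can be made visible with a small deterministic simulation.  This
is only a sketch under stated assumptions: struct race_thread, race_cancel()
and racy_our_ipc() are hypothetical names, and a callback stands in for the
cancelling thread being scheduled between the flag test and the IPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated thread state; not an L4 data structure.  */
struct race_thread {
    bool cancel_flag;     /* deferred cancel request */
    bool blocked_in_ipc;  /* thread is inside a (long) blocking IPC */
};

/* Runs "in the window", i.e. after the flag test and before the IPC.  */
typedef void (*preempt_hook)(struct race_thread *);

/* The cancelling thread: abort a running IPC, otherwise leave a flag.  */
void race_cancel(struct race_thread *t)
{
    assert(t != NULL);
    if (t->blocked_in_ipc)
        t->blocked_in_ipc = false;  /* cancel the ongoing IPC */
    else
        t->cancel_flag = true;      /* defer until the next IPC */
}

/* The racy wrapper.  Returns true if the thread ended up blocking.  */
bool racy_our_ipc(struct race_thread *t, preempt_hook in_window)
{
    assert(t != NULL);
    if (t->cancel_flag) {           /* testcancel () */
        t->cancel_flag = false;
        return false;               /* faked cancelled IPC */
    }
    if (in_window != NULL)
        in_window(t);               /* preemption right here */
    t->blocked_in_ipc = true;       /* L4_Ipc () would block now */
    return true;
}
```

Calling racy_our_ipc(&t, race_cancel) on a fresh thread ends with the thread
blocked and the cancel flag still set: the request arrived, but too late to
be seen by this Ipc.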

I have considered various approaches, with increasing complexity, and I just
can't think of a reasonably elegant way to get rid of it.  For example, one
could set a flag in the cancelled thread that it is about to do an Ipc, and
the cancelling thread can then yield to the cancelled thread until it is
forced into the L4_Ipc() operation or past it (flag is cleared again).
Another attempt that I have not worked out in detail because it is just too
horrible would be to investigate the opcodes based on the IP and then figure
out if you are before or after the Ipc(), and then take appropriate actions
by jumping into cleanup handlers.
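
The first of these approaches (an "about to do an Ipc" flag plus yielding)
might look roughly like the following.  It is a single-threaded simulation,
not working L4 code: the phase field, win_yield_to() and all other names are
hypothetical, and a real implementation would have to learn from the kernel
(for example from the state returned by ExchangeRegisters()) whether the
thread is in an Ipc, instead of trusting a user-level variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum win_phase { WIN_RUNNING, WIN_ABOUT_TO_IPC, WIN_IN_IPC };

struct win_thread {
    enum win_phase phase;  /* the "about to do an Ipc" flag, generalized */
    bool cancel_flag;      /* deferred cancel request */
    bool ipc_cancelled;    /* an ongoing IPC was cancelled */
};

/* Target, step 1: announce the IPC first, then test the cancel flag.
   Returns false if the cancel was delivered before the IPC started.  */
bool win_announce_ipc(struct win_thread *t)
{
    assert(t != NULL);
    t->phase = WIN_ABOUT_TO_IPC;
    if (t->cancel_flag) {
        t->cancel_flag = false;
        t->phase = WIN_RUNNING;  /* flag is cleared again */
        return false;
    }
    return true;
}

/* Target, step 2: the blocking IPC itself.  */
void win_enter_ipc(struct win_thread *t)
{
    assert(t->phase == WIN_ABOUT_TO_IPC);
    t->phase = WIN_IN_IPC;
}

/* Stand-in for the canceller yielding its time slice to the target:
   the target makes progress and leaves the window.  */
static void win_yield_to(struct win_thread *t)
{
    if (t->phase == WIN_ABOUT_TO_IPC)
        win_enter_ipc(t);
}

/* Canceller: never act while the target is inside the window.  */
void win_cancel(struct win_thread *t)
{
    assert(t != NULL);
    while (t->phase == WIN_ABOUT_TO_IPC)
        win_yield_to(t);         /* push the target into or past the IPC */

    if (t->phase == WIN_IN_IPC) {
        t->ipc_cancelled = true; /* cancel the ongoing IPC */
        t->phase = WIN_RUNNING;
    } else {
        t->cancel_flag = true;   /* target will see it at its next IPC */
    }
}
```

The cost is visible even in the sketch: the canceller busy-yields for as long
as the target sits in the window, and the scheme relies on the canceller
being able to tell "in the window" apart from "in the Ipc".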

I hope I was able to make clear what the problem is.  Basically, I am
looking for a way to reliably and efficiently interrupt or prevent (or
rather a combination of both) the next blocking or potentially blocking
Ipc() operation that a thread wants to make (or is making).

In other systems, there doesn't seem to be such an option.  For example, if
you cancel a write() operation in Linux, I am not sure if you can reliably
and easily get status information about how much of the write succeeded.
However, I am still searching for documentation on the syscall
cancellation mechanism in Linux.  In L4, I think it is a fundamental
robustness issue to make this possible (how else could you reliably
interrupt, in a deferred way, a thread that is about to perform a closed
receive?).
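
For the narrower case of a write() interrupted by a signal (not pthread
cancellation), POSIX does say how progress is reported: an interrupted
write() returns -1 with errno set to EINTR if nothing was written, and the
number of bytes written otherwise.  A minimal sketch of a wrapper that
exposes that progress to the caller; write_progress() is a hypothetical
helper, not an existing API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write up to len bytes from buf to fd.  Returns 0 when everything was
   written and -1 on the first error (including EINTR, with errno set by
   write()).  In both cases *done holds the number of bytes that were
   actually written, so an interrupted caller knows how far it got.  */
int write_progress(int fd, const void *buf, size_t len, size_t *done)
{
    const char *p = buf;

    assert(done != NULL);
    *done = 0;
    while (*done < len) {
        ssize_t n = write(fd, p + *done, len - *done);
        if (n < 0)
            return -1;        /* nothing written by this call */
        *done += (size_t)n;   /* short write: record it and continue */
    }
    return 0;
}
```

A caller that wants to resume after an interruption can call the function
again with buf advanced by *done.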

It might be that I am just still too unfamiliar with how high-level
implementations of signals, cancellation, etc. work together.
However, my feeling is that the above operation is one of the fundamental
building blocks of more complex systems, so it should be useful to consider
it in isolation.

Ideas welcome, and thanks a lot for your consideration,
Marcus

-- 
`Rhubarb is no Egyptian god.' GNU      http://www.gnu.org    address@hidden
Marcus Brinkmann              The Hurd http://www.gnu.org/software/hurd/
address@hidden
http://www.marcus-brinkmann.de/



