Re: mach_msg blocking on call to vm_map

On Thu, Sep 1, 2016 at 10:28 AM, Richard Braun <rbraun@sceen.net> wrote:

> I completely disagree.

Thank you, Richard. Really! Thank you for disagreeing. Now we can have a good discussion about this!

> Most modern microkernels use synchronous IPC
> that just blocks, and many operations on top of them do block. The
> overall system, on the other hand, must be careful to avoid such
> deadlocks.

OK, I read the Mach documentation for mach_msg() and concluded that it was like a POSIX read(): I could operate it in a mode where the kernel absolutely would not block my process and would return EWOULDBLOCK instead. That's basically a kernel guarantee, as far as it goes. (Notice that it doesn't guarantee how long the system call will take, whether 1 ms, 1 s, or 1 week, because it's not a real-time system, which is why I say "as far as it goes".)
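To make the read() half of that analogy concrete, here is a minimal sketch in plain POSIX C. It is not Mach code and says nothing about how mach_msg() actually behaves; the function name read_would_block is made up for illustration. It only shows the behaviour I had in mind: a read() on an empty pipe in non-blocking mode fails immediately instead of putting the process to sleep.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if read() on an empty non-blocking pipe fails at once with
   EAGAIN/EWOULDBLOCK, 0 if it behaved otherwise, -1 on a setup error. */
int read_would_block(void)
{
    int fds[2];
    char buf[16];
    int result = -1;

    if (pipe(fds) != 0)
        return -1;

    /* Put the read end into non-blocking mode. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags != -1 && fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != -1) {
        /* Nothing has been written, so a blocking read() would sleep here. */
        ssize_t n = read(fds[0], buf, sizeof buf);
        result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) ? 1 : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling read_would_block() from a small main() should return 1 on a POSIX system. POSIX allows EAGAIN and EWOULDBLOCK to be the same value or different ones, which is why the check accepts either.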

Are you now saying that's not how it works on Mach/Hurd? If so, please let me know, because I've been under a big misunderstanding that I need to get cleared up!

Can a bunch of screwy translators legitimately cause mach_msg() to block for some user space thing that might never happen, even if I've supplied MACH_SEND_TIMEOUT?

Shouldn't it just return with no reply message instead?
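Again as a POSIX analogy only (an AF_UNIX socket pair and poll(), not Mach ports, and the name request_with_timeout is invented for this sketch), this is the shape of what I would expect: the client sends a request to a peer that never answers, waits a bounded time, and gets control back with "no reply" rather than hanging.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Sends a request to a peer that never replies, then waits at most
   timeout_ms milliseconds for an answer.
   Returns 1 if a reply arrived, 0 on timeout, -1 on error. */
int request_with_timeout(int timeout_ms)
{
    int sv[2];
    int result = -1;
    const char req[] = "ping";

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* sv[1] plays the misbehaving server: it receives the request
       in its socket buffer but never reads it or writes a reply. */
    if (write(sv[0], req, sizeof req) == (ssize_t) sizeof req) {
        struct pollfd pfd = { .fd = sv[0], .events = POLLIN };

        /* poll() returns 0 when the timeout expires with nothing to read. */
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            result = 0;
        else if (ready > 0)
            result = 1;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

With a silent peer, request_with_timeout(50) should return 0 after roughly 50 ms. Whether mach_msg() with MACH_SEND_TIMEOUT gives the same guarantee when the request ends up in vm_map is exactly the question above.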

> I don't see anything wrong with vm_map misbehaving if the underlying
> implementation is wrong; just fix that implementation, e.g. by
> actually taking the send timeout into account here, or making
> libpager handle multiple clients like you want.

Yes, but libpager is in user space. Isn't one of the great selling points of the Hurd that we put so much stuff into user space, and that the kernel offers us guarantees (read: "guarantees") that we're protected from misbehaving stuff in user space?

> Queuing the operation would only add tremendous complexity to an
> already extremely complex IPC mechanism where messages are
> allowed to be partially sent... Besides, the Unix semantics
> often require blocking, since the original system calls are,
> well, completely synchronous with respect to user thread
> execution (it's actually the very same thread running kernel
> code). So you'd only add another layer of synchronization,
> and it would block there.

> My personal opinion on the matter is that you should only invoke
> remote objects if you trust them.

How pervasive is this in the design? Is vm_map only one of many RPCs that can block mach_msg() if some critical system translator is on the blink?

> The original Hurd design,
> however, was explicitly meant to allow clients and servers to
> be "mutually untrusting". I'm not exactly sure what this means
> in practice, but it seems that clients can detach themselves from
> servers at any time. So making the timeout work, and allowing the
> transaction to be interrupted (usually with a 3-second grace
> delay before brutal interruption; is it hardcoded? check how
> glibc handles Ctrl-C/SIGINT during an RPC) may be the only
> things required to make the behaviour comply with "Hurdish
> expectations".

Thank you for that clarification. I've figured out that Ctrl-C is handled by a message. Does glibc spawn a separate thread to handle those messages? Is that why all of the processes on the system have at least two threads? That 3-second timeout: what is it, exactly? I'll have to look at the code, but this is something I've only partially puzzled out.

agape

brent

From:	Brent W. Baccala
Subject:	Re: mach_msg blocking on call to vm_map
Date:	Thu, 1 Sep 2016 11:54:20 -1000
