[RFC PATCH 0/1] gsync blocking the calling thread considered harmful
From: Sergey Bugaev
Subject: [RFC PATCH 0/1] gsync blocking the calling thread considered harmful
Date: Thu, 10 Jun 2021 19:46:33 +0300
Hello,
while hacking on rpctrace -p, I once again ran into gsync_wait () calls
permanently hanging rpctrace.
The reason for this is simple: once rpctrace logs the gsync_wait () call it
receives from a traced task, it forwards the same gsync_wait () call to the
actual task port of the traced task, and this causes rpctrace itself to block
since the gsync_wait () implementation always affects the calling thread.
Normally, some time later the blocked thread would be woken up by a
gsync_wake () call made by another thread; but since rpctrace itself is
single-threaded, other threads in the traced tasks enqueue their
gsync_wake_request ()s in vain: the requests will never be received by the
hung rpctrace, let alone forwarded to the kernel.
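To make the hang concrete, here is a toy model of a single-threaded forwarder draining a message queue. It is only a sketch of the scenario described above, not rpctrace code: the names (msg_kind, proxy_result, run_proxy) are made up for illustration, and the model ignores everything about gsync except who blocks.

```c
#include <stddef.h>

/* Hypothetical message kinds sent by threads of a traced task. */
enum msg_kind { MSG_WAIT, MSG_WAKE };

struct proxy_result
{
  size_t forwarded;   /* messages the proxy managed to forward */
  size_t woken;       /* waiters released by a forwarded wake */
  int stuck;          /* non-zero if the proxy blocked forever */
};

/* Model a single-threaded proxy that forwards QUEUE in order.
   If WAIT_BLOCKS_CALLER is non-zero, forwarding MSG_WAIT blocks the
   forwarding thread itself.  The only thread that could forward the
   matching MSG_WAKE is that same thread, so it never gets unblocked.  */
static struct proxy_result
run_proxy (const enum msg_kind *queue, size_t n, int wait_blocks_caller)
{
  struct proxy_result r = { 0, 0, 0 };
  size_t pending_waiters = 0;

  for (size_t i = 0; i < n; i++)
    {
      r.forwarded++;
      if (queue[i] == MSG_WAIT)
        {
          if (wait_blocks_caller)
            {
              /* The proxy is now asleep; later messages stay queued.  */
              r.stuck = 1;
              return r;
            }
          /* Non-blocking variant: just remember that someone waits.  */
          pending_waiters++;
        }
      else if (pending_waiters > 0)
        {
          pending_waiters--;
          r.woken++;
        }
    }
  return r;
}
```

With the queue { MSG_WAIT, MSG_WAKE }, the blocking variant stops after the first message, while the non-blocking variant forwards both and releases the waiter.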
One way to work around this would be to make rpctrace multithreaded, and I've
heard there's been some work in that direction.
But to me it sounds like gsync_wait () blocking rpctrace is the part that goes
wrong. Generally, rpctrace is never supposed to block on an RPC made by a
traced task: it forwards the request message without blocking for the reply
(that may come later, or never).
So I've long been thinking that gsync_wait () should do the same: instead of
actually blocking the thread calling gsync_wait (), gsync_wait_request ()
should return immediately, and the reply message will come once someone calls
gsync_wake (). This doesn't change anything for the regular callers of
gsync_wait (), since the call will still appear to block the same way other
RPCs do, but it will actually block on msg receive, not inside gsync_wait ()
itself.
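The proposed behaviour can be sketched with a small userspace model. This is an illustration under loud assumptions, not the patch: the waiter table, the integer "reply ports", and the names toy_gsync, toy_wait_request and toy_wake are all hypothetical, and the real gsync_wait () also compares the value at the address, which is omitted here.

```c
#include <stddef.h>

#define MAX_WAITERS 8

/* One recorded waiter: which address it waits on and where the
   reply should go.  An int stands in for a Mach reply port.  */
struct waiter
{
  unsigned long addr;
  int reply_port;
  int used;
};

struct toy_gsync
{
  struct waiter waiters[MAX_WAITERS];
};

/* Record the waiter and return at once; the caller is never blocked
   here.  Returns 0 on success, -1 if the table is full.  */
static int
toy_wait_request (struct toy_gsync *g, unsigned long addr, int reply_port)
{
  for (size_t i = 0; i < MAX_WAITERS; i++)
    if (!g->waiters[i].used)
      {
        g->waiters[i].addr = addr;
        g->waiters[i].reply_port = reply_port;
        g->waiters[i].used = 1;
        return 0;
      }
  return -1;
}

/* "Send" the deferred replies: store the reply ports of the released
   waiters in OUT_PORTS (at most CAP of them) and free their slots.
   Releases one waiter, or all matching ones if BROADCAST is non-zero.
   Returns the number of replies produced.  */
static size_t
toy_wake (struct toy_gsync *g, unsigned long addr, int broadcast,
          int *out_ports, size_t cap)
{
  size_t sent = 0;

  for (size_t i = 0; i < MAX_WAITERS && sent < cap; i++)
    {
      struct waiter *w = &g->waiters[i];
      if (!w->used || w->addr != addr)
        continue;
      out_ports[sent++] = w->reply_port;
      w->used = 0;
      if (!broadcast)
        break;
    }
  return sent;
}
```

In this model a regular caller would still appear to block, because it sits in a receive on its reply port until toy_wake produces the reply; a forwarder like rpctrace simply moves on to the next message.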
This must have been discussed before, and there must be a reason why gsync_wait
() was made to behave the way that it does and not in the (arguably simpler and
more consistent) way I'm proposing; but I can't find any relevant discussion.
Anyway, I thought I'd try implementing my idea and seeing what would break.
Much like with glibc, I'm not very familiar with Mach-the-kernel internals, but
to my surprise the first version that compiled appears to work just fine,
booting a full working Hurd system (and rpctracing gsync_wait () totally
works). Still, I probably messed something up: some locking or reference
counting or somesuch; so please review :)
The part I haven't figured out yet is how the kernel can listen for a right
becoming a dead name (like the dead-name notification in userspace). Perhaps
it amounts to calling ipc_port_dnrequest () and listening for messages just
like in userspace, but I have not worked out the details. Without this,
the kernel cannot know when the reply port is deallocated, either
explicitly by userspace or because the task died, so the kernel will
leak the memory allocated for the waiters.
Another to-do item is timeout support. Ideally it should just turn into
waittime in the MIG definition, but this again requires handling the reply
port dying and would change the message format. Also, GNU MIG doesn't yet
support conditional timeouts (although I might have a MIG patch or two
pending for this).
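As a rough picture of the cleanup that is missing, the sketch below models what a dead-name handler would need to do once the kernel learns that a reply port is gone. Everything here is hypothetical (toy_waiter, toy_live_count, toy_reply_port_died); it deliberately says nothing about how ipc_port_dnrequest () would actually be used to deliver that event.

```c
#include <stddef.h>

/* A recorded waiter; an int stands in for the reply port.  */
struct toy_waiter
{
  int reply_port;
  int live;           /* non-zero while the slot holds a waiter */
};

/* Count waiters that still occupy memory.  */
static size_t
toy_live_count (const struct toy_waiter *w, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (w[i].live)
      count++;
  return count;
}

/* Hypothetical dead-name handler: release every waiter whose reply
   port is DEAD_PORT, since nobody can receive its reply any more.
   Returns the number of waiters released.  Without such a hook the
   slots would stay live forever, which is the leak described above.  */
static size_t
toy_reply_port_died (struct toy_waiter *w, size_t n, int dead_port)
{
  size_t released = 0;
  for (size_t i = 0; i < n; i++)
    if (w[i].live && w[i].reply_port == dead_port)
      {
        w[i].live = 0;
        released++;
      }
  return released;
}
```

A timeout would presumably need the same release path, triggered by a timer instead of a port death.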
So, what do you think?
Sergey
P.S. Feels so good to hack on something that I can just post about in public
again!