emacs-devel


From: Eli Zaretskii
Subject: Re: master 8dcb19f 4/4: Add a unit test testing interaction between threads and processes.
Date: Sun, 28 Feb 2021 22:19:39 +0200

> From: Philipp <p.stephani2@gmail.com>
> Date: Sun, 28 Feb 2021 19:30:46 +0100
> Cc: Philipp Stephani <phst@google.com>,
>  emacs-devel@gnu.org
> 
> The specific scenario here is: Start a number of processes and a number of 
> threads, each of which waits for one of the processes

See below: I don't see how you ensure in this test that each thread
waits for just one process.

> then, cause the processes to exit (since they run „cat“, that works by 
> sending EOF); then, join all the threads.  This should work without hangs or 
> errors.

There are two "inputs" from each subprocess that Emacs receives: the
self-pipe output and the SIGCHLD signal.  Depending on the timing and
on which thread receives which input, the behavior can be different.
Do you have a clear idea what should happen and what does happen wrt
these "inputs"?  In particular, which thread(s) do you expect to see
SIGCHLD and/or read the self-pipe as result of that, and do you think
it's important whether a thread gets awoken by the signal or by
reading the self-pipe?
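For concreteness, the scenario under discussion might look roughly like
this in Lisp.  This is my own hypothetical reconstruction, not the
actual test; the names, the loop shape, and the use of
set-process-thread are assumptions based on the description above:

```elisp
;; -*- lexical-binding: t -*-
;; Hypothetical sketch of the scenario, not the actual test:
;; 10 "cat" subprocesses, 10 threads, each thread looping in
;; accept-process-output until "its" process dies.
(require 'cl-lib)
(let* ((procs (cl-loop repeat 10
                       collect
                       (let ((proc (make-process :name "cat"
                                                 :command '("cat")
                                                 :connection-type 'pipe)))
                         ;; Allow any thread to wait for this process;
                         ;; per the discussion, the test needs this
                         ;; because the waiting thread is not the
                         ;; creating thread.
                         (set-process-thread proc nil)
                         proc)))
       (threads (mapcar (lambda (proc)
                          (make-thread
                           (lambda ()
                             (while (process-live-p proc)
                               (accept-process-output proc)))))
                        procs)))
  ;; "cat" exits when its stdin is closed.
  (mapc #'process-send-eof procs)
  ;; Joining all threads should return without hangs or errors.
  (mapc #'thread-join threads))
```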

> >> The test wouldn't work without it, because it calls
> >> accept-process-output from a thread different from the thread in which
> >> the process was created.
> > 
> > But set-process-thread accepts a thread argument, so you could bind
> > the process to a particular thread, couldn't you?
> 
> Yes, but the threads that call `accept-process-output’ and `process-send-eof’ 
> are different threads; binding the process object to one of the threads would 
> cause the other call to fail.

As Stefan points out, process-send-eof could be sent from any thread.
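As a hedged illustration of that point (my own sketch, not code from
the test, and assuming Stefan's observation that process-send-eof works
from any thread):

```elisp
;; -*- lexical-binding: t -*-
;; Sketch only: lock PROC to the thread that reads from it.  Per the
;; discussion above, process-send-eof can still be called from any
;; thread, while accept-process-output for PROC is then only useful
;; in the owning thread.
(let ((proc (make-process :name "cat" :command '("cat")
                          :connection-type 'pipe)))
  (let ((reader (make-thread
                 (lambda ()
                   (while (process-live-p proc)
                     (accept-process-output proc))))))
    (set-process-thread proc reader)
    (process-send-eof proc)          ; sent from the main thread
    (thread-join reader)))
```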

> > More generally, my reading of the code is that the first thread which
> > calls accept-process-output will start listening on all the file
> > descriptors of all the sub-processes which were started until then,
> > and all the other threads will have no descriptors left to listen on.
> > Is this the situation you intended to create?
> 
> No, but I’m not convinced that that’s actually how the implementation 
> behaves.  Wouldn’t, with the current implementation, each thread wait for 
> the file descriptors of one process?

How would that happen, in this particular test?  The test first starts
10 processes, then creates 10 threads, then calls thread-join for each
of the threads.  The threads that the test creates don't run at first,
because the first thing each thread does is attempt to take the global
lock, and it blocks because the main thread still runs.  The main
thread yields the first time it calls thread-join; at that time, one
of the 10 waiting threads (not necessarily the one passed as argument
to thread-join) will take the global lock.  This thread will call
accept-process-output, which calls wait_reading_process_output, which
computes the mask for pselect to wait on.  Now look at the code in
compute_non_keyboard_wait_mask and its ilk: the mask will include
every descriptor that is not already watched by some other thread.
Since this is the first thread that calls wait_reading_process_output,
it will find all the descriptors up for grabs, and will add all of
them to the mask.  Then it will release the global lock and call
pselect.  Releasing the global lock will let some other thread of the
9 remaining threads run, that thread will also call
accept-process-output, which will call wait_reading_process_output --
but now all the descriptors are taken by the first thread, so there's
nothing left for the second one, and the same for all the other 8
threads.

So what we have now is one thread waiting for output from all the 10
subprocesses, and the other 9 waiting only on the self-pipe.  What
happens then, when the processes exit and we get 10 SIGCHLD signals,
is left as an exercise.  And don't forget that the order in which we
receive the signal and the byte from the self-pipe is not entirely
deterministic, either.

Am I missing something, or is this what happens the way the test is
written?  And if I'm not missing anything, is this the situation you
wanted to create and test?

> > Why is it important to test this situation, and what do we expect to
> > happen in it?  Is it reasonable to have more than one thread wait for
> > output from a subprocess?
> 
> The tests shouldn’t ask what’s reasonable, but what’s documented or expected. 
>  Surely users can write code like this, and I don’t see why that shouldn’t be 
> supported.

I didn't say anything about unsupporting this situation, I asked what
should happen in it, and I meant the details, not merely that there
are no errors.  E.g., is it important that each thread gets output
from exactly one subprocess? is it important whether a thread exits
because it read output or because it received SIGCHLD and read the
self-pipe? is it important whether all the threads exit one after
another, or is it okay if they all wait until one thread exits and
then exit immediately (because all the processes have already died)?
etc.

I instrumented the code that gets executed by this test and saw all
kinds of strange phenomena, and I'm asking myself whether you looked
at those details and whether you are satisfied by what you saw.

> > The question I ask myself is what to do with the deviant behavior on
> > MS-Windows.  If it behaves differently because what we test here are
> > the peculiarities of the semi-broken Unix mechanism known as "signals"
> > and/or some details of our implementation of subprocesses on Posix
> > platforms, then maybe there's no need to change anything in the
> > Windows code, and just either skip this test on MS-Windows or write a
> > different test which will work there.  IOW, in such a case there's no
> > bug in the Windows port, it's just that it isn't bug-for-bug
> > compatible with Unix.
> 
> I don’t think so.  Again, I wrote the test based on my understanding of the 
> expected and promised behavior of the process and thread functions, not the 
> Posix implementation.

Where and how did you obtain that understanding?  The documentation is
(intentionally or unintentionally) very vague regarding these details.
Perhaps you only cared about whether the code should or shouldn't err,
but that's clearly not the only interesting or important aspect of
this exercise, is it?

> > But if what happens in this test is important in practical use cases,
> > then perhaps our Windows code does need some changes in this area.
> > 
> > This is why I'm trying to understand the semantics of this test -- I'd
> > like to figure out which implementation details are important, and
> > thus should be emulated on Windows, and which are just details that
> > can be different (and portable Lisp programs will have to take those
> > differences into account).
> 
> The way I see it, all aspects of the test are important — their behavior 
> should be guaranteed by the implementation.

Which aspects of those I mentioned above do you think the
implementation should guarantee?  E.g., should it guarantee how many
threads get the actual output from the processes in this case, and how
many get the signal?

> > For example, the test verifies that each process exited with zero
> > status, but it doesn't care which thread detected that -- is this
> > important?
> 
> IIUC threads don’t really „detect“ process exit.

??? When a subprocess exits, Emacs gets a SIGCHLD, and then we access
the exit code in order to pass it to the sentinel function.  This is
what I call "detecting process exit".

> What I’d assume is that the `thread-join’ calls are synchronization points — 
> after calling `thread-join’ the thread in question is guaranteed to have 
> finished, and since the thread has been waiting for a process to finish 
> (using the accept-process-output loop), the process is also guaranteed to 
> have finished, and this sequence of event is guaranteed exactly in that order.

I don't see how you reach those conclusions.  The first part is true:
when thread-join exits, the thread it was waiting for is guaranteed to
have exited.  But how it exited, and what caused it to exit -- these
are open questions, in a situation where 10 subprocesses generate
output and produce 10 signals.
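The part both sides appear to agree on can be stated as a sketch (my
own reconstruction of the claimed guarantee, reduced to one process and
one thread; set-process-thread to nil is my assumption, following the
earlier part of the thread):

```elisp
;; -*- lexical-binding: t -*-
;; Sketch of the synchronization claim: thread-join returning implies
;; the thread body finished, and the body only finishes once the
;; process is no longer live.
(require 'cl-lib)
(let ((proc (make-process :name "cat" :command '("cat")
                          :connection-type 'pipe)))
  ;; Allow a non-creating thread to wait for the process.
  (set-process-thread proc nil)
  (let ((waiter (make-thread
                 (lambda ()
                   (while (process-live-p proc)
                     (accept-process-output proc))))))
    (process-send-eof proc)
    (thread-join waiter)
    ;; At this point the thread has exited, and therefore the process
    ;; must have exited too.
    (cl-assert (not (process-live-p proc)))))
```

What this sketch cannot express is Eli's question: with 10 processes
and 10 threads, which event actually wakes each waiter is left open.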

> >  Or what about delivery of SIGCHLD -- is this important
> > which thread gets delivered the signal, or how many threads should the
> > signal wake up?
> 
> Not as long as the observable behavior stays the same (and isn’t buggy).

And the "observable behavior" is what? that processes exited
successfully and no Lisp errors happened?  Or is there something else
we expect in this case?

> >  Or is it important when and due to what reason(s)
> > does each thread exit?
> 
> It’s important that each thread exits normally, i.e., due to its thread 
> function body exiting normally.

So it is NOT important what event caused the thread function to run,
and which event caused it to exit accept-process-output and finish?

> the observable order of events matters: first process-send-eof, then process 
> exit, then thread exit, then thread-join returns.

The order of the first two is guaranteed because a single thread
arranges for that sequentially.  But do you care if the sequence is
like this:

  . all the 10 processes exit
  . one of the threads exits because the processes exited
  . the other 9 threads get awoken one by one, exit
    accept-process-output because of SIGCHLD, and exit


