emacs-devel
From: Philipp
Subject: Re: master 8dcb19f 4/4: Add a unit test testing interaction between threads and processes.
Date: Thu, 18 Mar 2021 13:51:10 +0100


> Am 28.02.2021 um 21:19 schrieb Eli Zaretskii <eliz@gnu.org>:
> 
>> From: Philipp <p.stephani2@gmail.com>
>> Date: Sun, 28 Feb 2021 19:30:46 +0100
>> Cc: Philipp Stephani <phst@google.com>,
>> emacs-devel@gnu.org
>> 
>> The specific scenario here is: Start a number of processes and a number of 
>> threads, each of which waits for one of the processes
> 
> See below: I don't see how you ensure in this test that each thread
> waits for just one process.

Shouldn't providing a process argument to accept-process-output ensure this, 
modulo implementation details that are not observable?  The only nontrivial 
guarantee that the test requires is that after (accept-process-output PROC) 
returns nil, the PROC process has finished, independent of other processes or 
threads, and that multiple concurrent calls of this form don't deadlock.
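
Stated as code, the guarantee I'm relying on looks roughly like this (a
minimal sketch, not the actual test code; `proc' stands for any live
asynchronous process):

  ;; Keep accepting output for this specific process.  The idiom relies on
  ;; accept-process-output returning non-nil while it handles output and
  ;; nil once the connection is closed, i.e. once PROC has finished.
  (while (accept-process-output proc))
  ;; At this point PROC must have finished, independently of what any
  ;; other process or thread is doing:
  (process-live-p proc)   ; expected to be nil here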

> 
>> then, cause the processes to exit (since they run „cat“, that works by 
>> sending EOF); then, join all the threads.  This should work without hangs or 
>> errors.
> 
> There are two "inputs" from each subprocess that Emacs receives: the
> self-pipe output and the SIGCHLD signal.  Depending on the timing and
> on which thread receives which input, the behavior can be different.
> Do you have a clear idea what should happen and what does happen wrt
> these "inputs"?  In particular, which thread(s) do you expect to see
> SIGCHLD and/or read the self-pipe as result of that, and do you think
> it's important whether a thread gets awoken by the signal or by
> reading the self-pipe?

All of those are implementation details that should be entirely invisible to 
the thread interface.  As such, none of the threads should "see" any of those.  
All that matters is the observable behavior of accept-process-output: it may 
return nil only if and when its process argument has finished, independent of 
threads or signal disposition.

> 
>>>> The test wouldn't work without it, because it calls
>>>> accept-process-output from a thread different from the thread in which
>>>> the process was created.
>>> 
>>> But set-process-thread accepts a thread argument, so you could bind
>>> the process to a particular thread, couldn't you?
>> 
>> Yes, but the threads that call `accept-process-output’ and 
>> `process-send-eof’ are different threads; binding the process object to one 
>> of the threads would cause the other call to fail.
> 
> As Stefan points out, process-send-eof could be sent from any thread.

Correct, though this shouldn't make a difference: the test should pass whether 
the process objects have thread affinity or not (and that's the case for me).
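
For example, the minimal shape I have in mind (a sketch assuming
lexical-binding and an available "cat" executable, not the committed test
code):

  ;; The worker thread waits for the process; the main thread sends EOF.
  (let ((proc (make-process :name "cat" :command '("cat"))))
    ;; Remove the default thread affinity; per the discussion above, the
    ;; test should pass either way.
    (set-process-thread proc nil)
    (let ((worker (make-thread
                   (lambda ()
                     (while (accept-process-output proc))))))
      ;; Called from the main thread, not from `worker':
      (process-send-eof proc)
      (thread-join worker)
      (process-status proc)))   ; expected: `exit'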

> 
>>> More generally, my reading of the code is that the first thread which
>>> calls accept-process-output will start listening on all the file
>>> descriptors of all the sub-processes which were started until then,
>>> and all the other threads will have no descriptors left to listen on.
>>> Is this the situation you intended to create?
>> 
>> No, but I’m not convinced that that’s actually how the implementation 
>> behaves.  Wouldn’t with the current implementation each thread wait for the 
>> file descriptors of one process?
> 
> How would that happen, in this particular test?  The test first starts
> 10 processes, then creates 10 threads, then calls thread-join for each
> of the threads.  The threads that the test create don't run, because
> the first thing each thread does is attempt to take the global lock,
> and becomes blocked because the main thread still runs.  The main
> thread yields the first time it calls thread-join; at that time, one
> of the 10 waiting threads (not necessarily the one passed as argument
> to thread-join) will take the global lock.  This thread will call
> accept-process-output, which calls wait_reading_process_output, which
> computes the mask for pselect to wait on.  Now look at the code in
> compute_non_keyboard_wait_mask and its ilk: the mask will include
> every descriptor that is not already watched by some other thread.
> Since this is the first thread that calls wait_reading_process_output,
> it will find all the descriptors up for grabs, and will add all of
> them to the mask.  Then it will release the global lock and call
> pselect.  Releasing the global lock will let some other thread of the
> 9 remaining threads run, that thread will also call
> accept-process-output, which will call wait_reading_process_output --
> but now all the descriptors are taken by the first thread, so there's
> nothing left for the second one, and the same for all the other 8
> threads.
> 
> So what we have now is one thread waiting for output from all the 10
> subprocess, and the other 9 waiting only on the self-pipe.  What
> happens then, when the processes exit and we get 10 SIGCHLD signals,
> is left as an exercise.  And don't forget that order in which we
> receive the signal and the byte from self-pipe is not entirely
> deterministic, either.
> 
> Am I missing something, or is this what happens the way the test is
> written?  And if I'm not missing anything, is this the situation you
> wanted to create and test?

This might be how the implementation behaves (I haven't checked, but I have no 
reason to doubt your explanation).  However, the test doesn't test the 
implementation, but the observable behavior.  That is, I can't formally prove 
that the implementation leads to the observable behavior that the test expects, 
but I think it should, and I'd consider it a bug (in Emacs, not the test code) 
if it doesn't.

> 
>>> Why is it important to test this situation, and what do we expect to
>>> happen in it?  Is it reasonable to have more than one thread wait for
>>> output from a subprocess?
>> 
>> The tests shouldn’t ask what’s reasonable, but what’s documented or 
>> expected.  Surely users can write code like this, and I don’t see why that 
>> shouldn’t be supported.
> 
> I didn't say anything about unsupporting this situation, I asked what
> should happen in it, and I meant the details, not merely that there
> are no errors.  E.g., is it important that each thread gets output
> from exactly one subprocess?

How does a thread "get output"?  Do you mean that accept-process-output returns 
non-nil?  What I'd expect from accept-process-output with a process argument is 
that directly after it returns nil, the process represented by the process 
argument has finished, all output has been passed to buffers/filters, and the 
sentinel has run.  That's independent of the thread in which 
accept-process-output is called.
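
As a sketch of that post-condition (with a hypothetical sentinel flag,
assuming lexical-binding; not the actual test code):

  ;; After the wait loop returns, the sentinel must already have run and
  ;; all output must have arrived in the process buffer.
  (let* ((sentinel-ran nil)
         (proc (make-process
                :name "cat" :command '("cat")
                :connection-type 'pipe
                :buffer (generate-new-buffer " *cat output*")
                :sentinel (lambda (_proc _event)
                            (setq sentinel-ran t)))))
    (process-send-string proc "hello\n")
    (process-send-eof proc)
    (while (accept-process-output proc))
    ;; Expected observable state at this point:
    (list (process-status proc)                 ; => exit
          sentinel-ran                          ; => t
          (with-current-buffer (process-buffer proc)
            (buffer-string))))                  ; => "hello\n"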

> is it important whether a thread exits
> because it read output or because it received SIGCHLD and read the
> self-pipe?

No, these are implementation details: they might as well not exist as far as 
the observable behavior of accept-process-output is concerned.

> is it important whether all the threads exit one after
> another, or is it okay if they all wait until one thread exits and
> then exit immediately (because all the processes have already died)?

I don't think that's observable/documented behavior: we only know that a thread 
has exited after thread-join returns.

> etc.
> 
> I instrumented the code that gets executed by this test and saw all
> kinds of strange phenomena, and I'm asking myself whether you looked
> at those details and whether you are satisfied by what you saw.

No, that's not the way I approach topics like this.  I wrote the test based on 
my reading of the documentation (having in mind the complexity of the 
implementation, but not being guided by it).  I then ran the test and it 
passed, which gave me a bit of confidence in the correctness of the 
implementation (which the test can't prove, of course).
So the questions I'd ask here are: Is the test code correct, based on 
documented (or at least implied) guarantees?  If it is correct but the test 
fails on some system, the implementation is buggy.  Which specific sequence of 
events in the implementation leads to the buggy behavior is then only the next 
step, not the starting point of the discussion.

> 
>>> The question I ask myself is what to do with the deviant behavior on
>>> MS-Windows.  If it behaves differently because what we test here are
>>> the peculiarities of the semi-broken Unix mechanism known as "signals"
>>> and/or some details of our implementation of subprocesses on Posix
>>> platforms, then maybe there's no need to change anything in the
>>> Windows code, and just either skip this test on MS-Windows or write a
>>> different test which will work there.  IOW, in such a case there's no
>>> bug in the Windows port, it's just that it isn't bug-for-bug
>>> compatible with Unix.
>> 
>> I don’t think so.  Again, I wrote the test based on my understanding of the 
>> expected and promised behavior of the process and thread functions, not the 
>> Posix implementation.
> 
> Where and how did you obtain that understanding?  The documentation is
> (intentionally or unintentionally) very vague regarding these details.

Yes, so I had to fill in some aspects that I hope are reasonable.  In 
particular, the documentation doesn't state explicitly that waiting for a 
process in a thread other than the one that created it is supported, but given 
set-process-thread it implicitly has to be, because what would otherwise be 
the purpose of set-process-thread?  (A sketch of the usage I have in mind 
follows the questions below.)
The documentation definitely needs to be significantly expanded, as this 
discussion shows.
So maybe to ask more directly:
- Should this test pass?
- If so, how can this be deduced from the documentation?
- If not, how can *that* be deduced from the documentation, and what changes 
would be necessary to guarantee that it passes?
- Is there anything missing from the documentation that prevents deducing 
whether the test should pass or not?
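
For reference, the usage I read into set-process-thread (a sketch; the
comments reflect my understanding of the affinity semantics, not wording from
the manual):

  ;; A process created in the main thread.  Removing the affinity is what
  ;; (in my reading) makes it legitimate for other threads to wait on it;
  ;; passing a thread object would restrict waiting to that thread.
  (let ((proc (make-process :name "cat" :command '("cat"))))
    (set-process-thread proc nil)   ; nil: no affinity, any thread may wait
    (process-thread proc))          ; => nil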

> Perhaps you only cared about whether the code should or shouldn't err,
> but that's clearly not the only interesting or important aspect of
> this exercise, is it?

It's at least one important aspect, but finding deadlocks (hangs) is probably 
more important.

> 
>>> But if what happens in this test is important in practical use cases,
>>> then perhaps our Windows code does need some changes in this area.
>>> 
>>> This is why I'm trying to understand the semantics of this test -- I'd
>>> like to figure out which implementation details are important, and
>>> thus should be emulated on Windows, and which are just details that
>>> can be different (and portable Lisp programs will have to take those
>>> differences into account).
>> 
>> The way I see it, all aspects of the test are important — their behavior 
>> should be guaranteed by the implementation.
> 
> Which aspects of those I mentioned above do you think the
> implementation should guarantee?  E.g., should it guarantee how many
> threads get the actual output from the processes in this case,

That depends on what "get the actual output" means.  What I expect is the 
behavior of accept-process-output, as explained above.  Specifically, 
accept-process-output will return nil quickly (within milliseconds, assuming 
the calling thread is not blocked) once the given process has exited.  Once it 
returns, all process output has been passed to the process filters or buffers, 
and the sentinel has run.

> and how
> many get the signal?

None, as the signal is an implementation detail and not part of the interface.

> 
>>> For example, the test verifies that each process exited with zero
>>> status, but it doesn't care which thread detected that -- is this
>>> important?
>> 
>> IIUC threads don’t really „detect“ process exit.
> 
> ??? When a subprocess exits, Emacs gets a SIGCHLD, and then we access
> the exit code in order to pass it to the sentinel function.  This is
> what I call "detecting process exit".

Sure, but is that really related to threads?  I'd rather say that *Emacs* 
detects process exits.
It's also true that there's some thread that contains the `accept' call, but 
that's again an implementation detail and not related to Lisp threads.  Several 
other implementations would be possible: Calling `accept' in a single thread 
not accessible from Lisp at all; implementing Lisp threads as "green threads" 
without pthread support; etc.

> 
>> What I’d assume is that the `thread-join’ calls are synchronization points — 
>> after calling `thread-join’ the thread in question is guaranteed to have 
>> finished, and since the thread has been waiting for a process to finish 
>> (using the accept-process-output loop), the process is also guaranteed to 
>> have finished, and this sequence of event is guaranteed exactly in that 
>> order.
> 
> I don't see how you reach those conclusions.  The first part is true:
> when thread-join exits, the thread it was waiting for is guaranteed to
> have exited.  But how it exited, and what caused it to exit -- these
> are open questions, in a situation where 10 subprocesses generate
> output and produce 10 signals.

There's only one way for a thread to exit: when its thread function returns.  
At least that's what the manual says.  And in this test the thread function 
can only return once `accept-process-output' has returned nil, so we're back 
to the discussion of what the semantics of `accept-process-output' are.

> 
>>> Or what about delivery of SIGCHLD -- is this important
>>> which thread gets delivered the signal, or how many threads should the
>>> signal wake up?
>> 
>> Not as long as the observable behavior stays the same (and isn’t buggy).
> 
> And the "observable behavior" is what? that processes exited
> successfully and no Lisp errors happened?  Or is there something else
> we expect in this case?

That the code as a whole doesn't hang or deadlock.  It's harder to make 
precise guarantees for that, but the 60-second timeout should be long enough 
for the test to finish even on a heavily loaded system.
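
One way to express such a bound inside a waiting loop (a sketch with a
hypothetical 60-second budget, `proc' again being the process waited for; not
necessarily how the committed test implements its timeout):

  ;; Poll with a bounded accept-process-output call so that a hang turns
  ;; into an error instead of blocking forever.
  (let ((deadline (+ (float-time) 60)))
    (while (process-live-p proc)
      (when (> (float-time) deadline)
        (error "cat process still running after 60 seconds"))
      ;; Wait at most one second per iteration for output from PROC.
      (accept-process-output proc 1)))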

> 
>>> Or is it important when and due to what reason(s)
>>> does each thread exit?
>> 
>> It’s important that each thread exits normally, i.e., due to its thread 
>> function body exiting normally.
> 
> So it is NOT important what event caused the thread function to run,
> and which event caused it to exit accept-process-output and finish?

No.  AFAIK these are unobservable, and the manual doesn't make any guarantees 
in this respect.

> 
>> the observable order of events matters: first process-send-eof, then process 
>> exit, then thread exit, then thread-join returns.
> 
> The order of the first two is guaranteed because a single thread
> arranges for that sequentially.  But do you care if the sequence is
> like this:
> 
>  . all the 10 processes exit
>  . one of the threads exits because the processes exited
>  . the other 9 threads get awoken one by one, exit
>    accept-process-output because of SIGCHLD, and exit


From what I can see, that should be fine.  The test only checks that the 
threads finish eventually and don't deadlock, but doesn't care about the order 
of events across threads (which is not guaranteed unless there are explicit 
synchronization points like mutexes or condition variables).
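
To summarize the scenario in code (a sketch of the shape under discussion,
assuming lexical-binding and an available "cat" executable; the committed test
differs in details such as the timeout handling):

  ;; Ten processes, ten threads; each thread waits for exactly one process.
  ;; Sending EOF makes "cat" exit; joining the threads must complete
  ;; without errors or deadlocks, in no particular order.
  (let* ((procs (mapcar (lambda (_)
                          (let ((proc (make-process :name "cat"
                                                    :command '("cat"))))
                            ;; No thread affinity: any thread may wait.
                            (set-process-thread proc nil)
                            proc))
                        (number-sequence 1 10)))
         (threads (mapcar (lambda (proc)
                            (make-thread
                             (lambda ()
                               (while (accept-process-output proc)))))
                          procs)))
    (mapc #'process-send-eof procs)
    (mapc #'thread-join threads)
    ;; Observable result: every process has finished.
    (mapcar #'process-status procs))   ; expected: (exit exit ... exit)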



